Back in October, I wrote a post "To Git or Not To Git." At the time, one of our teams started to work with distributed revision control (drcs). The team looked at both Git and Mercurial. At the end of the day they decided to test using drcs with Mercurial. I tried to stay out of the decision as, on the face of it, both systems offer the same basic feature set. After a few months we took a look at our usage to decide if we should move forward or not.
First, a little background. We chose to test drcs with a brand new project so that we did not have to worry about existing source structures in perforce. The new project is the rewrite of our data warehouse ETL system using Hadoop. The code part of the project breaks down into data loaders, data transform workflows, and data exporters for each subject area (ads, clickstream, etc.), as well as common components shared across subject areas. The configuration part breaks down into puppet configurations and mcollective deployment agents for pushing our code. For the beginning of the project we focused on a single subject area. Besides keeping the test project limited in scope, we wanted to ensure we could fail easily. The ability of both git and mercurial to sync back to perforce meant that, regardless of the outcome, we had a fallback that let us fail without too much consequence.
We liked a number of things we found when using drcs: performance was great, developers could commit code from home or elsewhere without network access, and developers could make changes safely and easily isolate their changes from others. We liked having a robust model for creating authoritative repositories and the ease with which we could merge feature branches. In short, drcs had a lot going for it.
However, there were a number of things we did not like. The ability to commit changes offline is nice; however, unless you push the changes to a central server for others to see, your commit is, in a sense, incomplete. Watching our central hg server, I do not see a lot of people pushing changes up on a regular basis. Knowing that people are not pushing makes me wonder whether they are pulling on a regular basis. Granted, not pulling, or syncing, is an issue with any rcs; however, drcs adds the further requirement to not only commit but also push your commits up. It also gives you the ability to choose where you pull changes from. If you don't have to push and can pull from anywhere, how do you maintain a sane, stable continuous integration environment?
As we move our organization towards continuous delivery, having numerous branches of code and an unknown integration state seems problematic. I would rather have everyone checking into a common trunk with no branches to ensure we are always in an integrated state. While the drcs tools provide great merge tools, textual merges do not always result in semantically correct code. Fundamentally, I do not think that having feature branches allows us to get to the continuous delivery model we are pushing towards (flame here ;-).
Additionally, in order to have clean push, pull, and branch semantics, one does not want a single monolithic repository. The question then becomes: what is the appropriate granularity for a repository? We decided that repository granularity should follow the deployable unit of code. However, this led us to have seven repositories for our small project. With this repository explosion, I wonder how a developer new to the team would even know where to start. There are additions to both hg and git that allow rudimentary project grouping; however, compared to our perforce implementation, we would not be able to create the nice logical source tree we have now.
I do not believe any of the above issues are insurmountable; however, given these issues, we became uncertain whether the benefits were worth the cost. For now, we are staying on perforce. However, given the power of the git and hg integration with p4, we have a number of developers who are sticking with the tools that work best for them. At the end of the day, developer productivity is what matters most to me, and it seems that, as far as scm is concerned, we are in a good place.