How I stopped missing Darcs and started loving Git

Posted by Tom Moertel Mon, 10 Dec 2007 21:52:00 GMT

About three years ago, I switched to Darcs as my primary source-code management system. It was simple, intuitive, and powerful, and it made managing my projects more fun and less frustrating than any centralized VCS ever had. That it was written in Haskell, one of my favorite programming languages, made it even better. I was hooked.

Since then, the distributed SCM landscape has changed. Darcs hasn’t improved much, but its competitors have made long strides, especially Git and Mercurial. Both are crazy fast, vigorously developed, and widely used on large, highly active real-world projects, such as the Linux kernel and Mozilla 2. In comparison, Darcs has stagnated.

When I started working for a new company recently, I had to consider whether to advocate Darcs or something else. In the end, I decided that Darcs would be a hard sell. Nobody else at the company uses Haskell, and having to explain how to avoid the occasional corner case seemed like a losing proposition.

After researching and playing around with Git and Mercurial, I settled on Git. I like Git’s underlying hashed-blobs model better than Mercurial’s revlogs, and Git seems to have slightly more development momentum. Still, it was a close call. Either choice would have been completely reasonable.

Missing Darcs

When I started using Git on real projects, the one thing I really missed was the ability to easily amend earlier patches, something Darcs made trivial. Let me explain. The typical development workflow goes something like this:

  1. Check out a copy of the upstream code base.
  2. Implement feature X.
  3. Commit.
  4. Implement independent feature Y.
  5. Commit.
  6. Implement independent feature Z.
  7. Commit.
  8. Push new features back upstream.

Now, what really happens is that when I’m implementing Y or Z, I’ll realize that I made a mistake in X. The trick is then fixing X so that my fix is part of the changeset/patch for X that ultimately gets pushed upstream in the last step. That way, the upstream folks will see only a single, clean patch for feature X – not a mishmash of patches that together represent X.

In Darcs, amending the original patch is easy because its patch theory lets me tweak the patch for X independently of the other patches. Darcs will simply ask me which patch I want to amend, and I’ll select the original patch for X:

$ emacs               # fix X
$ darcs amend-record  # amend original patch for X

Mon Dec 10 14:43:13 EST 2007  Tom Moertel <tom@moertel.com>
  * Implemented Z
Shall I amend this patch? [yNvpq], or ? for help: n

Mon Dec 10 14:42:12 EST 2007  Tom Moertel <tom@moertel.com>
  * Implemented Y
Shall I amend this patch? [yNvpq], or ? for help: n

Mon Dec 10 14:41:46 EST 2007  Tom Moertel <tom@moertel.com>
  * Implemented X
Shall I amend this patch? [yNvpq], or ? for help: y
hunk ./x 1
-X1
+X2
Shall I add this change? (1/?)  [ynWsfqadjkc], or ? for help: y
Finished amending patch:
Mon Dec 10 14:43:25 EST 2007  Tom Moertel <tom@moertel.com>
  * Implemented X

That’s it. The exact same process will work regardless of when I realize I need to fix X: before I start Y, while I’m implementing Y, after I’ve committed Y, while I’m working on Z, or after I’ve committed Z.

Learning to love Git

With Git, however, I can amend a commit only if I haven’t committed anything else before making my fix. In Git’s mind, Y depends on X, and Z depends on Y, even if they really are independent of one another.
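A minimal sketch of what that dependency looks like in practice: every Git commit records the ID of its parent, so the commit for Y literally names the commit for X. The scratch repository, the file names x and y, and the placeholder identity below are assumptions made for illustration, not part of my real projects.

```shell
#!/bin/sh
# Sketch: show that the commit for Y records the commit for X as its parent.
# Runs entirely inside a throwaway directory.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name  "Example Author"      # placeholder identity
git config user.email "author@example.com"

echo X1 > x; git add x; git commit -q -m 'Implemented X'
echo Y1 > y; git add y; git commit -q -m 'Implemented Y'

x_id=$(git rev-parse HEAD~1)              # ID of the commit for X
y_parent=$(git log -1 --format=%P HEAD)   # parent ID stored inside the commit for Y

echo "X commit:   $x_id"
echo "Y's parent: $y_parent"
```

The two printed IDs are the same. Changing X produces a commit with a different ID, so anything built on top of it has to be rewritten as well, even when the files involved are unrelated.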

So if I commit the original patch for X and then immediately realize I need to make a fix, before I start working on Y or Z, it’s easy:

$ emacs               # implement X
$ git commit -m 'Implemented X'

# discover problem in X

$ emacs               # fix X
$ git commit --amend  # amend original patch

More typically, it’s only while I’m working on Y that I’ll realize I need to fix X. Then it’s more complicated to amend the original commit:

$ emacs               # implement X
$ git commit -m 'Implemented X'
$ emacs               # start working on Y

# discover problem in X

$ git stash           # stash away half-completed work on Y
$ emacs               # fix X
$ git commit --amend  # amend original patch for X
$ git stash apply     # restore work on Y
$ emacs               # continue working on Y

While not as convenient as Darcs’s workflow, it’s perfectly workable.
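Here is a self-contained sketch of the stash-and-amend sequence above, run in a scratch repository. It spells out the staging steps that the listing leaves implicit (the -a flags), and it assumes the work on Y touches a file that is already tracked, since a plain git stash leaves untracked files alone. The file names and identity are placeholders.

```shell
#!/bin/sh
# Sketch: stash half-finished work on Y, amend the commit for X, restore the work.
# Runs entirely inside a throwaway directory.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name  "Example Author"      # placeholder identity
git config user.email "author@example.com"

# Track both files from the start so that "git stash" picks up edits to them.
echo X0 > x; echo Y0 > y
git add x y; git commit -q -m 'Initial checkin'

echo X1 > x; git commit -q -a -m 'Implemented X'
echo Y1 > y                            # half-completed work on Y, not committed

git stash -q                           # set the work on Y aside
echo X2 > x                            # fix X
git commit -q -a --amend --no-edit     # fold the fix into 'Implemented X'
git stash apply -q                     # bring the work on Y back

git log --format=%s                    # Implemented X, then Initial checkin
git status --short                     # y shows as modified but uncommitted
```

The history still holds a single commit for X, now containing the fix, and the unfinished edit to y is back in the working tree.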

Now let’s consider another fairly typical case: I commit X and Y and then start working on Z before I notice the problem in X. I used to think that Git couldn’t handle this case, but it can, thanks to git rebase --interactive:

$ emacs               # implement X
$ git commit -m 'Implemented X'
$ emacs               # implement Y
$ git commit -m 'Implemented Y'
$ emacs               # start working on Z

# discover problem in X

$ git stash           # stash away half-completed work on Z
$ emacs               # fix X
$ git commit -m 'Fixed X'
$ git rebase --interactive HEAD~3  # see comments below
$ git stash apply     # restore work on Z
$ emacs               # continue working on Z

The git rebase --interactive command is powerful. What the command does, as called in the snippet above, is invoke my editor of choice on a text file describing the last 3 commits (that’s the HEAD~3 part):

# Rebasing 3ad99a7..b9a8405 onto 3ad99a7
#
# Commands:
#  pick = use commit
#  edit = use commit, but stop for amending
#  squash = use commit, but meld into previous commit
#
# If you remove a line here THAT COMMIT WILL BE LOST.
#
pick 0885540 Implemented X
pick 320b115 Implemented Y
pick b9a8405 Fixed X

I can then edit the file to reorder, merge (squash), and/or remove the commits. In this example, I want to merge the fix for X into the original commit that implemented X. So I edit the file like so:

pick 0885540 Implemented X
squash b9a8405 Fixed X
pick 320b115 Implemented Y

Then I save the file, at which point Git takes over and makes the requested changes, merging the fix for X into the original commit for X. Now the log shows the original implementation and fix as one commit:

$ git log
commit f387d650976246c0854d028b040cca40e542be56
Author: Tom Moertel <tom@moertel.com>
Date:   Mon Dec 10 15:11:26 2007 -0500

    Implemented Y

commit 82a1c849ffd1bd688d5bc9d99be0e63548a89c4c
Author: Tom Moertel <tom@moertel.com>
Date:   Mon Dec 10 15:13:03 2007 -0500

    Implemented X

    Fixed X

commit 3ad99a7ef537b7ae99e435e0d2b4b0d03de92c65
Author: Tom Moertel <tom@moertel.com>
Date:   Mon Dec 10 15:11:14 2007 -0500

    Initial checkin
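For anyone who wants to try this without touching a real project, the following sketch replays the scenario in a scratch repository and performs the same reorder-and-squash without opening an editor. It relies on the GIT_SEQUENCE_EDITOR environment variable, which exists only in Git releases newer than the one used for this article, and on a hypothetical helper script, reorder.sh, written for this example. The file names and identity are placeholders too.

```shell
#!/bin/sh
# Sketch: replay the X / Y / "Fixed X" scenario and squash the fix into X
# without an interactive editor. Runs entirely inside throwaway directories.
set -e

repo=$(mktemp -d)
tools=$(mktemp -d)
cd "$repo"
git init -q
git config user.name  "Example Author"      # placeholder identity
git config user.email "author@example.com"

echo base > README; git add README; git commit -q -m 'Initial checkin'
echo X1 > x;        git add x;      git commit -q -m 'Implemented X'
echo Y1 > y;        git add y;      git commit -q -m 'Implemented Y'
echo X2 > x;                        git commit -q -a -m 'Fixed X'

# Hypothetical helper that makes the same edit as the hand-edited todo list:
# line 2 (Implemented Y) is held back, line 3 (Fixed X) becomes a squash and
# is printed first, so it lands directly after Implemented X.
cat > "$tools/reorder.sh" <<'EOF'
#!/bin/sh
awk 'NR == 2 { held = $0; next }
     NR == 3 { sub(/^pick/, "squash"); print; print held; next }
     { print }' "$1" > "$1.new" && mv "$1.new" "$1"
EOF

# GIT_SEQUENCE_EDITOR edits the todo list; GIT_EDITOR=true accepts the
# combined commit message that the squash proposes.
GIT_SEQUENCE_EDITOR="sh $tools/reorder.sh" GIT_EDITOR=true \
    git rebase -q --interactive HEAD~3

git log --format=%s     # one subject line per remaining commit
```

The final log should list Implemented Y, Implemented X, and Initial checkin, with the fix folded into the commit for X, just as in the log above.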

Once I figured out how to use git rebase --interactive, I stopped missing Darcs and started loving Git.

Comments

  1. Håkon said about 1 hour later:

    Hilarious timing, as Darcs 2-pre1 was announced/released a few hours ago, fixing the dreaded conflict bug, etc.

  2. Eric said about 2 hours later:

    Håkon,

    Unfortunately simply fixing the conflict bug does not address the remaining usability concerns nor does it make darcs less likely to trash your data either intentionally or through one of its many bugs.

  3. Håkon said about 11 hours later:

    Eric,

    This is true. However, the prerelease shows that darcs development is carrying on, and I don’t think they’ve only fixed that one bug. To me, all the other distributed SCMs are still playing catch-up. And Git scares me (well, it did last time I read about it. Maybe I should give it another chance).

  4. Kurt said about 15 hours later:

    Okay, so Git includes an oddly named command that lets you edit a kind of “script” of the source control hierarchy, with its own oddly named commands, to overcome a deficiency in the software regarding changeset dependencies. Thanks anyway.

    In addition to the changeset smarts, the major feature Darcs has going for it is the focus on usability. Git is starting to sound as bad as arch.

  5. she said about 15 hours later:

    What else do you want to see Kurt? The kernel folks use git, not darcs … it makes no sense to assume that git is any worse than darcs or that the kernel folks are idiots on what they do (but maybe some of them actually are idiots… sometimes flamewars make everyone involved look like an idiot there).

  6. Jonas said about 17 hours later:

    I have a very superficial understanding of git, having never used it in practice, but aren’t you supposed to branch the code between different independent functions? And then submit each feature as an independent patch. I had the impression that git made branches as cheap as commits and that’s what made it great.

  7. Kurt said about 19 hours later:

    I thought my point was pretty clear. I want to see more usable software. Unfortunately, designing for usability is hard, which means it often doesn’t get done. I have no doubt that the Git designers could come up with a better way to perform the task that Tom describes in the article. Will they? Probably not, because what they have “works”, and they don’t care about putting people off.

    Why does it matter? Since they’re the kernel developers, lots of people will use what they use by example. Lots of developers will be stuck with cumbersome software, and moreover, they’ll think it’s okay. That means they won’t put more thought into their own software, and they’ll end up writing cumbersome software. It’s a self-perpetuating cycle. “Well, I can deal with complexity, so my users can deal with it, too.”

  8. Kurt said about 19 hours later:

    From another point of view, every user who thinks that some convoluted software process is okay is a point against usability. “Well the users don’t complain or even say they like it this way, so no point in making it better.” That mindset is contagious.

  9. CV said about 19 hours later:

    Kurt, I think you should learn a bit more about how git works these days. Its usability has improved by leaps and bounds in the last few months. The git-rebase command, which you criticized for its strange name, is spectacular (quite aside from its --interactive flag), and is one of the few things which helps me stay sane while I deal with Subversion repositories every day.

    I used darcs for a project about six months ago, and liked it quite a bit; I have tremendous respect for the darcs team, and I’m looking forward to the improvements in version 2. That said, since git 1.5 came out, I haven’t missed darcs at all, either from a usability or reliability point of view.

  10. Håkon said about 20 hours later:

    @she: Git isn’t necessarily better than darcs, just because the kernel guys use it. Linus is certainly no god, or an authority on UI design (remember his misguided flaming over UI issues in the past). This, however, doesn’t mean that the Git interface can’t improve.

    As a mathematician, darcs appeals to me in a way Git never will. I also think darcs will become the better SCM when t -> inf.

  11. Johan said 1 day later:

    I agree with Jonas. Branch more!

  12. Tom Moertel said 1 day later:

    Kurt, I find git rebase --interactive to be very usable, and I say that as a 3-year Darcs user who still loves Darcs. Git’s approach is basically equivalent to darcs amend-record: where Darcs asks you to select the patch to amend and the hunks to merge via a series of interactive prompts, Git asks you the same questions via a single interactive editing session. Both approaches are easy when the changes you want to make are small, such as in my example in the article above, but the Git approach is considerably more manageable when the changes are larger and more complicated.

    So far, in my short time using Git, I have found that its UI is highly optimized for non-trivial, real-world coding situations. It makes the easy things easy, and the hard things surprisingly manageable.

    Cheers,
    Tom

  13. Tom Moertel said 1 day later:

    Johan and Jonas: In Git, each repo represents an independent line of development, in effect a separate branch. This is the way most distributed SCMs work.

    In Git, however, you can also maintain multiple additional branches in your local repo. So, when working on feature X, I could create a new “topic branch” for X in my local repo. Then, after I was done working on X, I could merge the topic branch back into the master branch. Then I would probably delete the topic branch since it’s no longer needed. The end result is that X would get added into the master branch.

    If X were a big feature, I might develop it that way, in its own topic branch. But, in most cases, when X is small, I just do the work in the master branch of my local repo, committing it to that branch when I think it’s done. I won’t push it upstream until later, though. I will let the commit “rest” a while (generally while working on Y or Z) to make sure it’s really done.

    That’s how I got the X, Y, Z scenario I wrote about in the main article. It’s fairly common for me.

    Cheers,
    Tom

  14. Phil Toland said 8 days later:

    I have used both Git and Darcs extensively and I think both have their pluses and minuses. Git’s branching is handy as is git stash. On the other hand, Darcs’ interactive workflow just spanks Git in terms of ease of use. Git expects you to remember arcane sequences of commands in order to accomplish something whereas Darcs is focused on making common workflow tasks easier and more intuitive. Just because Git 1.5 is a vast improvement on previous versions does not make it “great”. I believe that Kurt’s analysis is correct and that the team behind Git puts a low priority on usability.

    All that having been said, I am in the same position as Tom and need to decide on a DVCS to recommend for work. In spite of its interface flaws, Git still seems to be a better choice than Darcs. Hopefully Darcs 2 will rectify that.

  15. kevin said 22 days later:

    I love love love darcs, and refuse to learn the myriad switches and complexity of git (I’ve done it at least twice and forgotten them already!). But, I’m faced with a dev team with a bunch of windows developers. so I’m pushing for mercurial. easy to learn, and windows friendly too.

  16. kevin said 22 days later:

    oh, and mercurial’s revlogs take up less disk space and are often faster than git’s blobs! :)

  17. Mark Stosberg said 33 days later:

    Eric,

    I’m curious about one of the examples where darcs 2 will “trash your data”. I’ve been looking through the bug tracker over the past few days and writing test cases for various bugs for testing with darcs 2, and I can’t think of a serious data-trashing problem that happens when you start with darcs 2 and the new darcs-2 format.

    I’ve been using darcs-1 since before the 1.0 release for a 40k+ line project with a few other developers, and never ran into a serious data corruption with it either.

    In fact, I was very pleased with how easy it was to recover from the usually minor problems I did run into, which usually were triggered by some user error, like running a command with the wrong permissions to complete it.

    Mark

  18. Nihil Est said 37 days later:

    Phil (@14), consider Kevin (@15)’s advice. Mercurial seems to strike a balance between usability and speed, being on par with Git for speed, and with a simple core command set that grows as you need it to (via extensions).

    I say this as someone who has tried all the VCSes out of curiosity. Git is improving in the UI sense, but it still has an everything-and-the-kitchen-sink feel to it. It has a core command set that’s learnable, but it’s not easy to extract that from the documentation. I expect this will improve as more people write tutorials, but Mercurial will get you there quicker, at least for now.

  19. Anonymous@anonymous.com said 64 days later:

    Life is too short for anyone to spend more than 30 minutes, in your whole career, learning how to use version control software.

    “nobody wants to become an expert in their revision management software, so it should be really easy to learn (flat learning curve) and still very powerful to do all the things you need and want to do.”

    Revision control software should be so easy to use that the person who can’t handle programming anything more than MS-Excel macros should be an expert in the VCS in less than a half hour, and should be able to remember most of the commands after not using the VCS, at all, for a year.

    Especially roll-back, revert, or whatever you want to call the operation.

    the quote above is from a nice darcs vs. bzr article at http://www.kdedevelopers.org/node/2024
