Ten Years of Git: An Interview with Linus Torvalds

jordigh · on April 6, 2015

Let's not forget the other contender for replacing Bitkeeper, Mercurial:

http://lkml.iu.edu/hypermail/linux/kernel/0504.2/0670.html

We will also be celebrating Mercurial's 10th anniversary next week during the 3.4 Pycon sprint:

http://mercurial.selenic.com/wiki/3.4sprint

tjradcliffe · on April 6, 2015

I used Git for a couple of years then changed to Mercurial and have never looked back, except now and then to wonder at the size of the crowd that has gathered around Git, driven in no small part by the success of github (which is not a bad thing!)

Git is an entirely appropriate tool for managing as something as large and complex as the Linux kernel, but I'm doing much simpler stuff, and Mercurial is simpler and has a much shallower learning curve. I'd recommend it unequivocally to teams of modest size working on average applications.

The state of git documentation is much better today than it was five or six years ago, so maybe its complexity is less of a big deal, but so far I've not encountered any issues with Mercurial that make me think "If only I was using git this would be easy!"

What I found with git was I had to maintain a fairly complex mental model of the current state of affairs, and because I'm not very intelligent that took quite a bit of effort. With Mercurial the model is much simpler, so I can spend more of my very limited attention on writing code. It's quite possible that I simply never took the time to learn git properly, but with Mercurial I didn't have to.

jibsen · on April 6, 2015

I moved from the other way from Mercurial to Git a while back.

I started using Mercurial because the basic interface and operations appeared more natural to me. Also, at least at the time, the Windows support was better in Mercurial.

After working with some repos using Git, I fell in love with the staging area, and the ability to selectively stage parts of files. I am sure Mercurial has something similar, but it was built into Git.

The only thing that bugs me is that the Windows port appears to be stuck at 1.9.5 (yes, I know it's open-source so I should go fix it instead of complaining).

jordigh · on April 6, 2015

> The only thing that bugs me is that the Windows port appears to be stuck at 1.9.5

What? What gave you that impression? That's a really old version. Here's the last 3.3 Windows build:

http://bitbucket.org/tortoisehg/files/downloads/tortoisehg-3...

This was from the downloads page:

http://mercurial.selenic.com/downloads

Edit: Oh, crap, I completely misread that. You were talking about the git Windows version. Oh well. Mercurial Windows versions are staying quite up-to-date, thanks to the tireless work of Steve Borho.

ngoldbaum · on April 6, 2015

I think they're talking about the git port for windows.

virtualmic · on April 7, 2015

Building Git on Windows is fairly easy. I am running version 2.1.0. Some instructions here: http://ufasoli.blogspot.in/2014/10/building-and-installing-g...

skeoh · on April 6, 2015

Yes! Partial staging of files is amazing and I miss it sorely when I use hg. There must be a way to do it.

jordigh · on April 7, 2015

Use crecord. We're working on moving it into core:

https://bitbucket.org/edgimar/crecord/overview

If you want a "staging area" just use a temporary commit and keep ammending it with `hg crecord --amend`. There really is no difference between a commit and a staging area except the name. If you're afraid of pushing your WIP commit, use `hg crecord --secret` or `hg commit --secret` so that your commit will be in the secret phase and won't be pushed until you declare it draft with `hg phase --draft`.

Blackthorn · on April 7, 2015

That's a neat trick. I think it's doing it a disservice to say there's no difference except the name, though. While true, I think anyone who prefers hg to git (self included) shouldn't undersell the importance of interface.

pithon · on April 7, 2015

I haven't used hg but as a Git user who uses this feature several times per day, this would be a complete deal breaker!

cjubb39 · on April 7, 2015

Last summer, a project I was working on used hg. Missing partial staging is the only reason I haven't completely switched over from git.

vsync · on April 7, 2015

I use patch queues.

imakesnowflakes · on April 6, 2015

Mercurial is not only simpler, but also has got really powerful features like revsets.

http://www.selenic.com/hg/help/revsets

I have tried to switch to git multiple times. Every time, I keep coming back to Mercurial. The most difficulty I have with git is the non nonsensical naming of concepts. For eg, A branch is a pointer to a commit. Because of this, I experience a big mental block when I try to reason about something. With Mercurial this is very much easier. And if you are using a DVCS to any capacity you ll have to do this often.

Another great thing about Mercurial is how easy to get help about stuff. You can head to the IRC chat room and can have a very good chance of catching one of the developers who, in my experience, were very helpful...

stormbrew · on April 7, 2015

> The most difficulty I have with git is the non nonsensical naming of concepts. For eg, A branch is a pointer to a commit.

This is a frustrating thing about how people tend to talk negatively about git. I think what you mean is "not named in a way I'm used to," because there's nothing nonsensical about this naming at all, and it's actually an incredibly simple and lightweight way to reason about naming things in version control (imo, I suppose). And it permeates basically every level of how git organizes information in a pretty darn uniform way.

To me mercurial's two kinds of built-in branches and several plugins to do branch-like things is much more complex, but I still wouldn't call it nonsensical. That'd just be admitting that I stopped thinking about it when it was strange and unfamiliar to me.

imakesnowflakes · on April 7, 2015

>I think what you mean is "not named in a way I'm used to,"..

Of course, that is the whole point of naming something sensibility..Just, for example, imagine how hard it would be if you have to learn/work in a version of C that calls pointers as 'branches'.

>To me Mercurial's two kinds of built-in branches and several plugins to do branch-like things is much more complex, but I still wouldn't call it nonsensical.

I don't know what you mean by 'several plugins to do branch like things'. Isn't it more frustrating that people complain about a tool because they have to enable some advanced stuff by configuration. But I agree that Mercurial is actually complex than Git, because it provides more options to the user. So the complexity of Mercurial is a side effect of it being more powerful IMHO.

So the point is I say, naming a 'pointer to a commit' as branch, is nonsensical because it goes against our notion of the word 'Branch' from real life. It adds unnecessary burden for a human being trying to think in the language of the tool, without actually making the tool more powerful. Git could have been as powerful as it is now even if it had named things better, Right?

stormbrew · on April 7, 2015

> Of course, that is the whole point of naming something sensibility..Just, for example, imagine how hard it would be if you have to learn/work in a version of C that calls pointers as 'branches'.

Yes, if you rename <arbitrary thing> to <other arbitrary thing>, it is very likely to result in nonsense. This is not such a case.

A branch in git is literally a name for a branch of the DAG that is the commit tree. It's possibly the least abstract interpretation of the concept as possible. There is nothing nonsensical about it.

But the objection seems to be that you don't work with it like in svn or hg or p4 or cvs (which are also all different from each other), which does not make it nonsensical, merely unfamiliar. This is the distinction I'm driving at.

imakesnowflakes · on April 7, 2015

>But the objection seems to be that you don't work with it..

The objection is that, when naming is skewed in lower levels it makes it harder to reason about higher level concepts and creates ambiguity..

For example, let us continue with the idea of a 'branch'...

Suppose you define a branch as 'a set of commits'. Then it is easy to imagine a 'remove' operation on a 'branch' will remove all the commits in that 'branch'. There is no ambiguity.

But when you define a branch as 'a pointer to the last commit in a consecutive set of commits', a 'remove' operation on a branch is no longer clear what it is supposed to do.

Does it simply deletes the pointer, in which case the commits will remain untouched? But it would not be consistent with the abstraction of the 'branch'.

Does it remove all the child commits from the history? by which the operation would be consistent with the abstraction of a 'branch'.

Now the remove operation is ambiguous as to what it actually does.

Note that there would be no ambiguity if the user does not know that a branch is actually a pointer to a commit. Because git hides all the change-sets that are not the decedents of a branch. So a users concept of a 'branch removal' is maintained. But I think it is pretty accepted that using git requires that you know stuff like these...

So another way of looking at the problem is that, Git forces the user to work at multiple levels of abstraction simultaneously. And this creates ambiguity because the user wouldn't know which level of abstraction to use when reasoning about something, and defeats the whole point of having abstractions in the first place IMHO.

bad_user · on April 7, 2015

> Then it is easy to imagine a 'remove' operation on a 'branch' will remove all the commits in that 'branch'. There is no ambiguity.

The problem is not with Git's use of pointers, but with your own thinking.

One should not be able to "remove" commits, because any operation should be undo-able. In git removing commits implies updating a pointer. And if those commits end up not being referenced by anything else, then they'll get garbage collected. What git does is very close to how persistent data-structures work. And many people complain about it just because it's unfamiliar.

And in your example, of course there's ambiguity, how can it not be? What happens with the branches that are forked from your branch? That's the definition of ambiguity right there.

> So another way of looking at the problem is that, Git forces the user to work at multiple levels of abstraction simultaneously.

In my experience, the problem with Git is that people don't bother to read documentation for a tool that they are using every day.

imakesnowflakes · on April 7, 2015

>What happens with the branches that are forked from your branch?

Care to elaborate? Do you mean when removing a branch, what happens to forked/child branches? There is no ambiguity, a change set cannot exist without it's parent or cannot be moved to a different parent without changing its identity. (The revision hash is a function of its ancestors too). So if you remove a branch, the forks/child branches will be removed as well.

Anyway, that was just a made up example to show how naming can affect reasoning. I don't think Mercurial or Git allows you to delete branches directly....

Manishearth · on April 7, 2015

Since Git is graph based, there's no guarantee that removing the branch commits will work since there might be other branches using it.

Of course, if you delete a branch and then run git prune, its commits should disappear as long as they weren't part of another branch.

mraison · on April 7, 2015

I think we all understand why git branches are called branches, the problem is that it's a misleading name.

Git branches are pointers to commits. For example I can move the pointer to a commit to point to the previous commit instead. But the commit which was previously being pointed to is still there. However when I say that I reset a branch to a previous commit, I get the feeling that the branch was "cut", and so that the commit was lost - which is not the case.

This is why using the name "bookmark" is better in my opinion. Not because I'm used to it (I've used git much more than mercurial), but because it's a better representation of what is actually happening.

_ikke_ · on April 7, 2015

For all intents and purposes, as long as no other branch is pointing to that commit, that commit is 'lost'. It's not immediately removed from your hard drive, but it's no longer considered part of your history.

Note that the technical name for branches in git is refs or references, which more accurately describe their nature.

Manishearth · on April 7, 2015

> Of course, that is the whole point of naming something sensibility..Just, for example, imagine how hard it would be if you have to learn/work in a version of C that calls pointers as 'branches'.

Git was not named with Mercurial in mind. That's what they're saying, Git's naming only feels strange because you're coming from another VCS. I found Git's naming strange wen I came to it from SVN. When I tried Mercurial after that, I found it's naming strange. Everyone names things differently. It's not a Git issue.

sampo · on April 7, 2015

> I don't know what you mean by 'several plugins to do branch like things'.

Maybe his knowledge of Mercurial is 5+ years old. In versions before 1.8 (2010 and before), bookmarks was an extension ("module") and you had to add a line in .hgrc to take it into use.

stormbrew · on April 7, 2015

No, bookmarks were one of the two built-in versions I was referring to. I'm referring to things like this: http://mercurial.selenic.com/wiki/MqExtension

imakesnowflakes · on April 7, 2015

Just curious. What branch like things do you mean, when you say that you need Mq to do it? If you mean things like rebase, cherry pick, folding commits etc, those can be done in better ways by bundled in extensions like rebase, graft and historyedit.

Even some of the Mercurial developers does not like Mq extension much...

LocalPCGuy · on April 7, 2015

Like stormbrew said, it's all in what you are used to. We currently have both a mercurial and a number of git repos for various projects, and we are seriously considering spending the time to migrate the mercurial repo to git. The context switching from one to the other is really hard, commands don't work the way you expect and things just feel wrong.

Personally, I much prefer git, and I researched both before I used either. But I wouldn't fault someone for preferring mercurial, particularly if part of the reason was that they were very comfortable using it.

jordigh · on April 6, 2015

> Mercurial is not only simpler, but also has got really powerful features like revsets.

Revsets together with templates can be used to build custom commands:

http://jordi.inversethought.com/blog/customising-mercurial-l...

There are also filesets that work similarly for doing queries on files (hg help filesets).

saraid216 · on April 7, 2015

> For eg

FYI, "e.g." stands for "exempli gratia" which is Latin for "For the sake of example".

So "for eg" is actually "for for the sake of example".

[1] http://en.wiktionary.org/wiki/e.g.

imakesnowflakes · on April 7, 2015

Oh, I see. Thank you! :)

Someone1234 · on April 6, 2015

I won't comment on Mercurial because I don't know anything about it.

I really do feel like Git is appropriate for small projects. The .git store is incredibly small (because it only stores changes) and quite simple when you get used to it. You will only only need to use a tiny subset of Git's functionality (or, heck, just the Github web-UI if you wish).

I think the hardest thing about Git for big and small teams is getting used to the mental model of Git (branching, merging, master, etc). Once you get it down you realise that Git is less complex, not more...

PS - This comment is mostly in reply to the implication that Git is a better tool for large and complex projects like the Linux Kernel.

PPS - Mercurial might be a better tool yet still for small projects, I don't know enough about Mercurial to have a qualified opinion either way.

acveilleux · on April 6, 2015

For all intent and purpose, git and hg are fairly equivalent. In day to day usage, the two big differences are:

1. Local branches

Git has them, mercurial doesn't really. Mercurial wants you to use a lightweight clone (copy-on-write with hard links) instead. Lots of mercurial extensions deal with this however.

The lack of rebase in stock hg is probably directly driven by the lack of local branches that don't get pushed upstream by default.

2. Committing code

Mercurial, without extensions, doesn't have the git shelf. With extensions there's a few competing options like MQ, Shelve or Evolve.

Both these points make hg easier to learn (branch == clone is very easy to wrap your head around, commit behaves basically like svn/cvs).

imakesnowflakes · on April 6, 2015

>Mercurial wants you to use a lightweight clone....

You can make a named branch and make it private so that it wont be pushed automatically or you can create a bookmark and mark the changeset as private, so that it wont get pushed automatically OR you can create a local clone, completely isolated from the origin repo.

The difference between git branches and Mercurial branches is a bit of a conceptual one. Git has 'user' branches, where a branch corresponds to the work of a user. Mercurial has 'feature' branches, where each named branch (is supposed to) correspond to a feature. I think This is the one of the reasons some Git users are sometimes so turned off when they see Mercurial branching model. And to me, I would like a project as a sum of its feature branches. But I can imagine why this might not be optimal for a something like Linux kernel. But for most projects, I think feature branches are more valuable than the user branchs of git.

>Mercurial, without extensions, doesn't have the git shelf.

Shelve extension is bundled, enabling is just a matter of turning on a flag in the .hgrc configuration file...

stormbrew · on April 6, 2015

> Git has 'user' branches, where a branch corresponds to the work of a user.

Er. Not really, no. Git is simply agnostic about how you use branches, and allows you to separate how published branches and your local branches are used implicitly. I've rarely worked on a multi-user git project where published branches corresponded to a user's work. Almost always they are feature branches worked on alone or with someone else.

Even in linux development, where the repositories are directly tied to users usually, the actual branches generally represent a work-unit of some sort, so I'm not sure where you got this idea.

imakesnowflakes · on April 7, 2015

This is where I got the idea,

http://fossil-scm.org/xfer/doc/trunk/www/fossil-v-git.wiki

Please see the Table on the top and the 'Branches' section.

stormbrew · on April 7, 2015

Ok. I think that is a relatively accurate description of some normal git processes, but I don't think that's saying what you're getting from it.

You may be taking too strong a meaning of the 'ownership' concepts described there (and I think they're stated too strongly). No one owns a branch name in git, not even the person who first makes it. You own the branches in your own local repository, and you can choose how and if you share them, but it is perfectly possible to do shared work on a branch if you have a shared repo to work from (and even, with some extra work, without one).

This is basically a Zooko's triangle[1] problem, where git has chosen decentralized and meaningful, which necessarily implies a rejection of ownership (even to the extent that Fossil wiki page implies).

But in the end, most branches in git are focused on a piece of work and not who made them (it's mostly the 'lieutenants' in the linux model who would have personal branches). Single-author branches may be more common in git because it definitely does make those easier than other possibilities, but there is nothing stopping collaboration on a shared branch in git.

[1] http://en.wikipedia.org/wiki/Zooko%27s_triangle

acveilleux · on April 6, 2015

It's probably pervasive because of github and pull requests where, at least for open-source projects and external contributors, it's true in practice though not enforced.

stormbrew · on April 6, 2015

I'm confused as to what "it" means here?

If you mean the idea in the GGP's post that branches represent a user's work, again I think that conflates repositories with branches. It's not exactly impossible to find people doing work on master of their fork against an OSS project, but best practices (and common practice among most regular contributors I've seen) seem to angle towards using feature branches even then (and asking with a PR to merge eg. stormbrew/blah:fix-the-thing to upstream/blah:master).

johnmaguire · on April 7, 2015

Even in that case, you're talking about a clone of the repository, not a "user's branch." That would be the "user's repository", which has branches corresponding to features, and bugfixes (unless they are a user on Github who isn't too familiar with Git, in which case they'll just commit to their "master" branch probably). If each user has their own fork, as is typical on Github, then it makes even less sense to have a "user" branch.

ngoldbaum · on April 6, 2015

For several years now Mercurial has had bookmarks in core, which are analogous to git's local branches.

It's also worth pointing out that many of the extensions you referred to (rebase, shelve) are bundled with core Mercurial and need only a one-line addition to a configuration file to turn them on. Extensions bundled with core Mercurial have the same level support and backward compatibility guarantees as the rest of Mercurial. Part of Mercurial's command line philosophy is to not include obvious foot-guns (like the ability to rewrite history) in the default version.

tacticus · on April 7, 2015

And for several years bookmarks have been unfortunately missing in unfortunately common linux distributions.

ngoldbaum · on April 7, 2015

It's quite easy to build mercurial from source. That said I understand the pain of being on locked down ancient linux installs with no access to compilers.

rbehrends · on April 7, 2015

> 1. Local branches

Local branches are really an artifact of Git having a garbage collector. That means that any commit that you want to keep must be attached to a branch so that it may not be inadvertently lost. Without a garbage collector, that is not necessary.

In particular, in many other VCSs, you just simply commit without having to create a branch in the first place. Labeling them (via whichever mechanisms your VCS provides) is typically optional with modern version control systems.

You can argue that you prefer having a garbage collector over having to explicitly delete commits/branches that you do not want any longer, but that would be a different issue.

> 2. Mercurial, without extensions, doesn't have the git shelf.

This is sort of incorrect -- or rather, probably a misunderstanding. Shelve is part of standard Mercurial and has been for a while (same goes for MQ). While it is disabled by default and enabled in the extensions section, it's no more an extension in any meaningful sense of the word than git stash (which technically calls out to a shell script [1]). Mercurial's shelf is a python module [2] that is loaded when you flip a switch.

> Both these points make hg easier to learn (branch == clone is very easy to wrap your head around, commit behaves basically like svn/cvs).

This is really more a feature of Bazaar, not Mercurial.

[1] https://github.com/git/git/blob/master/git-stash.sh

[2] http://selenic.com/repo/hg-stable/file/default/hgext/shelve....

saraid216 · on April 7, 2015

> In particular, in many other VCSs, you just simply commit without having to create a branch in the first place. Labeling them (via whichever mechanisms your VCS provides) is typically optional with modern version control systems.

Since, for me, local branches are a massive part of why I prefer git, I'd be interested in examples of how to do the same thing in other VCSs.

It doesn't make sense to me that they're a byproduct of having a garbage collector, since (1) git-stash exists, (2) local branches are a common selling point, and (3) git-clean honestly seems tacked on rather than a core piece of the design.

Blackthorn · on April 7, 2015

For Mercurial, you create a new local branch by committing at a point that's not a tip (or not HEAD, if there's just a single branch).

By default, it doesn't have a special name (like a git branch does). You can give it a name via "hg bookmark".

The feature set is honestly fairly similar, it's just different in the way it is presented. In git you go into a "detached HEAD" and have to explicitly name a branch or commits will get lost. In hg you commit and it creates a new tip, but it doesn't have a name unless you explicitly name it via bookmark.

josteink · on April 7, 2015

> In git you go into a "detached HEAD" and have to explicitly name a branch or commits will get lost.

Maybe I'm missing something, but this seems like a fairly academical scenario.

When would you checkout a specific commit, keep working, keep committing, but not attach it to a name or a branch? How is "67846237864237846" a nicer reference than "bugfix" or whatever?

To me this seems like people using git in a way it's specifically designed not to support and then go complaining when things go wrong.

imakesnowflakes · on April 7, 2015

>When would you checkout a specific commit, keep working, keep committing, but not attach it to a name or a branch? How is "67846237864237846" a nicer reference than "bugfix" or whatever?

You enter a commit message when you make a commit, right? So I am not sure I see your point..

josteink · on April 7, 2015

I don't think you understand my dilemma. Let me try to clear it up with an example. Let's say you have the following history:

    - Commit 1 -> Commit 2 -> Commit 3 (master/default) - current branch

Now Commit 3 is the HEAD of your current branch (master, trunk, default, whatever). The way I understand your "problem" is if you want to create a new branch based on an older, non-HEAD commit.

For instance you checkout the current branch on based on the state of the branch as per "Commit 2", and make a new Commit "Commit 4". This creates a new implicit branch:

    - Commit 1 -> Commit 2 +-> Commit 3 (master/default)
                           +-> Commit 4 (implicit branch) - new current branch

Now you have two branches. How do you switch between them? How do you operate on those branches? Do you cycle them? Do you have to remember the last commit-message of a branch to find it again?

To be absolutely clear: What I'm curious about is how do you relate to and navigate between branches when they are all anonymous? What mechanisms do you have to aid identification?

ezquerra · on April 7, 2015

It is actually pretty simple. Mercurial automatically assigns a _local_, sequential revision number to every commit in your repository clone. In fact those revision numbers almost match the numbers in your examples (Commit 1 would have revision number 0, Commit 2 would be revision number 1, etc).

So in your example you would do:

"hg update 2" to checkout Commit 3, and "hg update 3" to checkout Commit 4

Note that these revision numbers are _local_ to your repository. That is, another clone of the same repository may have different revision numbers assigned to different revisions. The revision numbers are assigned in the order in which commits have been added to that particular clone.

You could of course also use the revision id (i.e. the SHA1 hash) to identify them. You do not need to use the whole revision id, just the part that makes it unique, usually the first few (e.g. 6-8) characters of the hash.

In addition Mercurial has a very rich mini-language to query and identify revisions in your repository. These queries are called "revsets". Most mercurial commands accept revsets wherever they would need to receive a revision identifier. With it you can identify revisions in many ways such as by date, by belonging to a given branch, commited by a certain author, containing a certain file, etc (and any combination of those and many others).

Finally, if you use a good mercurial GUI (such as TortoiseHg) the whole thing is moot because the GUI will show you the whole DAG and you can just click on the revision that you want and click the "Update" button.

I actually find the ability to create these "anonymous" branches really useful. Naming things is hard so I find that coming up with a name for each new short lived branch is annoying.

imakesnowflakes · on April 7, 2015

Anonymous heads are often short lived. You either merge them right away (Making them not a head), or bookmark it (Making them not anonymous) so that you can come back to it later.

But say, you forget to merge it or didn't bookmark it, Mercurial does not hide the revision from you. The hg log will list it along with all the other commits.

Even easier is to use the hg heads command, which will show all branch heads for you (Branch heads are change sets that have no descendants on the same branch), along with the commit summaries. You can use the commit summary to select the revision and use a revision id (You don't have to use the full length hash, instead you can use a 4 or 5 character prefix of the revision id) to switch to that commit.

josteink · on April 7, 2015

I think it's funny how you say hg is much easier and git has a much more complex model, and I'm still struggling to understand all this.

To me Git is clear and simple. It's a linked commit-graph, and branches and tags are just references to leaf nodes on the graph. That's all. That's the entire model.

With hg you have local commit IDs, you have revsets, you have bookmarks, and then since you aren't forced to name a graph/branch you have to remember commit-IDs or who did what when to what file. Some commits show up on the commit-log, but you can't be sure if it's in the current branch. I'm totally confused as to what to use where and how I can be certain about what I'm working with.

I'm not saying hg is bad per se, but looking at it from the outside it looks helluva more complex than Git. I think this all boils down to what you're used to.

From the discussion in this thread here, I'm pretty certain I wont be trying mercurial anytime soon. It just seems too complex and confusing for me ;)

imakesnowflakes · on April 7, 2015

>To me Git is clear and simple. It's a linked commit-graph..

Same in Mercurial. But it lets you store a bit more meta data (Bookmarks, branch names, local revision numbers etc) which enables more powerful queries based on them. I am not sure that is a bad thing. But I can certainly understand why it appears complex to when presented with all these information at once.

But trust me, when you are trying to make sense of what went wrong with a merge done by an in experienced co-worker, you want all the metadata you can get..

>and then since you aren't forced to name a graph/branch you have to remember commit-IDs or who did what when to what file. Some commits show up on the commit-log, but you can't be sure if it's in the current branch.

Now you are talking FUD, no offence.

>you have to remember commit-IDs...

??, I just told you how to manage such cases. Which part you didn't get? In Git you would have to dig up reflogs for the commit hash for a' DETACHED HEAD' that you just made, Right?

>Some commits show up on the commit-log, but you can't be sure if it's in the current branch.

Not sure what you mean by there or where you got that idea. Every commit you make will be displayed by the hg log, regardless of the branch it is on. If you want to filter log by a branch, there is a -b option accepts branch name as an argument, that you can use to limit the display to changesets in that branch only.

>From the discussion in this thread here, I'm pretty certain I wont be trying mercurial anytime soon. It just seems too complex and confusing for me ;)

If Mercurial was actually 'complex and confusing' it would have been dead a long time ago (Thanks to git). The simple fact that it survives (and even grows in popularity) despite git, IMHO is a sure sign of how simple/straightforward it actually is for common things. So do yourself a favor and try it sometimes. You just might like it.

Blackthorn · on April 7, 2015

> Maybe I'm missing something, but this seems like a fairly academical scenario.

No, you aren't missing anything. I merely used the example to illustrate the difference between the two VCS's. Like I said: they actually have a fairly equivalent set of features. The prime user-visible difference is in interface and implementation.

baq · on April 7, 2015

this happens quite often in e.g. gerrit workflows, where you usually have to amend a change or if you want to take a look at somebody else's tree.

rbehrends · on April 7, 2015

> Since, for me, local branches are a massive part of why I prefer git, I'd be interested in examples of how to do the same thing in other VCSs.

You simply commit a new changeset based on the desired parent. The branch would be created implicitly by having a revision with two or more children. It does not have to carry a name; naming revisions (or sets of revisions) is an orthogonal concept for the purpose of navigating the revision DAG [1], whereas in Git branches have the additional purpose of keeping revisions from being GCed. Any revision that is not referenced by a ref (directly or indirectly) will eventually be GCed (once it is removed from the reflog and the grace period has expired).

> It doesn't make sense to me that they're a byproduct of having a garbage collector

It is a byproduct. Because if you did the above in Git (which Git won't let you without being explicit about it, but it's possible), then one or the other child would get GCed sooner or later. That's because there's only one HEAD commit, and that can prevent only one child from being GCed, so you have to create a ref to the other (generally via a branch) to keep it alive.

> since (1) git-stash exists [...] and (3) git-clean honestly seems tacked on rather than a core piece of the design.

This has nothing to do with git stash or git clean. I'm talking about one of Git's architectural principles, where the repository is essentially a purely [2] functional data structure and revisions that are not reachable from one of the roots (refs/branches) are subject to garbage collection.

[1] And if you look (aside from Mercurial) at Bazaar, Fossil, Monotone, or Veracity, you'll find that there are multiple ways to go about that.

[2] Well, not entirely pure, but close enough.

josteink · on April 7, 2015

> You simply commit a new changeset based on the desired parent. The branch would be created implicitly by having a revision with two or more children.

And then, how do you later track these commits? How do you change or merge branches if they have no names? Do you go around remembering revision-IDs?

I'm not trying to be judgemental and say this is wrong or anything, but I'm curious how this works out in practice.

rbehrends · on April 7, 2015

First, let me note that neither Git nor Mercurial is my preferred VCS, so you'd ideally want to ask somebody else about common practices (my own preferences are naturally influenced by other VCSs).

My own preferred approach is to use hg share, which allows me to have multiple checkouts of the same repository, each with a different current revision (HEAD in Git parlance). This is because when I'm doing some local branching, I generally like to have two different sets of files to work with (so as not having to rebuild when switching back and forth, being able to easily run side-by-side comparisons, etc.). This is sort of like Git's git-new-workdir, except that git-new-workdir isn't safe; it is also the default mode of operation of pretty much any DVCS other than Git and Mercurial. This, I note, is not necessarily what the majority of Mercurial users do -- especially since this feature first has to be switched on --, but is one of the major reasons why, if I have to pick either Git or Mercurial, I'll generally go with Mercurial. Multiple checkouts are simply too important a feature for me to want to deal with Git's workarounds; Mercurial's support of them is imperfect, but still better than what Git has.

More commonly, in Mercurial you will simply not need to name temporary branches. You will have named branches for your major features etc., and you can use hg heads -r <branch> to list the 1-3 heads you may at any time have per major branch. This makes naming the heads sort of superfluous for typical development. Also, if you have only two heads in a branch, hg merge and hg rebase will automatically pick the other head to merge/rebase against.

Alternatively, you can use bookmarks. Bookmarks are essentially like Git branches in that they are pointers to commits.

Finally, there's also nothing inherently wrong with simply using named branches, even for temporary work. It's up to you or your organization whether you want persistent or temporary names for temporary work.

imakesnowflakes · on April 7, 2015

>whereas in Git branches have the additional purpose of keeping revisions from being GCed..

Not only that, but it is also used to decide what all commits to show to the user. In that way branch names act as little windows through which you can look at the DAG of history. Commits that you cannot see through this window does not really belong in the history according to git, and will be GC'ed.

So if you make a commit on top of a revision that is not a branch head, then it becomes a DETACHED HEAD. Actually there is nothing 'detached' about it. It is linked to the parent commit just fine. But git just does not consider it part of the history unless you give it a name. Here you can see the problem with git naming. 'DETACHED HEAD'. If you understand a head revision as a last revision in a sequence of connected commits, how can there be a 'DETACHED HEAD' when a 'HEAD', by definition is part of a sequence of commits..

I wonder why someone would call it a 'DETACHED HEAD' instead of something like 'UNNAMED HEAD' or 'ANONYMOUS HEAD'....

_ikke_ · on April 7, 2015

Because HEAD is in this case not 'attached' to a branch, which means there is no branch which can be advanced to keep track of the new commit.

Aga · on April 6, 2015

For me Git has also worked very well for small projects. I actually learned it initially by using it for my masters thesis.

One correction: Git does not store changes at all. It stores all the versions of files as separate "blobs". What keeps .git small, is that Git compresses the object database to pack-files. "git gc" does this for you.

Pro Git has a great chapter about how Gits internals work: https://git-scm.herokuapp.com/book/en/v2/Git-Internals-Plumb...

This picture sums the model nicely: https://git-scm.herokuapp.com/book/en/v2/book/10-git-interna...

For me studying Git internals has been a good lesson in software design. Git's core is very simple, yet somehow that makes it a really versatile tool.

LukeShu · on April 6, 2015

    > The .git store is incredibly small (because it only
    > stores changes)

git stores a complete copy of each revision, not changesets. (Well, it is my understanding that a .pack compressed archive of objects may store some of the objects as changesets, but that's not the general case)

Someone1234 · on April 6, 2015

It stores a complete copy of the file, not a complete copy of the entire repository. So it keeps your .git fairly light weight since only changed files have duplicate copies.

Essentially it is not line level, but file level, rather than complete repository level (e.g. like Team Foundation is).

josteink · on April 7, 2015

> So it keeps your .git fairly light weight (... compared to ...) Team Foundation

And you can tell from how the product is being used. Where I work, we use TFS and you literally need to have committee meetings before a new branch is made.

You need to agree on long term strategies, you need to agree on branch hierarchy (because TFS cannot merge without one), you need to replicate all-the-things(tm) for your new branch, you need to adjust your versioning-strategy for this branch, and possibly future branches which may come and may not ever collide with this one, because TFS branches are massive and carved in stone. They are forever. Etc etc.

You never branch more than once a year. In the world of TFS, doing such would be madness, haphazard. Or at least you would need a full team doing nothing but branching and merging and integration. And good luck trying to find qualified people willing to have that as their day-job.

With git you branch every single time it makes sense, usually several times a day. And your life is nicer because of it.

It's funny how such technical decisions results in such huge and enormous visible changes to how a product is being used.

Somewhere deep in Microsoft there are probably one or more engineers crying right now seeing the effects of their (back then) seemingly innocent decision, as they themselves are forced to deal with TFS in their daily work and they see how nice the gitters have it.

stormbrew · on April 8, 2015

Internally Microsoft largely uses a fork of Perforce called SourceDepot. Branching in P4 is about as easy and lightweight as any centralized version control system has ever managed, though still quite a bit heavier than git obviously.

cben · on April 12, 2015

When I worked in Google a couple years ago, they were using Perforce and branching was extremely hard (it was P4 with extensive internal tooling around/underneath it, no idea how representative it was of normal P4).

It took me about a year to realize how bad it really is: eventually I found my way to a "how to create a dev branch" document starting with:

    "Last year Dev Wossname wrote a useful mail showing how to create a dev branch and got most of the details right :-)  This howto explains it in more detail and fixes a few points."
    Then followed a quite few pages with a dozen manual actions doing somewhat intimate things to levels of P4 I had not previously heard of...
    Oh, and it created the branch at the *company-wide root level*, e.g. `gmail/` -> `gmail_dev_branch_2012/`.  No pressure !-)

It then took a few more days until the horror of that opening sentence sunk in. Native large-scale branching was so rare that it was almost a lost art!

[There were also several "lightweight" branch-like techniques based on symlink magic, FUSE filesystems and more magic. Those were routinely used for freezing a release build, with option to hot-patch it — but IIRC it was impossible to merge back onto the main tree.]

In practice things weren't that bad because (A) Google got really good flow (especially continuous testing) for everybody working close to HEAD, (B) personal branches were easy and common:

- Fresh checkout directories were cheap [should have been frightfully expensive but clever scripts and a FUSE filesystem fixed that]. - Every CL (commit progressing through code review iterations) was effectively a feature branch. - People who wanted more used a somewhat crazy, but very workable, internal tool mirroring a growable {time,space} subset of P4 into local a git repo and back onto P4.

stormbrew · on April 14, 2015

My P4 is very rusty, but back when I used a vanilla install a branch was basically just a copy operation from one part of the namespace to another, and then setting up your view to point to that new spot (or setting up a separate view for it if you wanted). This was before any integration with git (or, indeed, the existence of git at all), as well.

Subversion mostly copied the way p4 did it, but without the view control stuff (where there's a server-registered and quasi-versioned description of what local file paths map to what server paths) that perforce has/had.

cben · on April 15, 2015

Hmm, that doens't sound bad. I don't remember (nor fully understood) why it was that complicated at google. At a wild guess it's ultimately connected to google's choice to keep all company code in a single humongous repo, which required a lot of later creativity to keep P4 from collapsing under the weight...

stormbrew · on April 16, 2015

I would think that the amount of data wouldn't be the thing that would make p4 fall over (it's pretty decent at that), but the number of users. P4, iirc, stores all client state server side, even to what files you're editing. So if you had to work around anything, it was probably that, and I can definitely see how that would break the ease of the workflow.

stormbrew · on April 6, 2015

When git repacks[1], which it does periodically (but not on every commit), it will store objects in as compact a form that it can, which basically means that it looks for root objects that it can delta a given object against so it stores each chunk of text as few times as possible.

What it doesn't do is store files as deltas explicitly against the previous or next revision of that file. It attempts to find a best fit dynamically.

It also does something like this packing procedure when responding to a fetch to avoid sending more data than it needs to, btw.

[1] https://www.kernel.org/pub/software/scm/git/docs/git-repack....

jebblue · on April 7, 2015

Git rocks for small to medium sized projects, never tried it on a large one but Linus made it for the kernel so that would not worry me either.

Going into a directory and easily creating a repo there, where I need it, is great.

Canonical did something similar with their Bazaar tool.

rqebmm · on April 6, 2015

Where I land is that mercurial is a better tool, but git is a better platform. As a result, better tools have been built on top of git than mercurial (like github). As a result, git at some point became the standard.

jebblue · on April 7, 2015

I'd use Mercurial but its name has too many syllables.

henrik_w · on April 6, 2015

"The trick wasn't really so much the coding but coming up with how it organizes the data."

I think this really is the key to understanding git as well. When you understand the git data structures, git makes sense. Otherwise, it can be quite difficult to grasp (http://ftp.newartisans.com/pub/git.from.bottom.up.pdf was really useful for me).

krylon · on April 6, 2015

Not only git. There is a nice quote from Fred Brooks Jr. (I think), that if you show me your flow charts and algorithms, I remain as clueless as before, but show me your data structures, and everything else will follow. (I am paraphrasing this from memory, Brooks put it more eloquently.)

henrik_w · on April 6, 2015

Yes! I love that quote - so true. Linus has the same opinion:

"Bad programmers worry about the code. Good programmers worry about data structures and their relationships." [1]

I actually refered to this when discussing switching from Java to Python. One of the biggest problems for me was not easily being able to see the types of the arguments to a function in Python. [2]

[1] http://programmers.stackexchange.com/questions/163185/torval...

[2] http://henrikwarne.com/2014/06/22/switching-from-java-to-pyt...

AceJohnny2 · on April 6, 2015

> not easily being able to see the types of the arguments to a function in Python. [2]

That's because of Duck Typing [1]. I really like Python for how quick and easy it is to make small pieces of software, but I fully agree that Duck Typing can become more of a hindrance on larger projects.

[1] http://en.wikipedia.org/wiki/Duck_typing

MichaelGG · on April 6, 2015

Nothing about duck typing requires a python-like type system. F# has duck typing via inlining (hacky sorta), C# has it too, but only for compiler use, not end user code (because why build generic features, when you can hack one-offs into the compiler).

It'd be great to have more static duck typing. I suppose that's the point of traits and mixins, to some extent. But it'd be nice to have on-the-fly constraints created and enforced.

CountSessine · on April 6, 2015

I thought that F# used type-inferencing? If not, then its not nearly as similar to Haskell and OCaml as I've been led to believe.

Unless you're referring to type-inferencing when you use the term static duck-typing. The term duck-typing was never meant to refer to type-inferencing - duck-typing is type-opacity right into the runtime. The whole point of duck-typing is that objects can be referred to generically even in cases where their underlying type would be undecidable.

Type-inferencing gives you some of the source-level flexibility of not having to refer directly to an object's type much like duck-typing, but without many of the crazy runtime disasters that you can get with duck-typing.

Nevor · on April 7, 2015

Parent comment is referencing the "let inline" construction that in addition to inline code allows to constrain generic types structurally over member definition[1] instead of by name as usual. This gives us poor's man type classes which is useful nonetheless.

I guess the conclusion is that the definition of duck typing is fuzzy and misleading.

[1] https://msdn.microsoft.com/en-us/library/dd548046.aspx

rjbwork · on April 6, 2015

C# Actually does have duck typing via the DLR (Dynamic Language Runtime) now. You can totally define a method as

    public dynamic DoDynamicStuff(dynamic wooDynamix)
    {
        Console.WriteLine(wooDynamix.DynamicProperty1);
    }

MichaelGG · on April 7, 2015

Right but it doesn't have the static kind the compiler uses for foreach loops or dictionary initializers.

squeaky-clean · on April 6, 2015

Off topic from the article, but this is something I literally just spent my morning wrestling with and fixing, so I'd like to talk/rant about it a little. My apologies in advance for rambling.

You can use doc comments to signify types. You mention you use PyCharm in your article, and it supports doc comments in it's autocomplete and code analysis, I find most popular libraries are commented well enough that PyCharm can understand them.

I also always try (and encourage others) to use names that are declarative for both purpose and type. Of course, you can't always rely on third party libraries (or even colleagues) to be so nice, but I find a good name (almost) always removes all the confusion that normally comes from a lack of static typing.

For example, one of Codecademy's very first Python lessons includes some code like this:

    meal = 44.50
    tax = 0.0675
    tip = 0.15

Which I think is unclear. Imagine these were function arguments. calc_total(meal, tax, tip) is vague. Is meal an object, or a numeric value? (I could possibly see it containing a list of all items in the meal with prices). Tax is almost always a percentage, but what about tip? Judging by the the above values, we can assume a 15% tip, but we don't know if the customer was stingy and tipped $0.15, and by just name alone, we can't tell at all.

    calc_total(meal_cost, tax_percent, tip_percent)

It's now immediately clear to me the type and range of values it accepts. The tip_percent is also an example of when a good name can provide info that even static typing could not, because in either case it is a floating point (please let's not get into a debate about Decimal or currency types :P ). This is a very basic example, but it applies at all levels. Don't call the parameter "users" if the function is not expecting an iterable of User objects. Maybe "usernames" would be better. Etc.

But of course, this only helps if the code you're working with is named well. In something like Java, you have better protection and tools when working alongside lower quality code. I also completely agree with you on point #2 about No Static Types. Navigating through my editor is so much easier in a static language than it is in PyCharm with large projects. And the most annoying thing is that autocomplete breaks with ORMs, and most ORM usage is actually flagged as a warning or error. Ugh.

I also feel very confident in the automated refactoring in something such as an IntelliJ Java project, or ReSharper, but am apprehensive about using PyCharm to refactor anything with usages spanning more than one file. Same goes for Javascript (or any other dynamic language, I suppose. Those are just the two I use).

Enjoyed the posts by the way, adding your blog to my reading list.

S4M · on April 6, 2015

The example you mentioned, translated in Java, would be:

    Double mean = 55.50;
    Double tax = 0.0675;
    Double tip = 0.15;

And the function declaration would be:

    Double calc_total(Double meal, Double tax, Double tip)

So even with the types, you have exactly the same problem as in python, and creating a special class to wrap up percentages is something I find quite heavy.

Your alternatives would be to have naming conventions, like you do, or some proper documentation that tells you what the function expects for its arguments, with an example of a function call, which I particularly like in python because of the REPL.

vilhelm_s · on April 6, 2015

But perhaps a better type system can still be helpful. E.g. F# supports units of measure, so you can declare 'dollar' as a unit, and write

    let calc_total (meal_cost : float<dollar>) (tax : float) (tip : float) =
       meal_cost * (1.0+tax) * (1.0+tip)

This way it is at least clear from the types that the numbers are meant to be multiplied, not added.

S4M · on April 6, 2015

I never used F#. In your example, would the code:

    meal_cost + tax

trigger a compiler exception?

vilhelm_s · on April 6, 2015

Yes, you get

    The type 'float' does not match the type 'float<dollar>'

There is an online typechecker, so one can try out F# without installing it: http://www.tryfsharp.org/Learn/scientific-computing#units-of...

S4M · on April 6, 2015

And how about:

    meal_cost * tax

?

This one makes sense in the real world, so I suppose an advanced type system allows the programmer to specify what operations are legit or not across different types.

vilhelm_s · on April 6, 2015

One could imagine systems for that, but units of measure doesn't do any customization of different operations. It's just unit checking for for arithmetic, like in high-school math.

squeaky-clean · on April 6, 2015

> So even with the types, you have exactly the same problem as in python, and creating a special class to wrap up percentages is something I find quite heavy.

Yep, I agree completely and was hoping to make that point. The typing makes it clear that meal is going to be a single price, and that tip won't be a String or something unexpected, but it is still unclear whether it is a percentage or a amount.

I guess what I mean to say is that while static typing certainly helps, it isn't the be-all and end-all of code clarity, and descriptive names can really help code readability regardless of your typing system.

Also, admittedly the example I provided is rather simple. And in any language, you should be using a proper Currency type or library when dealing with money, which would make most of my argument moot:

    Currency calc_total(Currency meal, Double tax, Double tip)

And now specifying tax and tip as percentages isn't necessary. But not all examples are so friendly!

mgkimsal · on April 7, 2015

reading "tip_percent" wouldn't tell me how to use it. I've seen people passing in 15 and 0.15 to signify 15% - it depends on how you think about it.

Float tip_percent

might be more of a tip off, but probably not even then.

mayoff · on April 6, 2015

I think the Git Parable is an excellent explanation of the underlying git model.

http://tom.preston-werner.com/2009/05/19/the-git-parable.htm...

drakenot · on April 6, 2015

That link didn't work for me (Page not found) but I found a mirror: https://github.com/umd-byob/presentations/raw/master/2014/03...

agumonkey · on April 6, 2015

This tutorial did help me getting git basic data and operations a lot. https://codewords.recurse.com/issues/two/git-from-the-inside...

kbart · on April 7, 2015

Great read! Now some git commands make much more sense.

thomasfl · on April 6, 2015

I wonder if BitKeeper owner Larry McVoy has ever regretted not open sourcing his software? Git tools is a whole industry now. The most promiment one, github, recently become one of the 100 most popular websites on the planet.

luckydude · on April 6, 2015

Regret it? Sure. I'd do it in a heartbeat if I could figure out how to make it work. Still would and there is plenty in BK that Git doesn't have. Like submodules that actually work exactly like a monolithic tree, just lets you clone what you need.

But we've never figured out how to make it work financially. If anyone has any ideas I'm all ears (though pointing at github and saying "do that" isn't an idea that I can execute).

BTW, BK used to be pretty darned close to open source, you got the source code under a funky license that said "don't take out the part that lets us make money". We stopped shipping the source when we learned that the very first thing that someone committed to the repo was taking out the part that let us make money.

qzw · on April 6, 2015

Very cool of you to share your thoughts. I sympathize with your dilemma. It seems that the people who end up making money out of free/open source software are often not the ones who write the code. And I remember reading back in the day that Linus would talk with you extensively about the nuts and bolts of DVCSs, so I'm sure all git users owe you some gratitude for inspiring him to create git and getting the fundamentals right from the get go.

Out of curiosity, and please feel free not to answer, is BK still a viable commercial product bringing in significant revenue? And what obstacle do you see with going the platform/service route like github? I assume that's something you've seriously considered, even without open sourcing BK.

luckydude · on April 7, 2015

Yeah, BK still pays the bills for our team. We're small though, I recently found out that perforce has around 250 people, we're less than 1/10th that. But we pull in millions a year, enough to pay our people above scale even in the bay area, so far, so good.

I'll admit we've fallen off the radar (well, we were never really on the commercial radar, the only "marketing" we ever did was getting Linus to use it and that wasn't intended as marketing, it was intended to keep the kernel from diverging like the BSDs did. But it turned out to be a form of marketing that has kept us alive).

We're gonna try some actual marketing. Stay tuned. We'll probably screw it up :) But we hired a marketing company, I've gone back to writing papers, we'll give it a try. If you have ideas on how we can put ourselves back out there, we'd love to hear them.

As for viable, heck yeah. We work well on big repos (better than git), we've got what we call nested collections of repositories (did I mention I suck at marketing, yeah, I came up with what to call it) that are sort of like submodules except they work exactly like a single repo, sideways pulls work, anything that works with one repo works with N repos, that includes all the guis, command line, everything. We've got an answer for binaries that works for gaming companies. We've got a sane user interface (that's what Mercurial copied, in a somewhat sketchy way).

Git is sort of like the wild west, it never met an idea it didn't want to implement (at least partially). We're more enterprise ready (yeah, over used term) in that we work hard to make sure that BK has all the guard rails, seat belts, etc, so that you can deploy to people who could care less how any SCM works and they don't drive themselves over a cliff. Definitely less cool than git in that we take away some (bad) options, but safer.

We have seriously considered open sourcing a version of BK. We've been doing a lot of performance work and we essentially have two BK's, the almost SCCS compat ascii format slow version (slow but any version of BK will talk to it), and the fast one with a new binary file format (stuff like show the top commit comments are 35x faster in the linux kernel, that number goes up as you add more csets). We considered open sourcing the slow one but that effort has stalled. It could be revived, it just has to be worth it to us.

The github ship, in my opinion, has sailed. Maybe we could have open sourced BK back before git and done a github thing but it's all flashy UI stuff and we sort of suck at that. We're really good at systems stuff (you'll see when we start doing marketing, we scale, git doesn't) but flashy? Not so much. We do our UI in tcl/tk (I know, I know, but we have one UI person who makes it all work on windows/linux/macos and tcl/tk is a big part of that. At least we wrote a C like language that compiles to tcl byte codes so we're out of tcl. Thank God.)

hyperpallium · on April 7, 2015

Wouldn't the standard open source-as-freemium work? i.e. a free open source version with enough cool features (e.g. nested respositories), but not efficient. It's free marketing to keep you on the radar, that targets the people who appreciate your systems chops (and, like Atlasian, also gets it in under the radar, to developers). Enterprise customers happily pay ridiculous sums for full versions. And git/hg makes you immune to the competitive danger of open source clones.

I'd value your thoughts on this, as I also have a popular open source competitor, that followed me. The strategy seems sensible, but it might undermine perceived value; and it's a hassle to maintain two versions...

Also, can I please ask a technical BK question: How much does git differ from BK internally? i.e. git has graphs of commits, content-addressable for efficient checks of identity and integrity. Did git get any of that from BK? Or was it more the workflow and distributed concept of everyone having a copy of the repository? Many thanks!

luckydude · on April 7, 2015

Linus definitely did his own thing with git. The general ideas came from BK, BK gave you clone/pull/push/commit as the model. Everyone copied that because it just makes sense. The all or nothing clone model came from BK.

How it is all glued together differs quite a bit. BK has the concept of a revisioned file, git does not, it versions trees. That's why Linus thinks renames are silly, he doesn't care about them, he cares about the tree.

The graphs of commits comes straight from BK, that's BK's changeset file - which is sort of neat in that it is a version controlled file itself. BK is the only system that I know of that uses a versioned file to store the metadata.

OK, so on the business model thing, I'm not sure. The way we did the old compatible format is compatible but it's pretty slow, it converts to the new format in memory and then converts back if you write it out. It's slower than the older implementation (but this way we have one in memory format, less bugs). I thought it was good enough for small projects, my team overrode me and said "too slow".

As for enterprise customers "happily paying", um, no. We constantly get wacked with "if you don't do this or that we're moving to git". Which could be viewed as a good thing, we have to keep making it better, but it gets tiresome.

hyperpallium · on April 7, 2015

Thanks! Renames make archeology difficult in git. I've become reluctant to change {file,directory} names, even when it's clearer...

BTW: What are the benefits of versioning changesets themselves? Isn't it rare to only change the changeset?

Chained conversions are elegant, but slower code is unsatisfying... I guess such hobbling is the essence of open-source-as-freemium. :(

I meant they "happily pay" for full over free versions. (For them, it's also paying for "new" features!)

luckydude · on April 8, 2015

Renames are a thing and git made the wrong choice there. It's not like we are perfect but we are way closer.

So on versioning changesets I didn't really explain. Lemme try again.

In any DVCS you have a bill of materials, that's what describes the tree. Git's is different than ours because they don't version files, we do. So our bill of materials looks like:

  path/to/file <version>
  path/to/different_file 1.1
  path/different_dir/a_file 1.19

If you "cat" the changeset file as of any version you get what the tree looks like, a list of files and a list of revisions.

Of course it doesn't work like that because, um, reality and merges and parallel development. We have UUIDs for each file and each version so it looks like

UUID_for_a_file UUID_for_a_version

and our UUIDs are pretty sweet, not sha1 or some other useless thing, they are

user@host|path/to/where/it/was|YYYYMMDDHHMMSS|checksum

those are for each node in the graph, for the very first node which is the UUID for the file, there is a "|<64 bits of /dev/random>" appended.

So the changeset file is just a list of

UUID UUID

Not sure if that helps.

The benefit of versioning the file that holds all that data is we can use BK to ask it stuff. Want to see the history of the repo? bk revtool ChangeSet Want to see what files changed in a commit? bk diffs -r$commit ChangeSet Yeah, we have to process all the UUIDs and turn them into pathnames and revisions but we can do that and do it fast. So it works.

All the tools we built to look at stuff can look at the metadata. That's worked out well.

hyperpallium · on April 9, 2015

Thanks!

anonbanker · on April 6, 2015

  > Sure. I'd do it in a heartbeat if I could figure out how to make it work.

just upload it to github. :)

luckydude · on April 7, 2015

Have an upvote for making me laugh.

It's not as far fetched as you might think though, we've been building million cset trees and git is so darn slow that we made a "bk fast-export" that spits out the stuff that git wants. Because Git was over 20x slower just running the commits.

thomasfl · on April 7, 2015

It's really great to see you answer my question yourself! It seems to be much easier to create a business by using open source software, than writing open source software. Especially for small companies like yours that create complex software like BitKeeper.

ploxiln · on April 6, 2015

BitKeeper is still around, selling to a few large corporations. It's a very different business model.

Git has a large industry of value-add / support companies now. But no one commercial company can own or control the core. And that's one key reason why it's so popular and has such a large ecosystem around it. So to be like git, bitkeeper would have had to really give up central ownership and control. Commercial companies never do that unless they're going out of business anyway, they just can't.

darkmagnus · on April 6, 2015

Here is the first Git commit he talks about in the article:

https://github.com/git/git/tree/e83c5163316f89bfbde7d9ab23ca...

AceJohnny2 · on April 6, 2015

Having read "Git from the Bottom Up" [1], it's interesting and refreshing how concise Torvald's explanation is.

Of course, he also didn't have as much to explain at that stage :)

[1] https://jwiegley.github.io/git-from-the-bottom-up/

ffn · on April 6, 2015

Jesus, so Torvalds built the MVP for git in 1 day and pretty much scaled it up for kernel usage in 10 days. Now, as a humble average developer, how do I achieve Torvaldian levels of productivity?

vinceguidry · on April 6, 2015

Well, he'd been collecting ideas and requirements for a long time in his head before he finally made a system out of it. The vast majority of the time, what you're building isn't all that clear to you before you have to start. So there's a lot of time wasted just figuring that out. Cut that wasted time out and you become a powerhouse.

divs1210 · on April 6, 2015

As someone pointed out, locking yourself in could definitely be a catalyst. How to develop his sensibilities has a very different answer, though - Have High Standards.

    * He aches for the very best tool that will suit all his major  needs. 
    * He mentions getting the data structures right first, which is of prime importance. And no, it doesn't mean you start building TreeFactoryAdapter's.
    * He codes at the lowest level, in C (please don't whine about assembly), to maintain complete control over memory usage.

I shudder to think what he would come up with if he ever decides to take Common Lisp on a test ride.

tikhonj · on April 6, 2015

It was something at the top of his head, enough that he had a coherent mental model of what he wanted to build. The actual coding was less designing git as much as putting an existing design down in executable notation. Couple this with decades of programming experience—enough, crucially, to express his thoughts comfortably, without getting distracted by the incidental minutia of actual programming—and you get feats that seem almost super-human.

To borrow an overused cliché, the code was just the tip of the iceberg. The hidden body of the MVP was the design he had been thinking about for a while. But we, as external observers, can't really see or appreciate what was going on in his head as he planned this out. It doesn't feel like "real" work, which makes the small remaining part that does seem all that much more impressive. This is actually a real source of tension between certain programmers and managers: it's hard for managers to tell actual thinking apart from day-dreaming, so programmers just look lazy in their eyes.

Torvalds wrote about this himself in the interview:

> So I'd like to stress that while it really came together in just about ten days or so (at which point I did my first kernel commit using git), it wasn't like it was some kind of mad dash of coding. The actual amount of that early code is actually fairly small, it all depended on getting the basic ideas right. And that I had been mulling over for a while before the whole project started. I'd seen the problems others had. I'd seen what I wanted to avoid doing.

How do you achieve this level of productivity yourself?

Personally, I think the main quality you want, as I alluded earlier, is comfort. Get yourself to the point where you trust your tools (and yourself) enough to execute your ideas without apprehension. You want to build yourself up to the point where your tools—everything from your programming language to your editor to your libraries—don't get in the way. They should almost feel like extensions of you. Think about it like riding a bicycle or skiing where you start thinking exclusively about where you're going without being distracted by how. Ideally, you want to think about the problem you're solving and the abstraction you're building rather than the mundane details of how you convey this to the computer.

Crito · on April 6, 2015

Work smarter, not harder. The reason he was able to get git up and running so fast is because it is fundamentally a very simple system. If you check out the first commit of git in its own git repo, you can see just how simple it is yourself. In that first commit it is barely anything, yet it was sufficient to write a commit for itself.

cms07 · on April 6, 2015

Spend your better years inside during the 9 months of Finnish Winter.

msvan · on April 6, 2015

I think there's a lot of value in that period of time before the first commit. Rich Hickey supposedly spent two years thinking about Clojure before even writing a single line of code, and I'm sure Torvalds spent an equivalent amount of time seeing pain points with BitKeeper before writing git.

serve_yay · on April 6, 2015

Try working on pretty much only one thing for 20 years and see what that gets you.

worklogin · on April 6, 2015

Linus says of the Github platform -

That's partly because of how the kernel is developed, but part of it was that the GitHub interfaces were actively encouraging bad behavior. Commits done on GitHub had bad commit messages etc, because the web interfaces at GitHub were actively encouraging bad behavior. They did fix some of that, so it probably works better, but it will never be appropriate for something like the Linux kernel.

I haven't ever looked at the kernel workflow, nor have I ever heard systematic criticism of Github's methodology. Does anyone else have input on what Linus may mean by his opinion?

teamhappy · on April 6, 2015

    > [...] nor have I ever heard systematic criticism of Github's methodology.

He's not only criticizing Github:

    > (a) make a real pull request, not the braindamaged crap that github
    >     does when you ask it to request a pull: real explanation, proper email
    >     addresses, proper shortlog, and proper diffstat.
    >
    > (b) since github identities are random, I expect the pull request to
    >     be a signed tag, so that I can verify the identity of the person in
    >     question.

Real explanation, proper shortlog, signed commit, etc. (Note that he didn't mention emojis.)

All he seems to be asking for is that we take our time and do it properly. Makes all the sense in the world for a large, long-lived open source project like Linux. Imagine 20-year-old commit messages from FreeBSD would look like the ones we write today for, say, a npm module.

--- edit ---

Yeah ... nevermind. He explains it himself further down:

    > Btw, Joseph, you're a quality example of why I detest the github
    > interface. For some reason, github has attracted people who have zero
    > taste, don't care about commit logs, and can't be bothered.
    >
    > The fact that I have higher standards then makes people like you make
    > snarky comments, thinking that you are cool.
    >
    > You're a moron.
    >
    >                 Linus

GFK_of_xmaspast · on April 6, 2015

Also who's the "Joseph" he's addressing? Unless there are deleted comments, the only snark is coming from a dude named Jaseem.

Crito · on April 6, 2015

Joseph's comment, which was apparently removed, was "I did not realizes that Linus' shit does not stink. Thanks for clearing that up..."

Here is Linus himself commenting on that exchange: https://lkml.org/lkml/2013/7/16/712

Another source for what was said: https://news.ycombinator.com/item?id=3962978

GFK_of_xmaspast · on April 6, 2015

"You're a moron."

That's our Linus-kun.

rdtsc · on April 7, 2015

Well he was a moron. Linus is Linus-kun because often he is right.

Here is more of the exchange for context:

--- Umm. Notice how the "Joseph" I replied to had deleted all the comments he wrote?

That should tell you something. I smacked down a troll.

If I was polite to you all those years ago, and I was polite but firm in the two first responses, please give me credit for when I smack somebody down. There may be a reason for it. The fact that the person deleted his messages (or github deleted them for him - I have no idea what their comment policy is) and you cannot see that context any more online should not make you think that I suddenly went crazy.

Btw, since I get the github messages in email too, I have a copy. Joseph replied to those "polite but firm" messages where I explained exactly why I don't want to bother with github pull requests with this gem:

   "I did not realizes that Linus' shit does not stink.  Thanks for

clearing that up..."

---

gregkh · on April 6, 2015

The kernel development model is documented here: http://lxr.free-electrons.com/source/Documentation/developme...

Github doesn't scale at all for large projects. The kernel is averaging over 8 changes an hour, 24 hours a day. That rate of change can never be handled by doing pull requests and web site review. It only can work with email and review and scriptable processes.

steveklabnik · on April 7, 2015

We merged 473 PRs in Rust last month: https://github.com/rust-lang/rust/pulse/monthly

That's roughly one every other hour, so still not quite on the same level. And we do use tooling _in addition_ to GitHub, but reviews are done in PR and not over email.

(FWIW I personally wish more projects worked like the kernel's, but such is life)

quantumet · on April 6, 2015

Eh, I mildly disagree on web site review. Perhaps not GitHub's pull request/review model, but big projects like Android and Chromium do all their review on web interfaces.

edejong · on April 6, 2015

Using Gerrit, which is an automated mimick of Linus' development model.

By the way, we've been succesfully using Gerrit on smaller-scale projects as well (around 8 change-requests / day) and don't even want to think about going back to pull-requests.

sunnyps · on April 7, 2015

What you've said is essentially correct, but I'd like to make a minor correction: Chromium uses rietveld whereas Android (and some Chromium related projects) use gerrit. Rietveld and gerrit have similar workflows except that gerrit is more integrated with git (I think). Both are named after Gerrit Rietveld, a famous Dutch designer.

hamstergene · on April 7, 2015

Just because one big project relies on emails does not mean they scale better. They work for the kernel because with that strong organization pretty much anything could work for the kernel.

Criticisms in the interview were essentially about low discipline that Github interface allows, and how they were not willing to adapt to the kernel needs, not about fundamental deficiencies of pull request based model.

So far nothing has been said about why pull requests via a website might handle less load than pull requests over email.

wyldfire · on April 6, 2015

But don't "trusted lieutenants" (yourself included?) mitigate a lot of the challenge posed by that high rate of change?

luckydude · on April 7, 2015

Greg didn't reply but I think the answer is a resounding yes. I wrote this up a bit back when the devs were using BitKeeper. Lemme go find that. OK, ignore the "I want the 1995 era web pages" look and peek at this:

http://www.mcvoy.com/lm/bitmover/lm/talks/socal/

tl;dr: fan in/fan out scales great if you have BitKeeper or Git.

Touche · on April 7, 2015

Can you explain what you mean? Why is email more "scalable" than a web interface?

merqumab · on April 6, 2015

See this 2012 thread https://github.com/torvalds/linux/pull/17#issuecomment-56546...

rentnorove · on April 6, 2015

This pull request: https://github.com/torvalds/linux/pull/17

davexunit · on April 6, 2015

I was so happy to read this. I completely agree with Linus' criticisms of the pull request system that GitHub has popularized. They've deeply embedded really bad habits into programmers. I used to wonder "Why do some projects still do patch review on a mailing list?" and then I realized it was for the reasons that Linus gave here.

mushly · on April 6, 2015

So what's the best way to do it instead? And why is the current model so bad?

jammycakes · on April 7, 2015

One thing worth saying here is that Git is now the dominant player in the source control market. According to almost every survey I've seen, it's overtaken Subversion (according to some reports by a fairly large margin) not only for open source and hobbyists but in the enterprise as well. It's more or less become a lingua franca for communicating source code between teams these days, as well as being the preferred deployment option for many cloud hosting providers and even some package managers. No other source control system has ever managed that.

As such, you simply can't afford not to know it these days. In fact, I can't help thinking that not using Git should now be a major red flag for job candidates and prospective employers alike. Sure, not using your favourite tool may not be that big a deal, but not using what is to all intents and purposes the lingua franca of source code communication is a serious omission.

minusSeven · on April 7, 2015

Can't agree with this assertion. Your source control of choice should not be in any way a red flag. It just a choice and because of learning curve I choose to learn Mercurial instead. I don't see how source control matters to that extent.

If I need to use it I will learn it then. I don't think most people care as such.

blakeyrat · on April 7, 2015

As long as you don't expect people to like it.

restalis · on April 7, 2015

"Why do you think it's been so widely adopted?

Torvalds: I think that many others had been frustrated by all the same issues that made me hate SCM's"

Actually, the adoption was caused by the network effect. It took off with a core group of Linux-involved developers promoting it religiously, then it went on like a fashion (we are git-wielders, we are cool), and ended up being the default choice for a lot of other projects that either depended on some degree on 3rd party git-managed modules, or wanted to benefit from the existing disciplined git-trained developers, or just... because (for all things being equal). Having this presumption that the adopters actually thought for themselves and Torvalds's brain-child got where it is now thanks to its own technical merits is a pleasant thought though.

dangero · on April 6, 2015

Somewhat off topic, but I'd love to see something more git-like that works for large repositories. I work on video games and our repos are several TB. It's not practical to use git in those situations and Perforce feels too clunky to me.

jewel · on April 6, 2015

Have you tried git-annex? It handles large files really well and is as close to git-like as you can get. :)

dangero · on April 6, 2015

I haven't tried it, but I've looked at it. I think it adds too much complexity. Who decides which files go on the annex and which don't? Doesn't seem like an automated great solution for a scale-able team.

I'm looking for something more like CVS/Perforce where you can check anything in, but then with a more Git feeling interface. What I'd really like to do is to look at Git feature by feature and build the closest thing possible that does not include cloning the entire history locally under normal use. I know that is the fundamental paradigm of Git, but it seems that something closer to a hybrid of Git and Perforce could be created. Not sure exactly what it would look like. It's just jarring to jump from Perforce/CVS to Git and I don't think that's completely due to the local repository model. It's how branching works, it's how merging works, etc.

jewel · on April 6, 2015

> I haven't tried it, but I've looked at it. I think it adds too much complexity. Who decides which files go on the annex and which don't? Doesn't seem like an automated great solution for a scale-able team.

I would put all binary files into git annex (as determined by `file`). This can be done by a commit hook automatically. With another hook that makes sure that `git annex pull` is run when the user checks out code, you'd have a solution that was close enough to automatic for most use cases.

(You'd have to help your users with weird situations from time to time, but that's true of git anyway.)

munificent · on April 6, 2015

If I was still doing games, I would be tempted to write to have a hybrid Perforce/git setup where code and text files are in Git and media is in Perforce. It would make atomic commits much harder, but would give you the lightweight user experience (branches, etc.) of Git for code and Perforce's solid performance (and no local history!) for giant assets.

luckydude · on April 6, 2015

We've got an answer for that but fair warning it's commercial, not free.

tl;dr: all the data sits in one or more binary server[s] and is fetched on demand when you ask for the file. We distinguish between text and binary so you can clone a tree and have all the text checked out and none of the binaries; you set up your build system to check those out as needed.

netinstructions · on April 6, 2015

I wrote an article a few months ago summarizing where Git is in 2015 if anyone is curious[1]. Most developers at my company don't seem to want to learn a "new" SCM so we're stuck with SVN (or CVS for some projects).

We rarely have a stable trunk. Branching takes too long so most people don't want to do it. Merging two branches is such a scary thing that people avoid branching in the first place. Or I've seen them copy code from one branch in one Eclipse window and paste into another separate project (on another branch) in another Eclipse window. They repeat this manually for the next 5 to 20 files. I don't think they've realized a merge can often be as simple as a button click or one SCM command. But hey, it works for them.

The one time on my team that someone was interested in exploring Git was when we couldn't find an up-to-date Maven SCM connector for SVN that played nicely with Eclipse. The solution? Use an older version of Eclipse.

[1] http://www.netinstructions.com/the-case-for-git/

praneshp · on April 6, 2015

I work for a huge-internet-corp, which I don;t want to name here (pretty easy to find out from my username and a couple of google searches, if you care). I sympathize (actually empathize) with your problem (developers dont want to learn a new SCM). We had a big drive last year where management stuck down a hammer, and told us to move to git, 100%. Several senior developers (many of them actually architects now, not really writing much code) made a lot of noise on internal mailing lists. One person would make terrible mistakes that one reading of any git manual would help you understand, and then complain loudly (with a lot of swearing) how much it sucked compared to SVN. I was annoyed by this, because except for the hammer thing, the company did everything else perfectly. We have a stable corp github, there were several training sessions offered, and the reasons clearly explained. It was the first time I was happy about the slightly dictatorial approach taken towards the whole thing, instead of trying to reason with 50-year old babies.

serve_yay · on April 6, 2015

Yes, I remember those days. I remember, as Linus says, planning branch merges. It really was terrible. But the solution is to use git (or Mercurial), not shut down the discussion saying you don't want to learn a new tool. That's a very unfortunate attitude for anyone who works in software.

kbart · on April 7, 2015

Heh, I always ask what revision control system company uses. If the answer was SVN (or even worse - CVS), it's usually a deal breaker, because as a rule of thumb, that means company is bureaucracy ridden, slow moving, enterprise behemoth.

jammycakes · on April 7, 2015

In 2015, nobody should still be hiring developers who don't know how to use Git yet. It's a massive warning that they're not keeping their skills up to date.

DigitalSea · on April 7, 2015

I just wanted to say thank you Linus for giving us Git. Before Git I was using SVN and while the wounds are healing, I will bear these SVN scars for many years to come. As a developer Git has made my life so much easier and not only that, thanks to Github I can help collaborate on open source projects as well as my own with ease.

plongeur · on April 6, 2015

99% of the time I use:

- git status

- git add ...

- git reset ...

- git commit ...

- git log ...

- git push/pull ...

I guess that's "the basics" - what command should I learn next?

mbell · on April 6, 2015

Command to learn: - git rebase (if you learn everything you can do with this command, you'll pretty much know git inside and out)

Other useful stuff: - git diff - git diff --cached - git log --graph --decorate --pretty=oneline --abbrev-commit --all - git cherry-pick - git stash

joe_inferno · on April 6, 2015

git bisect is a powerful one i use to identify when a given bug was introduced

icefox · on April 6, 2015

Install git achievements (https://github.com/icefox/git-achievements) and get free internet points to expore them all.

Disclaimer: wrote this a few years ago and it has way more users than I ever would have thought.

abvdasker · on April 6, 2015

I use this command constantly:

http://blog.kfish.org/2010/04/git-lola.html

Being able to see the graph structure made it far easier for me to get a grasp git's general structure and flow on a higher level.

Aga · on April 6, 2015

Looking at the history tree with all the branches and remote branches is a great way really understand Git.

More powerful history browsers to "lola": * gitk (graphical, developed in git's own repo) https://lostechies.com/joshuaflanagan/2010/09/03/use-gitk-to... * tig (console) http://jonas.nitro.dk/tig/manual.html

naggie · on April 6, 2015

> git rebase -i Very useful!

qzw · on April 6, 2015

Yes, but watch out if you've already pushed to remote branches or even done merge/rebase with other local branches. You're rewriting history.

Zikes · on April 6, 2015

I most often use it because I forgot to include a certain file in my most recent commit. I'll just commit that file then rebase -i to merge the two.

regularjack · on April 6, 2015

If you want to alter the most recent commit, you can 'git commit --amend', no need to rebase.

http://git-scm.com/book/en/v2/Git-Basics-Undoing-Things

kevinmcconnell · on April 6, 2015

For quick changes to the most recent commit, a handy alias to put in your ~/.gitconfig is:

  fixup = commit --amend -C HEAD

That way you can stage whatever you forgot, and then do `git fixup` to update the latest commit with it. Saves a little bit of typing :)

I do love `rebase -i` for anything more complicated though.

fennecfoxen · on April 6, 2015

that flows naturally into git reflog -- to get you out of a botched rebase with git reset :P

rileymat · on April 6, 2015

bisect.

pedrow · on April 7, 2015

I wish Linus would now turn his attention to: 1) make (ought to be a simple concept but never seems like it's quite been done right) 2) init (only he has the clout to solve the pid 1 controversy) Will check back in 2025 for progress report.

stinos · on April 7, 2015

Ok, but only if he first turns his attention to a proper git submodules/multiple reposities kind of thing

misiti3780 · on April 7, 2015

off topic: is the there a third party service that will let me collapse all sub-comments and just read the parent comments?(and then expand them if they seem interesting?)

kbart · on April 7, 2015

There's API for HN (https://github.com/HackerNews/API), it's not that hard to make one yourself. This is Hacker News after all.

misiti3780 · on April 7, 2015

i know - i was hoping someone else did it already! thanks

INTPenis · on April 7, 2015

Linus is not a magician, he is however very good at working with systems and spotting exactly where those systems need improvement.

I respect this quality immensely.

xsace · on April 7, 2015

Now if you can tackle dependency management and build tools Linus, I would appreciate. With love, me.

sytse · on April 6, 2015

There was a nice 10 year overview of git linked from the article https://www.atlassian.com/git/articles/10-years-of-git/ I think Atlassian are great sports for featuring GitLab (that competes with their Bitbucket and Stash products) in there, thanks!

netinstructions · on April 6, 2015

Kind of a nitpick, but that infographic shows that in 2014 Git was used by 33% of developers. Atlassian didn't say which source that came from, but I'm going to guess it came from a 2014 Eclipse Community Survey of 876 respondents[1][2].

I found the poll a bit misleading. There were separate categories for Git and GitHub, but AFAIK you were only supposed to choose one. I'd wager most of the GitHub users are also using Git, so the graph would look more like this[3].

[1] http://www.slideshare.net/IanSkerrett/eclipse-community-surv... [2] http://eclipse.dzone.com/articles/eclipse-community-survey-2... [3] http://i.imgur.com/CEkIHSQ.png

sytse · on April 7, 2015

I agree that using separate categories for git and GitHub doesn't make sense.

jkot · on April 6, 2015

I am sad GIT got widely adopted. GIT-SVN was almost superpower a few years ago. :-)

kinghajj · on April 6, 2015

It still is, depending upon where you work. "I'm having trouble with merging the latest development changes into my feature branch. Tortoise says there are tree conflicts, what--" "Done, I'm committing the result to the SVN repo now." "What?!"