Why Subversion is better than Git
A while back I had to decide, as the most senior developer on a team, whether we should stick with the Git version control system. Git is extremely popular these days in the open source community. It was developed by Linus Torvalds to manage the Linux source code. And everyone wants to be as cool as Linus.
But hold on. Linux is not your average project. Certainly not the average project I work on. Linux is programmed by thousands of developers all over the world, with a hierarchy of committers and reviewers. Linux has a large volume of code. People using and making modifications to the source code are programmers, who can handle complex concepts and who know why merge conflicts occur and when to use feature branches, release branches, etc.
My world looks different. I sit in an office with 2-3 onsite people. Some of them are graphics people who have a good sense of what looks good, but not a good grasp of resolving merge conflicts using vi. Some of them are managers, bosses, project leaders, product creators, who should be enabled to edit certain things such as wording, but who certainly won't make much effort to learn anything new and will simply order me to do their wording changes myself if the system doesn't work to their liking. This may not be everyone's world, but it's my world.
So it would be surprising to find that one system fits all sizes (although certainly not impossible, look at TCP/IP!). But it certainly can't be assumed without evidence or investigation, that a system working for one world is the best solution for the other.
What do I want from a version control system? I want the following, in the following order (most important first):
- A system which can manage files between multiple users. A Windows share would suffice for this requirement. What I don't want is me having to send a file to someone via email, me changing the file, them sending a changed version 2 weeks later back to me, and another 2 weeks after that, and me having to manually work out what changed when, and trying to make a combined version in a text editor. Or me working on my codebase and my colleague working on their codebase, and then 1 month later having to try and merge the results in a text editor.
- History. Who did what when, and why? If something breaks, who did it? Why did they do it? If I revert their change, what will I break? The ability to revert to a previous version.
- Branches. If more than one person creates a new feature, I want to be able to check stuff in, they check stuff out. But it shouldn't go live yet. But live bugfixes need to happen. Multiple people could work on multiple different features, and one merges each feature into the "to release" history stream when they're ready: in whatever order they become ready.
- Being able to create patches of changes, send them to other people, have them review them, etc. Actually I don't need this functionality at all, but maybe I would if I worked on open source software.
OK let's look at objectives #1 and #2, my most important objectives. As stated above, as many people should be using the system as possible (incl. non programming roles), in order to make me send/receive emails of various versions of stuff as infrequently as possible. Here's my experience in this area:
- At Uboot, a long time ago, we used to use CVS. There was only WinCVS at the time, a terrible program by all accounts. My manager Helge simply refused to use it. He said "I don't need to use version control, that's for programmers". Therefore he was not technically able to do any wording, CSS or HTML changes. He had to delegate even the smallest thing to a web-designer or programmer.
- A manager before that, when I suggested web design should be using version control, said "don't even bother trying, web designers are too stupid to use version control". He literally said that. I didn't believe him. Years later at Uboot, the web designers were using CVS, thanks to my campaigning.
- At Uboot, some web designers were better using CVS than others. Some could handle it, others would constantly get confused. Understandably, the error messages were terrible.
- Using Subversion at Easyname (approx 5 people in total), managers, graphics people and software developers are able to work on the code base: anyone can alter wording, HTML, etc. Anyone can fix any bug.
- Using Subversion on another project, my non-programming manager can alter wording in the HTML files I produce. That's significantly better than him sending me Excel files with wording, and me building it in to the code. Not only is my life easier: it's much less effort in total, meaning we can offer the projects at a lower cost to our customers, for the same amount of value for the customer.
- Then, on this project, we switched to Git. The manager could just about handle checking in (although didn't really understand the messages, there was a lot of "delete everything, check out everything again, try again" going on). The graphics department couldn't handle it at all. They'd been working on a new version of the product for months, and had produced an un-checked-in tree of changes, un-merged with what development had been doing. Trying to check it in just resulted in errors such as ! [rejected] with no further info (literally, git is this unusable). Forget about creating branches or higher-level objectives of a VCS, this wasn't even achieving objective #1.
Subversion is good because it's simple and it has a good user-interface. Having a good user-interface is made possible (although in no way guaranteed!) by it being simple. Errors don't happen often (as it's easy for the user to have a model of how it works that is consistent with how it actually works) and if they do, errors are clear and understandable, and often include a description of what to do next. TortoiseSVN is integrated with Windows Explorer and works perfectly; I think everyone who's ever used it would agree. (Is there anything equivalent for Git?)
Subversion 1.5 (the latest version) even supports some fancy Git features. You can import all the changes from one branch onto another branch (roughly what Git users do by merging or "rebasing") and it remembers which revisions it has imported, so next time you merge, it won't import them again. You can "cherry pick", i.e. decide that a particular revision should be imported from one branch to another, but no revisions before or after it: useful for applying a particular bug fix from development to live, for example, without carrying across any other new features. So there's really nothing I would need that Subversion doesn't do.
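To make the "remembers what it has imported" idea concrete, here is a toy model in Python. It is only a sketch of the bookkeeping, not how Subversion actually stores its merge information; the `Branch` class, its method names and the revision numbers are all made up for illustration.

```python
class Branch:
    """Toy branch that remembers which source revisions it has already imported.

    Hypothetical illustration only: this is not Subversion's data model.
    """

    def __init__(self, name):
        self.name = name
        self.merged = set()  # revision numbers already imported
        self.applied = []    # the order in which revisions were applied

    def merge_all(self, source_revisions):
        """Import every revision not yet merged; return the ones applied now."""
        new = [r for r in sorted(source_revisions) if r not in self.merged]
        self._apply(new)
        return new

    def cherry_pick(self, revision):
        """Import exactly one revision, nothing before or after it."""
        if revision in self.merged:
            return []
        self._apply([revision])
        return [revision]

    def _apply(self, revisions):
        self.applied.extend(revisions)
        self.merged.update(revisions)


live = Branch("live")
dev_revisions = [101, 102, 103, 104]  # made-up revision numbers

print(live.cherry_pick(103))          # [103]: only the bug fix goes live
print(live.merge_all(dev_revisions))  # [101, 102, 104]: 103 is not imported twice
print(live.merge_all(dev_revisions))  # []: nothing left to import
```

The point of the sketch is the last two lines: because the branch records what it has already taken, a later "merge everything" neither repeats the cherry-picked fix nor repeats itself.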
So I took the decision to convert our repository away from the modern shiny "git" system, into the nearly-legacy (if you read open-source evangelists!) well-engineered easy-to-use Subversion. I haven't looked back.
In addition (independent of distributed vs central source repositories), Subversion has so many well-engineered practical features:
- Directories being empty and directories not existing are different states. Once a web designer created a whole structure of empty directories and checked them in to CVS only to see them not be checked out. On Subversion a directory can be empty on one revision, deleted on the next, replaced by a file of the same name on the next commit, etc. Neither CVS nor Git can handle this.
- Want to revert your changes? Just type "svn revert". I have watched a "Git expert" spend 30 minutes reading man pages to find out why the equivalent git command worked on modified files, but didn't restore files which had been deleted. Reverting things is important, objective #2.
- Binary files are automatically detected (in contrast to CVS).
- The svn:executable property allows files to be marked as needing the UNIX executable bit (in contrast to CVS).
- Files can be ignored based on versioned properties (svn:ignore), e.g. *.class files. This is more elegant than creating a special file like ".cvsignore".
- Want to commit with a different user? (e.g. shared check-out like on a live system) Just use the "--username" option or simply press return when prompted for a password and it'll prompt you for a new username.
- Absolute paths. You can execute "svn commit /home/adrian/checkout/file.txt". This is convenient with Mac Finder: you can just drag a file from the Finder to the terminal to get its (absolute) path to appear. Neither CVS nor Git allows absolute paths for its commands.
- The help is clear because the features are so simple. Check out the manual for svn log vs git log (shows the history of a file: how complex can it be!?)
- You can do a quick fix on the live server and commit it. Subversion demands that the modified files be up-to-date before you can commit. Git demands that every file in the checkout is up-to-date, basically meaning you always have to pull the latest version from the repository and install it live, before you can commit the fix.
- Don't have Subversion? Don't even know what it is? Subversion repositories are URLs, so you can just type it into a browser and see the files. CVS and Git repository descriptions are weird strings, so if you don't have the software, you don't know where to start.
- You can force locking. You can specify that certain unmergable files should adhere to "lock, edit, commit & unlock" semantics. I use this for Excel spreadsheets, for example.
- You can embed expandable keywords like $Revision$ into the source and set a property (svn:keywords) to say they should get replaced by e.g. $Revision: 213 $ when a commit is done. This is useful because if you send a file to someone, they edit it and send it back, you need to know which version they originally got in case you've updated the file in the meantime. (One could say they "should do a checkout" but try telling a large many-thousand-person company to install a new version control system just to edit e.g. one wording file or an image, when they don't even have physical access let alone administrative rights to their PC.)
- Those $Revision$ tags don't generate conflicts when merges are done. (CVS also replaces tags, but if you merge two versions together, CVS sees that the tag has been "modified" (by itself) in incompatible ways, and generates a conflict. Subversion normalizes such tags back to $Revision$ before applying merges.)
- You can checkout only a subdirectory of the repository just by adding the directory to the end of the URL.
- Old servers can talk to new clients and vice-versa. This is good software engineering.
- Output is copy-paste friendly. From the "svn diff" command you see your differences. You can copy/paste the filenames easily. In Git, you always have an extra "a/" in front of the name, e.g. "a/dir/myfile.txt". Double-click on it and you can't copy-paste it anywhere without having to remove the "a/" first.
- Internally, Subversion separates logic from UI. A library of logic can be used from various different UIs. CVS tools all just wrap the command-line CVS tools (and always have a "command pane" where you can see the input/output to these tools, in case something goes wrong, e.g. you try and check in a file whose name contains a space!) It seems "git" has also been developed around the command-line-tool model. In 2008-11 someone suggested making a UI-independent library of functionality for git.
- Subversion can handle Unicode file names.
- SVN Externals are cool: for example, a checkout of an application can check out the libraries as well, even if the libraries are hosted somewhere else. That's better than checking the source of the libraries into the VCS, or the compiled form (e.g. JAR), or having a more complex build procedure.
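The keyword point in the list above (collapsing expanded tags back to $Revision$ before merging) is easy to picture with a few lines of Python. This is only a sketch of the idea, not Subversion's code: the function names are invented, the keyword list is a small subset picked for illustration, and the expanded form is assumed to look like "$Revision: 213 $".

```python
import re

# Matches a bare or expanded keyword, e.g. "$Revision$" or "$Revision: 213 $".
# The keyword names listed here are a subset chosen for this sketch.
KEYWORD = re.compile(r"\$(Revision|Date|Author|Id)(?::[^$\n]*)?\$")


def collapse_keywords(text):
    """Return text with every expanded keyword collapsed to its bare form."""
    return KEYWORD.sub(lambda m: "$" + m.group(1) + "$", text)


def differs_only_in_keywords(mine, theirs):
    """True if two versions are identical once keywords are collapsed."""
    return collapse_keywords(mine) == collapse_keywords(theirs)


mine = "# $Revision: 213 $\nprint('hello')\n"
theirs = "# $Revision: 240 $\nprint('hello')\n"

print(collapse_keywords(mine))
print(differs_only_in_keywords(mine, theirs))
```

Comparing the collapsed texts is what stops a tag that was only ever rewritten by the tool itself from showing up as a conflict between two otherwise identical versions.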
A few positive things to say about Git vs Subversion:
- Git really is unbelievably fast. But is that actually a useful feature? For the Linux kernel, maybe. But I just did an "svn up" on my current project: it took 1.2 seconds (incl. server communication) for 1,684 files. On another project (uboot), which has 18k files, "svn up" took 7.4 seconds. And I only do "svn up" once a day.
- Grep -v. Because Subversion stores the original versions of all files it checks out, so it can send diffs to the server, if you do a "recursive search" for any string every file shows up double: once in the file, a second time in Subversion's internal directory. That's quite annoying. Git has "git grep" but even if one uses "grep -v" they don't show up for some reason.
- Scrolling. Git automatically pipes the output of commands to a "more" program.
- "svn log" on a directory shows the differences to that directory object (which you rarely want), not all the files underneath it (which is what you always want). "svn info" shows the URL for that directory, "svn log [url]" shows the differences to all files (what you want).
- Subversion renames don't work too well: if you modify a file and someone else renames it you get a conflict, as opposed to your changes getting taken over to the new file name (i.e. content changes and renaming are changes to two different aspects of a file, so should be mergeable). Does git handle this better? No idea, I didn't understand git's output when I tried to rename a file. ("git mv a b; git status" and "mv a b; git rm a; git add b; git status" produced different results, even though they should be the same according to the docs?)
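For the double-hits annoyance in the grep item above, one workaround is a recursive search that simply never descends into Subversion's internal directory. The Python below is a minimal sketch of that: `search_tree` is a made-up helper, and the demo builds a throwaway tree whose ".svn/text-base" layout is an assumption about how a checkout looks.

```python
import os
import tempfile


def search_tree(root, needle, skip_dirs=(".svn",)):
    """Yield (path, line_number, line) for every line containing needle."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into the skipped directories.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="replace") as handle:
                    for number, line in enumerate(handle, start=1):
                        if needle in line:
                            yield path, number, line.rstrip("\n")
            except OSError:
                continue  # unreadable file: skip it


# Demo on a throwaway tree that mimics a checkout plus its pristine copy.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, ".svn", "text-base"))
    copies = ("page.html", os.path.join(".svn", "text-base", "page.html.svn-base"))
    for rel in copies:
        with open(os.path.join(root, rel), "w", encoding="utf-8") as handle:
            handle.write("<h1>Welcome</h1>\n")
    hits = list(search_tree(root, "Welcome"))
    print(len(hits))  # 1: the pristine copy under .svn is not searched
```

Passing `skip_dirs=()` gives the plain recursive behaviour back, which in the demo tree would report the same line twice.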
By the way, when converting between version control systems:
- Do always copy/convert the entire history, don't just check in a "snapshot" of the current versions. Old versions are really amazingly useful. I regularly use "svn blame" on uboot and see changes from 2004 and beyond. Particularly satisfactory/useful if the change occurred at the time I wasn't working for the company :)
- Branches. Make sure to convert not only the "trunk" / "master branch", but also any other branches or tags which are in use.
- Send an email to all developers explaining how to download/use the new system (e.g. Subversion), how to make a checkout (e.g. the URL), and any usernames and passwords.
- Other checkouts. Don't forget clients of the VCS are not only humans, but all sorts of live servers, integration testing environments, daily build systems, etc. Make sure they get converted too.
- Turn off writes to the old system. Keep the old system there (in case something goes wrong) but make sure it's read-only.
(Subversion 1.5, Git 1.5)