Thursday, 16 January 2014

Extract one folder from a Subversion repository into a Git repository including its complete history

This task is so common that there are a lot of articles and howtos on the net. This is merely a collection of information snippets and links to further reading. This article focuses on extracting one specific folder from a Subversion repository. This requirement seems to be unusual: all references I found assume that the Subversion repository contains only one project (including its branches and tags). But if you placed all your projects in one Subversion repository, this article will tell you how to detach one of them.

Subversion to Git conversion is done with the git svn command and its subcommands. Pass the repository URL to git svn clone, and the command will clone the complete repository with its complete history, starting from revision 1. If you pass the full path of your project instead, it will clone the history only from the revision in which your project folder was created. (It doesn't matter whether you pass the path as part of the repository URL, 'git svn clone http://svnserver/repo/trunk/myproject', or as the trunk parameter, 'git svn clone http://svnserver/repo -T trunk/myproject'.) The command doesn't take into account that your project may contain files which are older than the project folder itself. For example: you restructured your project; once you had three modules:

  • "module1"
  • "module2"
  • "module3"

Then you created a new project folder "new project" and moved the modules into this folder:

  • "new project/module1"
  • "new project/module2"
  • "new project/module3"

Using the command above will ignore the part of the modules' history that is older than the "new project" folder. Even using --follow-parent will not get you that history: "new project" has no parent, because it was newly created. Sad.
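A quick way to notice such a truncated history after a clone is to look at the oldest commit the clone contains and compare its date with what you remember of the modules. This is only a sketch; the helper name oldest_commits is made up for this example, and my_project stands for whatever directory you cloned into.

```shell
# Print the root commit(s) of the repository in directory $1 as
# "short-hash date subject". The helper name is hypothetical.
oldest_commits() {
  git -C "$1" log --all --max-parents=0 --date=short --format='%h %ad %s'
}

# Intended use after a clone:
#   oldest_commits my_project
```

If the date printed is the day the "new project" folder was created rather than the day the modules first appeared, the older history did not make it into the clone.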

Now let's check out how we get the complete history. I found two ways:

  • remove everything uninteresting from the Subversion repository, and purge it from the history
  • clone everything to Git and remove/purge the uninteresting parts in Git
To accomplish the first approach you'd need to use svnadmin dump (since there is no way to delete revisions from a Subversion repository) and rebuild the repository from the dump while filtering out the uninteresting files. I chose the second approach, and here's how you do it.

Clone Subversion Repository
From Git and Other Systems - Migrating to Git:
git svn clone http://svnserver/repo/ --authors-file=users.txt --no-metadata my_project
If anything goes wrong during cloning (for example your network connection dies, or you see an "unknown user" message), fix the problem and continue the process by invoking
git svn fetch
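The users.txt file passed to --authors-file maps each Subversion login to a Git author, one "login = Name <email>" entry per line. The following sketch builds a skeleton of that file from the output of svn log --quiet; the function name authors_skeleton and the @example.com addresses are placeholders, and the result still has to be edited by hand.

```shell
# Turn the output of `svn log --quiet` (read from stdin) into skeleton
# authors-file lines. Names and "@example.com" addresses are placeholders.
authors_skeleton() {
  awk -F ' [|] ' '/^r[0-9]+ [|] / { print $2 " = " $2 " <" $2 "@example.com>" }' | sort -u
}

# Intended use (needs access to the Subversion server):
#   svn log --quiet http://svnserver/repo | authors_skeleton > users.txt

# Demonstration on two hand-written log lines:
printf '%s\n' \
  '------------------------------------------------------------------------' \
  'r2 | jdoe | 2014-01-10 12:00:00 +0100 (Fri, 10 Jan 2014)' \
  '------------------------------------------------------------------------' \
  'r1 | asmith | 2014-01-09 09:30:00 +0100 (Thu, 09 Jan 2014)' \
  | authors_skeleton
```

Completing the file before cloning avoids the "unknown user" interruption mentioned above.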
Remove Uninteresting Parts from Git Repository
After successfully cloning the svn repository, we'd like to shrink the repository down to the project we'd like to keep. Before we do this, we convert the repository to a bare Git repository. A bare repository has no working tree, which makes it the usual choice for a repository that others push to (by default Git refuses pushes to the checked-out branch of a non-bare repository). From my_project/.git move everything one level higher, so that the .git subdirectory ends up empty. Delete the empty .git directory. Now edit my_project/config and change the "bare" setting to "true". After completing this you'll have a bare repository, and we can continue to filter it.
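The conversion steps can be sketched as shell commands. To keep the example harmless it runs on a throwaway repository created in a temporary directory, which merely stands in for the git svn clone; the variable names and the demo identity are assumptions for illustration. Editing the config file is done here with git config instead of a text editor.

```shell
set -e
# Scratch repository standing in for the git svn clone (hypothetical).
work=$(mktemp -d)
repo="$work/my_project"
git init -q "$repo"
git -C "$repo" -c user.name="Demo User" -c user.email="demo@example.com" \
  commit -q --allow-empty -m "initial"

# Move the contents of .git one level up, then delete the empty directory.
mv "$repo"/.git/* "$repo"/
rmdir "$repo/.git"

# Change the "bare" setting in my_project/config to "true".
git config --file "$repo/config" --bool core.bare true

# Ask Git whether it now treats the directory as a bare repository.
git --git-dir="$repo" rev-parse --is-bare-repository
```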
From git: shrinking Subversion import:
git filter-branch -d /dev/shm/scratch --index-filter \
  "git rm -r --cached -f --ignore-unmatch 'Old Project'; \
   git rm -r --cached -f --ignore-unmatch 'Some Temp Stuff'; \
   git rm -r --cached -f --ignore-unmatch 'One More Project'" \
  --prune-empty --tag-name-filter cat -- --all
(Choose a different temp folder for the -d parameter when not running on an up-to-date Linux; /dev/shm is a RAM-backed filesystem there, which speeds up the rewrite.)
git rm expects paths relative to the repository root. So the command above will remove the top-level folders "/Old Project/", "/Some Temp Stuff/", "/One More Project/" and everything below them. You can specify folders on a deeper level like this:
git rm -r --cached -f --ignore-unmatch 'trunk/Delete-This-Project'
Used this way, git rm does no substring matching (folder names have to be given in full). After this run the Git repository will contain only the project you're interested in, including its whole history (assuming the history of your project does not intersect with the folders you deleted).
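To try the filter without touching a real import, the following sketch runs it on a throwaway repository with one unwanted folder and one folder to keep. The folder names, commit messages and demo identity are invented for the example; FILTER_BRANCH_SQUELCH_WARNING only suppresses the deprecation notice that newer Git versions print before filter-branch starts.

```shell
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the notice newer Git versions print
work=$(mktemp -d)
git init -q "$work/demo"
cd "$work/demo"
git config user.name "Demo User"
git config user.email "demo@example.com"

# Three commits: one touching both folders, one only the unwanted folder,
# one only the folder we keep.
mkdir "Old Project" "My Project"
echo old > "Old Project/a.txt"
echo new > "My Project/b.txt"
git add .
git commit -q -m "add both projects"
echo more >> "Old Project/a.txt"
git commit -q -am "touch only Old Project"
echo more >> "My Project/b.txt"
git commit -q -am "change My Project"

# Same filter as above, reduced to one folder. The -d directory must not exist yet.
git filter-branch -d "$work/scratch" --index-filter \
  "git rm -r --cached -f -q --ignore-unmatch 'Old Project'" \
  --prune-empty --tag-name-filter cat -- --all

# Inspect the rewritten history: remaining commits and the paths they touch.
git log --format=%s
git log --name-only --format= | sort -u
```

The commit that touched only "Old Project" becomes empty after filtering, so --prune-empty drops it; the other two commits stay, minus the unwanted folder.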

Reduce Size of Git Repository
Git tries really hard not to delete stuff you might still need, so filter-branch alone will not reduce the size of your repository on disk. To remove all the deleted bits and bytes you also have to run the following (copied from Detach subdirectory into separate Git repository):
git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d
git reflog expire --expire=now --all
git gc --aggressive --prune=now
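To check that the cleanup really removes old objects, it can be tried on a throwaway repository first. The sketch below fakes the state filter-branch leaves behind (a rewritten commit plus a backup ref under refs/original/), runs the three commands, and then asks Git whether the old commit still exists. The file name, ref name and demo identity are made up for the example.

```shell
set -e
work=$(mktemp -d)
git init -q "$work/demo"
cd "$work/demo"
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "unwanted" > junk.txt
git add junk.txt
git commit -q -m "commit that will be rewritten"
old=$(git rev-parse HEAD)

# Simulate filter-branch: keep a backup of the old tip under refs/original/
# and replace the commit with one that no longer contains junk.txt.
git update-ref refs/original/refs/heads/backup "$old"
git rm -q --cached junk.txt
git commit -q --amend --allow-empty -m "rewritten commit"

# The three cleanup commands from above.
git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d
git reflog expire --expire=now --all
git gc --quiet --aggressive --prune=now

# Check whether the old commit is still in the object database.
if git cat-file -e "$old" 2>/dev/null; then
  echo "old commit still present"
else
  echo "old commit pruned"
fi
git count-objects -vH
```

On a real import, comparing the size-pack value of git count-objects -vH before and after the cleanup shows how much space was reclaimed.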

That's it. If everything went fine, your Git repository will contain only the stuff you're interested in.
