How to use git annex on an existing repository


What is the best way to import all big files (or all binary files) into git annex, when they are already in a git repository?

I don't want to lose all of my commits, so I think it's not a good idea to just make a new repo and initialize annex there, importing all of the files and then committing.

I also thought about the following: copy the repository, delete all binary files from git, then import them again and add them to annex. This would be an immense amount of work with multiple branches and a lot of binary files in there.

Best Answer

If you just remove the files from the most recent commit and start using git-annex now, it will work, but your existing git repository will not get any smaller. This is because your history still contains all the big files checked into Git.
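To see why the repository does not shrink, here is a minimal sketch using a throwaway repository. The directory, the file name big.bin, and the demo identity are made up for illustration; only plain git is needed, not git-annex.

```shell
#!/bin/sh
# Sketch: a file removed in the latest commit is still stored in history.
set -e
demo=$(mktemp -d)                       # throwaway demo repository
cd "$demo"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"

# Commit a 1 MiB binary file, then remove it in a second commit.
head -c 1048576 /dev/urandom > big.bin
git add big.bin
git commit -q -m "add big file"
blob=$(git rev-parse HEAD:big.bin)      # object id of the file's content
git rm -q big.bin
git commit -q -m "remove big file"

# The file is gone from the working tree, but its blob is still
# reachable from the earlier commit.
if [ -e big.bin ]; then in_worktree=yes; else in_worktree=no; fi
if git rev-list --objects --all | grep -q "$blob"; then
    in_history=yes
else
    in_history=no
fi
echo "big.bin in working tree: $in_worktree"
echo "blob still in history:   $in_history"
```

As long as an older commit references the blob, git keeps it, which is why only a history rewrite can make the repository smaller.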

You might be able to use git-filter-branch to rewrite your commits to remove the big files and annex them, as if they had been there all along. That command would probably look something like the following. I haven't tested this myself since I don't have git-annex installed, so you should clone your repo and test it there first!

git filter-branch --tree-filter 'find . -size +5M -type f -not -ipath \*.git/\* -print0 | xargs -0 git rm --cached;find . -size +5M -type f -not -ipath \*.git/\* -print0 | xargs -0 git annex add' HEAD
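Before running the command above, a scratch clone can be set up roughly like this. This is only a sketch: the "original" repository here is a stand-in created on the fly, and repo-test is a hypothetical directory name; substitute the path of your real repository.

```shell
#!/bin/sh
# Sketch: experiment on a throwaway clone so the original stays untouched.
set -e

# Stand-in for your real repository (replace with its actual path).
original=$(mktemp -d)
git -C "$original" init -q
git -C "$original" config user.email "demo@example.com"
git -C "$original" config user.name "Demo"
printf 'data\n' > "$original/file.txt"
git -C "$original" add file.txt
git -C "$original" commit -q -m "initial commit"

# Clone into a scratch location; rewrites done there stay in the clone.
scratch=$(mktemp -d)/repo-test
git clone -q "$original" "$scratch"

orig_head=$(git -C "$original" rev-parse HEAD)
clone_head=$(git -C "$scratch" rev-parse HEAD)
echo "original HEAD: $orig_head"
echo "clone HEAD:    $clone_head"
```

Run the rewrite inside the clone and inspect the result there. Note that a fresh clone only creates a local branch for the branch that was checked out; the other branches exist as remote-tracking branches.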

Step by step, what that hopefully does is:

  1. git filter-branch --tree-filter '<commands>' HEAD

    Rewrite the trees of all commits reachable from HEAD, that is, the current branch only. To cover every branch, pass -- --all in place of HEAD.

  2. find . -size +5M -type f -not -ipath \*.git/\* -print0 | xargs -0 git rm --cached;

    For each commit, find all files larger than 5MB in the repo (minus the .git directory) and remove them from the index.

  3. find . -size +5M -type f -not -ipath \*.git/\* -print0 | xargs -0 git annex add

    Find all files larger than 5MB in the repo again and add them to the annex.
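To preview which files the find expression in steps 2 and 3 selects, it can be run on its own with -print in place of the xargs pipeline. The sketch below does this in a scratch directory with made-up file names (big.bin, small.txt, and a fake .git directory); in practice you would run just the find line in your test clone.

```shell
#!/bin/sh
# Sketch: dry run of the find expression, without touching git or git-annex.
set -e
scratch=$(mktemp -d)
cd "$scratch"

# Fake layout: one big file, one small file, one big file under .git/.
mkdir .git
dd if=/dev/zero of=.git/packed.bin bs=1048576 count=6 2>/dev/null
dd if=/dev/zero of=big.bin bs=1048576 count=6 2>/dev/null
printf 'hello\n' > small.txt

# Same tests as in the filter: larger than 5M, regular file, not under .git/.
matches=$(find . -size +5M -type f -not -ipath '*.git/*' -print)
echo "$matches"
```

Only files that show up in this listing would be removed from the index and handed to git annex add, so it is a cheap way to check the size threshold before rewriting anything.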
