2014-01-10

How to remove almost all files from a Git repository

This blog post explains how to remove all files (including their history) from a Git repository, except for the files on a whitelist. This can be useful for splitting a Git repository into two smaller ones.

This rewrites history and can lead to data loss, so make sure you have a backup of the repository. Also read up on the basics of rewriting history and of git filter-branch first.
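A mirror clone is one simple way to take such a backup. The sketch below wraps it in a function whose name, backup_repo, is made up for illustration; the destination path is whatever you choose and must not exist yet. Note that a clone only copies committed history, not uncommitted changes in the working tree.

```shell
#!/bin/sh
# backup_repo SRC DEST: hypothetical helper that makes a bare mirror
# clone of the repository SRC at DEST before you rewrite history.
# DEST is a path of your choosing; it must not exist yet.
backup_repo() {
  src="$1"
  dest="$2"
  if [ -e "$dest" ]; then
    echo "backup_repo: $dest already exists" >&2
    return 1
  fi
  # --mirror copies every ref (branches, tags, ...) verbatim.
  git clone --quiet --mirror -- "$src" "$dest"
}
```

For example, running backup_repo . ../myrepo-backup.git from the top of the repository would leave a bare copy next to it (the destination name is only an example).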

Here is the command which keeps only the files foo and bar/baz (type it without the leading $):

$ (export KEEP="$(echo 'foo'; echo 'bar/baz')";
  NL="$(echo;echo x)"; export NL="${NL%x}"; git filter-branch -f \
  --index-filter 'X="$IFS"; IFS="$NL";
  set -- $(git ls-files -z | tr "\000" "\n" | grep -vFx "$KEEP");
  IFS="$X"; test $# -gt 0 &&
  git --literal-pathspecs rm --cached --ignore-unmatch -- "$@"; :' --prune-empty HEAD)

This needs a Bourne-compatible shell, so it won't work out of the box in the Windows command prompt (cmd.exe), but it will work on most modern Unix systems.

This looks unnecessarily complex, elaborate and bloated, but all the little tricks are necessary to make it work with file names containing unusual characters and with all modern Bourne-compatible shells. Only file names containing a newline or an apostrophe (') won't work.
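Two of those tricks can be tried out without Git. The sketch below builds a newline the same way as the command, then uses it as IFS to split a list of paths, one per line, into positional parameters, so that spaces inside names survive. $(echo) alone would yield an empty string, because command substitution strips trailing newlines; appending x and then removing it with ${NL%x} keeps the newline. The helper names paths_to_remove and count_args and the sample file list are made up for illustration; the list stands in for the output of git ls-files.

```shell
#!/bin/sh
# The NL trick: $(echo) alone would give "", because command
# substitution strips trailing newlines. Capture "newline + x"
# instead, then strip the "x".
NL="$(echo; echo x)"
NL="${NL%x}"

# paths_to_remove KEEPLIST: hypothetical helper. Reads one path per
# line on stdin and prints the paths that are NOT in KEEPLIST, using
# the same test as the filter (-F fixed strings, -x whole lines,
# -v invert the match).
paths_to_remove() {
  grep -vFx -- "$1"
}

# count_args: hypothetical helper that prints its argument count.
count_args() {
  echo "$#"
}

# Demo with a made-up file list standing in for `git ls-files` output.
KEEP="$(echo 'foo'; echo 'bar/baz')"
X="$IFS"; IFS="$NL"   # split only on newlines, not on spaces or tabs
set -- $(printf '%s\n' 'foo' 'bar/baz' 'my notes.txt' 'qux' |
  paths_to_remove "$KEEP")
IFS="$X"              # restore the default field separators
echo "$# paths to remove:"
printf '  %s\n' "$@"
```

Run on its own, the demo reports 2 paths to remove, my notes.txt and qux. With the default IFS, the space in my notes.txt would have split it into two bogus arguments.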

To keep empty commits, omit the --prune-empty flag.

Please note that if the files you want to keep were renamed at some point, this command doesn't recognize their old names: you have to list the old pathnames explicitly in KEEP to keep their history.
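One way to find those old names is git log --follow, which tracks a single file across renames. The helper below, whose name all_names is made up for this sketch, prints every path a file has had in the history of HEAD. It takes exactly one path, because --follow supports only one, and it relies on Git's rename detection, so a file that was heavily edited while being renamed may not be followed.

```shell
#!/bin/sh
# all_names PATH: hypothetical helper that prints every name PATH has
# had in the history of HEAD, one per line, sorted and deduplicated.
#   --follow           continue the log across renames (one path only)
#   --name-only        list the file's name in each commit
#   --pretty=format:   print no commit headers, just the names
#   core.quotePath=false  keep non-ASCII names unescaped
all_names() {
  git -c core.quotePath=false log --follow --name-only --pretty=format: \
    -- "$1" | grep -v '^$' | sort -u
}
```

For example, KEEP="$(all_names foo; all_names bar/baz)" could replace the list of echo commands, so that the old names of foo and bar/baz are kept too.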

To do it the other way round, i.e. to keep all files except foo and bar/baz, run this:

$ (export REMOVE="$(echo 'foo'; echo 'bar/baz')";
  NL="$(echo;echo x)"; export NL="${NL%x}"; git filter-branch -f \
  --index-filter 'X="$IFS"; IFS="$NL"; set -- $REMOVE;
  IFS="$X"; test $# -gt 0 &&
  git --literal-pathspecs rm --cached --ignore-unmatch -- "$@"; :' --prune-empty HEAD)

1 comment:

zsbana said...

See also the svndumpfilter utility for modifying a Subversion repository in a similar way: http://svnbook.red-bean.com/nightly/en/svn.reposadmin.maint.html#svn.reposadmin.maint.filtering