3

My requirement: split one git repo into multiple git repos, preserving the same directory structure as in the original repo and preserving the commit history of the files that are copied to each new repo. What I have tried already:

  1. First I tried git filter-branch --subdirectory-filter, based on the suggestions in http://gbayer.com/development/moving-files-from-one-git-repository-to-another-preserving-history/ Result: the history is maintained, but it can be viewed only by running git log --follow. Also, the original commit history cannot be seen on Github: it displays my merge commit as the only commit for that file and none of the previous commits. I can live with this limitation and accept it as a solution. My other concern with this approach is that, for each folder and each file I want to copy, I need to clone the original repo again and repeat all 12 or 13 steps every time. Since I'm moving a lot of files around, I would like to know whether there is a simpler way of doing it. Also, since the post is 5 years old, are newer, easier solutions available? (Surprisingly, Google mostly shows this blog as the first search result.)

  2. Next I tried the approach from a comment on the same Greg Bayer post: http://gbayer.com/development/moving-files-from-one-git-repository-to-another-preserving-history/#comment-2685894846 This made things a bit simpler by using git subtree split, but the results were the same as in the first case.

  3. Then I tried the git log --patch-with-stat and git am approach based on this answer: https://stackoverflow.com/a/11426261/5497551 Result: this usually fails with errors when it encounters a merge while applying the patches. I tried one of the suggestions on that answer, using -m --first-parent. This resolved the errors, but it does not expand merges into their individual commits; each merge is listed as a single commit, so most of the commit history is lost. So I added the --3way option as well. This went through the commits over and over and did not lead to an acceptable solution.

In conclusion, I would prefer the 3rd solution, if only there were an option to list all the commits of a merge in the history of the new repo. Otherwise I have to stick to the first solution, which is inconvenient and tedious in my situation. Any advice or help would be greatly appreciated.

Thanks.

neel
  • `and also preserve the commit history for all files` - so just delete the bits you don't want to keep? You may find this useful: [New repo with copied history of only currently tracked files](http://stackoverflow.com/a/17909526/761202). – AD7six Jul 04 '16 at 10:28
  • Thanks. When you say `delete everything and just restore the files you want to keep:` can you please help me understand in which step we are restoring the files that I want to keep? Because keep-these.txt will have list of all the files that are present in the current repo right? Or am I supposed to delete the unwanted files first and then do `git ls-files > keep-these.txt` ? I am pretty new to git, so not very well versed with all its concepts. – neel Jul 05 '16 at 04:08
  • Please edit your question to be clear. It is unclear from the question description why `git rm somefolder; git commit -m "deleting somefolder"` doesn't do what you want - you specifically ask to preserve the commit history for _all_ files. Also be specific about what you have and what you expect as a result: put the top-level folders/files in the question, and what you want to achieve. – AD7six Jul 05 '16 at 07:24
  • @AD7six First of all apologies for replying so late. Your answer helped me to achieve what I wanted, and hence I had to start working on it immediately, as I had a release yesterday. Now from one single repo that I had earlier, I have created four new repos, and all of them have the git history for their own files. I know the earlier question did not clarify this last part `git history for their own files` hence have updated it accordingly. – neel Jul 12 '16 at 06:43
  • But I probably didn't give it much thought in the first place, since back then I was more concerned about **having** history for the files that I'm moving than about **not having** history for the files that I'm not moving. Your answer helped me get both of these done, so I'm happier now :) ` why git rm somefolder; git commit -m "deleting somefolder" doesn't do what you want` As I said, I'm new to git, so I wasn't aware that you could clone a git repo and simply convert it into a new repo just by removing and adding the remote origin. None of the posts or forums suggested this earlier. – neel Jul 12 '16 at 06:53
  • Please write an answer to your question and accept it. – AD7six Jul 12 '16 at 08:29

3 Answers

4

Here is what worked for me (combining the answers from @AD7six and @Olivier) to split my orig-repo into multiple new repos. I'm listing the steps for creating only one new repo, new-repo1, but the same steps were used to create the others as well.

First, create a new empty repo on Github with the name new-repo1.

git clone [Github url of orig-repo]

git clone --no-hardlinks orig-repo new-repo1
cd new-repo1
git remote rm origin
# This step can be skipped. I needed it because the default branch of my orig-repo was `develop`, but I wanted new-repo1 to use `master`.
git checkout -b master

# I used a script here to delete the files and directories not required in new-repo1.
# If you have only a few files/dirs to delete, you can do it by hand as below.
git rm <path of file 1 to be deleted>
git rm <path of file 2 to be deleted>
git rm -rf <path of dir 1 to be deleted>

git commit -m "Deleted non-new-repo1 code"

git ls-files > keep-these.txt
git filter-branch --force --index-filter "git rm  --ignore-unmatch --cached -qr . ; cat $PWD/keep-these.txt | xargs git reset -q \$GIT_COMMIT --" --prune-empty --tag-name-filter cat -- --all

rm -rf .git/refs/original/
git reflog expire --expire=now --all
git gc --prune=now

git init   # optional: new-repo1 is already a git repository, so this only reinitializes it
git remote add origin [Github url of new-repo1]
git push -u origin master

After this, I can view the history of the files in new-repo1 on Github as well as on the command line using git log.
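To try the sequence safely before touching a real repo, here is a minimal sketch that runs the same steps against a throwaway repository. The directory names keep/ and drop/, the demo identity, and the temporary directory are all made up for illustration. FILTER_BRANCH_SQUELCH_WARNING only silences the delay that newer Git versions add before filter-branch runs.

```shell
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # newer Git pauses with a warning otherwise

work=$(mktemp -d)                        # throwaway location, not part of the original steps

# Build a tiny stand-in for orig-repo: one directory to keep, one to drop.
git init -q "$work/orig-repo"
cd "$work/orig-repo"
git config user.name "Demo"
git config user.email "demo@example.com"
mkdir keep drop
echo one > keep/a.txt;    git add -A; git commit -q -m "add keep/a.txt"
echo two > drop/b.txt;    git add -A; git commit -q -m "add drop/b.txt"
echo three >> keep/a.txt; git add -A; git commit -q -m "edit keep/a.txt"

# Same steps as above, applied to the stand-in.
cd "$work"
git clone -q --no-hardlinks orig-repo new-repo1
cd new-repo1
git config user.name "Demo"
git config user.email "demo@example.com"
git remote rm origin

git rm -q -r drop
git commit -q -m "Deleted non-new-repo1 code"

# The list of files to keep is stored outside the repo here, with an absolute path.
git ls-files > "$work/keep-these.txt"
git filter-branch --force --index-filter \
  "git rm --ignore-unmatch --cached -qr . ; xargs git reset -q \$GIT_COMMIT -- < '$work/keep-these.txt'" \
  --prune-empty --tag-name-filter cat -- --all > /dev/null

# Inspect the result: only commits that touched keep/ should remain on the branch.
kept_commits=$(git rev-list --count HEAD)
dropped_history=$(git log --oneline -- drop)
git log --oneline
```

Until refs/original/ is removed, the old commits are still reachable through that backup ref, so git log --all would still show them at this point.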

neel
1

With method 1, do you clone from a local directory or from a URL? If you clone from a local directory, you should use the --no-hardlinks option. Otherwise, what you do in one clone might affect the .git directories of the other ones, because git hard-links files between them.

Here’s how I do it:

  • Clone the local repository:

    git clone --no-hardlinks source_repo detached_repo
    
  • In detached_repo, remove the origin (more information here to preserve branches other than the current one):

    git remote rm origin
    
  • Remove tags you don’t want to keep. To remove all tags, use git tag -l | xargs git tag -d

  • Use filter-branch to exclude the other files, so they can be pruned. Let's also add --tag-name-filter cat --prune-empty to remove empty commits and to rewrite tags (more information here if you have several branches to keep):

    git filter-branch --tag-name-filter cat --prune-empty --subdirectory-filter folder/to/keep HEAD
    
  • Then delete the backup reflogs so the space can be truly reclaimed (now the operation is destructive):

    git reset --hard
    git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d
    git reflog expire --expire=now --all
    git gc --aggressive --prune=now
    

You now have a local git repository containing the folder/to/keep sub-directory with all its history preserved.
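One thing to be aware of: --subdirectory-filter makes the chosen sub-directory the new project root, so the original directory layout above it is not kept. A minimal sketch on a throwaway repository shows the effect. The file names, the demo identity, and the temporary directory are hypothetical.

```shell
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # newer Git pauses with a warning otherwise

work=$(mktemp -d)                        # throwaway location for the demo

# Stand-in source repo with the folder to keep and one unrelated folder.
git init -q "$work/source_repo"
cd "$work/source_repo"
git config user.name "Demo"
git config user.email "demo@example.com"
mkdir -p folder/to/keep other
echo one > folder/to/keep/a.txt; git add -A; git commit -q -m "add a.txt"
echo two > other/b.txt;          git add -A; git commit -q -m "add b.txt"

cd "$work"
git clone -q --no-hardlinks source_repo detached_repo
cd detached_repo
git remote rm origin
git filter-branch --tag-name-filter cat --prune-empty \
    --subdirectory-filter folder/to/keep HEAD > /dev/null

# After the rewrite, a.txt sits at the top level, not under folder/to/keep/.
root_files=$(git ls-files)
commit_count=$(git rev-list --count HEAD)
echo "$root_files"
```

If the folder/to/keep/ prefix has to survive in the new repo, an --index-filter that removes the unwanted paths keeps the layout intact.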

EDIT

Since you need to keep more than one subdirectory, I will assume that you have a list of files to keep in a file called files_to_keep. Then change the git filter-branch step to:

git filter-branch --tag-name-filter cat --prune-empty \
    --index-filter 'git ls-tree -z -r --name-only --full-tree $GIT_COMMIT \
    | grep -z -v -F -f /absolute/path/to/files_to_keep \
    | xargs -0 -r git rm --cached -r' HEAD

You can generate the list of files to keep by running this command:

git log --pretty=format: --name-status | cut -f2- | sort -u > all_files

and removing the files you don’t want to keep.
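Two caveats, as far as I can tell. First, grep -F -f treats every line of files_to_keep as a substring pattern, and a blank line in that file matches everything, in which case nothing would be removed. The list produced by the git log command above does contain one blank line, so it needs to go. Second, grep -z and xargs -r are GNU options. Below is a minimal sketch that builds the list and runs the filter on a throwaway repository. The directories app/, docs/ and tmp/, the demo identity, and the temporary directory are made up for illustration.

```shell
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # newer Git pauses with a warning otherwise

work=$(mktemp -d)                        # throwaway location for the demo

# Stand-in source repo: app/ and docs/ are to be kept, tmp/ is to be dropped.
git init -q "$work/source_repo"
cd "$work/source_repo"
git config user.name "Demo"
git config user.email "demo@example.com"
mkdir app docs tmp
echo a > app/a.txt;  git add -A; git commit -q -m "add app"
echo d > docs/d.txt; git add -A; git commit -q -m "add docs"
echo t > tmp/t.txt;  git add -A; git commit -q -m "add tmp"

cd "$work"
git clone -q --no-hardlinks source_repo detached_repo
cd detached_repo
git remote rm origin

# Every path that ever existed in the history (includes one blank line).
git log --pretty=format: --name-status | cut -f2- | sort -u > "$work/all_files"
# Keep only app/ and docs/; this pattern also drops the blank line.
grep -E '^(app|docs)/' "$work/all_files" > "$work/files_to_keep"

git filter-branch --tag-name-filter cat --prune-empty \
    --index-filter "git ls-tree -z -r --name-only --full-tree \$GIT_COMMIT \
    | grep -z -v -F -f '$work/files_to_keep' \
    | xargs -0 -r git rm --cached -r" HEAD > /dev/null

# The directory layout is preserved and the tmp/ commit has been pruned.
remaining_files=$(git ls-files | tr '\n' ' ')
commit_count=$(git rev-list --count HEAD)
echo "$remaining_files"
```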

  • Thanks for your answer. But my requirement is to keep multiple folders and files in the new repo and not just a single `folder/to/keep`. Can your solution be modified to do that? – neel Jul 05 '16 at 03:45
  • Also, in the end how can I create a new Github repo from the `detached_repo` ? Will using the `--no-hardlinks` option and the cleanup steps make me able to do a `git init` on the `detached_repo` and push it as a new repo on Github? Also I want to clarify that these steps will not modify the original repo in any way, that is its files and history remain unaffected ? – neel Jul 05 '16 at 03:53
  • Ok, I got answer to my second comment. I did a `git init` then `remote add origin ` `git push -u origin develop`. – neel Jul 05 '16 at 06:30
  • I edited my answer with a method that supports more than one subdirectory. – Olivier 'Ölbaum' Scherler Jul 05 '16 at 09:53
  • Thanks for your update. But I couldn't try your answer, as I realised that in my case it was enough, and much easier, to clone the repo and delete the unwanted dirs and files. Then, to remove the unwanted history, I used AD7six's answer [here](http://stackoverflow.com/questions/17901588/new-repo-with-copied-history-of-only-currently-tracked-files/17909526#17909526). – neel Jul 12 '16 at 07:09
  • This works beautifully as needed. Been going nuts over this for more than 3 weeks. I stopped at `--prune=now`. Later I added remote to the origin, executed standard `git push origin`. Thanks @Olivier'Ölbaum'Scherler – Death Metal Aug 25 '20 at 14:16
0

For such a scenario one might want to give git-import a try.

It basically creates patches from the given file or directory ($object) of one repo and applies them to another while keeping history.

cd old_repo
git format-patch --thread -o "$temp" --root -- "$object"

These patches then get applied to a new repository:

cd new_repo
git am "$temp"/*.patch 

(This procedure can be repeated for different parts of the old repository if needed.)
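As a rough illustration of the two commands, the sketch below runs them end to end on throwaway repositories. The repo contents, the value of $object (lib), the demo identity, and the README commit in new_repo are assumptions made for the demo, and this is not the git-import script itself. Note that git format-patch omits merge commits, so merges do not show up in the new history.

```shell
set -e

work=$(mktemp -d)          # throwaway location for the demo
temp="$work/patches"       # where the patches are written
object="lib"               # hypothetical directory to move

# Stand-in old repo: two commits touch lib/, one touches app/.
git init -q "$work/old_repo"
cd "$work/old_repo"
git config user.name "Demo"
git config user.email "demo@example.com"
mkdir lib app
echo one > lib/x.txt;    git add -A; git commit -q -m "add lib/x.txt"
echo two > app/y.txt;    git add -A; git commit -q -m "add app/y.txt"
echo three >> lib/x.txt; git add -A; git commit -q -m "edit lib/x.txt"

# One patch per commit that touched $object, starting from the root commit.
git format-patch --thread -o "$temp" --root -- "$object" > /dev/null

# Stand-in new repo with one existing commit, then replay the patches.
git init -q "$work/new_repo"
cd "$work/new_repo"
git config user.name "Demo"
git config user.email "demo@example.com"
echo readme > README; git add -A; git commit -q -m "initial commit"
git am -q "$temp"/*.patch

# lib/ arrives with its two commits; app/ is not carried over.
lib_commits=$(git rev-list --count HEAD -- lib)
git log --oneline -- lib
```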

For details please look up:

ViToni