Best practices for multiple git repositories

Question

I have around 20 different repositories. Many are independent and compile as libraries but some others have dependencies among them. Dependency resolution and branching is complicated.

Suppose that I have a super project that only aggregates all other repositories. It is used exclusively to run tests -- no real development goes here.

/superproject  [master, HEAD]
    /a         [master, HEAD]
    /b         [master, HEAD]
    /c         [master, HEAD]
    /...

Now, to develop a feature or fix for one project (a), especially one that requires specific versions of other projects to compile or run (b v2.0 and c v3.0), I have to create a new branch:

/superproject  [branch-a, HEAD]  <-- branch for 'a' project
    /a         [master]  <-- new commits here
    /b         [v2.0]
    /c         [v3.0]

For b, something else might be required, such as a v0.9 and c v3.1:

/superproject  [branch-b, HEAD]  <-- branch for 'b' project
    /a         [v0.9]   <-- older version than 'a'
    /b         [master] <-- new commits go here
    /c         [v3.1]   <-- newer version than 'a'

This becomes even more complicated when implementing common git workflows involving feature branches, hotfix branches, release branches, etc. I have been advised both to use and to avoid git-submodules, git-subtree, Google's git-repo, git-slave, etc.

How can I manage continuous integration for such a complex project?

EDIT

The real question is how to run tests without having to mock all other dependent projects? Especially when all projects might use different versions. (Related: Trigger Jenkins tests after commits in git submodules)

I would actually discourage such an architecture. Having a repository like that would only confuse maintainers, and the testing really should be done per project. — Makoto, Jul 03 '15 at 18:14
The individual folders above are different .git repositories, rather than being one single big one - otherwise they couldn't have different branches and tags. — AlBlue, Jul 03 '15 at 19:24
The real question is how to run tests without having to mock all other dependent projects? Especially when all projects might use different versions. — betodelrio, Jul 03 '15 at 19:52

score 6 · Accepted Answer · answered Jul 03 '15 at 22:50

For working with multiple branches in parallel, use parallel clones if possible. cd is an awful lot easier than checkout and clean and check-for-stale-detritus and recreate-caches every time you want to switch.
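As a minimal sketch of what parallel clones look like, the script below builds a throwaway repository and clones it once per branch. The `upstream` repository, the `clone-a` and `clone-b` directories, the file contents, and the demo identity are all stand-ins invented for illustration; only the branch names echo the question.

```shell
#!/bin/sh
# Throwaway identity so the demo commits work on any machine (assumption).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d) && cd "$work" || exit 1

# A stand-in "upstream" repository with two branches.
git init -q upstream
(
    cd upstream || exit 1
    git checkout -q -b branch-a
    echo a > file.txt
    git add file.txt
    git commit -q -m "work for a"
    git checkout -q -b branch-b
    echo b > file.txt
    git commit -q -am "work for b"
)

# One clone per branch: switching context is now just a cd.
git clone -q --branch branch-a upstream clone-a
git clone -q --branch branch-b upstream clone-b

branch_a=$(git -C clone-a rev-parse --abbrev-ref HEAD)
branch_b=$(git -C clone-b rev-parse --abbrev-ref HEAD)
echo "clone-a is on $branch_a, clone-b is on $branch_b"
```

Each clone keeps its own working tree and build output, so nothing has to be cleaned or rebuilt when you move between them.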

So far as recording your test environments goes, what you're describing is exactly what submodules do, in every detail. For something this simple, I'm going to recommend setting yourself up without using the submodule command at all, and telling it about your setup once you're comfortable and the top item on your submodule-issues list is keystroke count.

Starting from the setup in your question, here's how you set yourself up to record clean builds in the subprojects:

cd $superproject
git init .
git add a b c etc
git commit -m "recording test state for $thistest"

That's it. You've committed a list of commit id's, i.e. the id's of the currently-checked-out commits in each of those repos. The actual content is in those repos, not this one, but that's the entire difference between files and submodules so far as git's concerned. The .gitmodules file has random notes to help cloners, mainly a suggested repo that's supposed to contain the necessary commits, and random notes for command defaults, but what it's doing is easy and obvious.
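To see what such a commit records, here is a self-contained sketch that runs the same steps in a scratch directory. The subprojects `a` and `b`, their `README` files, and the demo identity are stand-ins invented for the example; the `advice.addEmbeddedRepo` setting is only there to silence git's hint about embedded repositories.

```shell
#!/bin/sh
# Throwaway identity so the demo commits work on any machine (assumption).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d) && cd "$work" || exit 1

# Two stand-in subprojects, each its own repository with one commit.
for name in a b; do
    git init -q "$name"
    (cd "$name" && echo "$name" > README && git add README && git commit -q -m "initial $name")
done

# The superproject records the checked-out commit of each subproject.
git init -q .
git -c advice.addEmbeddedRepo=false add a b
git commit -q -m "recording test state"

# Each entry has mode 160000 and type "commit": a pointer, not file content.
recorded=$(git ls-tree HEAD)
echo "$recorded"
```

The listing shows one line per subproject, each carrying the id of the commit that was checked out there when `git add` ran.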

Want to check out the right commit at path foo?

(commit=$(git rev-parse :foo); cd foo && git checkout "$commit")

The rev-parse fetches the commit id recorded for foo in the index; the cd and checkout then check that commit out inside foo.
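A quick way to convince yourself is to let the subproject drift and then restore it. The sketch below does that in a scratch directory; the repository `foo`, its file `f`, and the demo identity are made up for the example.

```shell
#!/bin/sh
# Throwaway identity so the demo commits work on any machine (assumption).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d) && cd "$work" || exit 1

# A stand-in subproject with one commit.
git init -q foo
(cd foo && echo 1 > f && git add f && git commit -q -m "first")

# Record foo's current commit in the superproject.
git init -q .
git -c advice.addEmbeddedRepo=false add foo
git commit -q -m "record foo at its first commit"

# foo moves on; the superproject's index still names the first commit.
(cd foo && echo 2 > f && git commit -q -am "second")

# Look up the recorded commit and check it out inside foo.
(commit=$(git rev-parse :foo); cd foo && git checkout -q "$commit")

recorded=$(git rev-parse :foo)
restored=$(git -C foo rev-parse HEAD)
echo "recorded: $recorded"
echo "restored: $restored"
```

After the checkout, foo sits on a detached HEAD at the recorded commit, and its working tree matches that commit again.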

Here's how you find all your submodules and what should be checked out there to recreate the staged aka indexed environment:

git ls-files -s | grep ^16

Check what your current index lists for a submodule and what's actually checked out there:

echo $(git rev-parse :$submodule; (cd $submodule; git rev-parse HEAD))

and there you go. Check out the right commits in all your submodules?

git ls-files -s | grep ^16 | while read mode commit stage path; do
        (cd "$path" && git checkout "$commit")
done
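The listing and the comparison above combine naturally into a drift report. This is only a sketch under assumptions: the subprojects `a` and `b`, the `ok`/`drift` labels, and the demo identity are invented here, and paths containing unusual characters (which `git ls-files` would quote) are not handled.

```shell
#!/bin/sh
# Throwaway identity so the demo commits work on any machine (assumption).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d) && cd "$work" || exit 1

# Two stand-in subprojects, recorded in a superproject.
for name in a b; do
    git init -q "$name"
    (cd "$name" && echo "$name" > README && git add README && git commit -q -m "initial $name")
done
git init -q .
git -c advice.addEmbeddedRepo=false add a b
git commit -q -m "recording test state"

# b moves ahead of what the superproject recorded.
(cd b && echo more >> README && git commit -q -am "newer b")

# Compare each recorded commit with what is actually checked out.
report=$(git ls-files -s | grep ^16 | while read -r mode commit stage path; do
    actual=$(git -C "$path" rev-parse HEAD)
    if [ "$actual" = "$commit" ]; then
        echo "ok $path"
    else
        echo "drift $path"
    fi
done)
echo "$report"
```

Running the checkout loop afterwards would put b back on the recorded commit.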

Sometimes you're carrying local patches you want applied to every checkout:

git ls-files -s | grep ^16 | while read mode commit stage path; do
        (cd "$path" && git rebase "$commit")
done
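Here is a small sketch of that rebase loop in action, under made-up conditions: a subproject `a` whose recorded commit is newer than the base of a hypothetical `local-patches` branch. File names and the demo identity are also invented for illustration.

```shell
#!/bin/sh
# Throwaway identity so the demo commits work on any machine (assumption).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d) && cd "$work" || exit 1

# Subproject with two commits; local-patches forks at the first one.
git init -q a
(
    cd a || exit 1
    echo one > base.txt && git add base.txt && git commit -q -m "base 1"
    git branch local-patches
    echo two > base.txt && git commit -q -am "base 2"
)

# The superproject records a at "base 2".
git init -q .
git -c advice.addEmbeddedRepo=false add a
git commit -q -m "record a at base 2"
recorded=$(git rev-parse :a)

# A local patch sitting on top of the older commit.
(
    cd a || exit 1
    git checkout -q local-patches
    echo patch > local.txt && git add local.txt && git commit -q -m "local patch"
)

# Replay the local patch onto whatever the superproject recorded.
git ls-files -s | grep ^16 | while read -r mode commit stage path; do
    (cd "$path" && git rebase -q "$commit" </dev/null)
done

git -C a log --oneline
```

The patch touches a different file than the upstream change, so the rebase applies cleanly; a conflicting patch would stop the loop at that subproject and need resolving by hand.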

and so forth. There are git submodule commands for these, but they're not doing anything you don't see above. The same goes for all the rest: you can translate everything they do into near-oneliners like the ones above.

There's nothing mysterious about submodules.

Continuous integration is generally done with any of a whole lot of tools; I'll leave that for someone else to address.

Great answer. How can I keep track of version dependencies without resorting to **superproject** submodules? Project `a` requires `b` in a specific version. Can I set that constraint as part of `a` itself? — betodelrio, Jul 06 '15 at 16:24
Ah, okay, I'm getting my kid off to camp and was a bit hasty w/ prev comment. Yes, you can do your dependencies that way -- for that, there's no submodule gadget, but again it's a near oneliner -- to add a submodule that actually resides elsewhere, you go to the elsewhere and say `git config core.worktree ..` to nail down the worktree location, and in worktrees using it, rather than your initial submodule update, you do `echo gitdir: /path/to/projecta/.git >$theprojectbpath_in_a`. Play around with that, I'll have more time maybe tomorrow. — jthill, Jul 06 '15 at 20:16

score 3 · Answer 2 · answered Jul 04 '15 at 00:23

As the author of git slave, I think it could work in this situation. How to use it would depend on whether you have control over repos a, b, and c; by which I mean whether you could synchronize the branch strategy between them so that the v2 branch means the same thing for everyone. If so, I would strongly urge git slave, since you can essentially treat everything as one large project.

If you cannot mandate a common branch and tag strategy, then you would have to impose one yourself, which moves towards a lightweight version of the workflow that jthill suggested with git submodules. Specifically, you could have your own repo tracking a, b, and c and create a branch in each one that corresponds to whatever the correct branch for that slave repo is. As with git submodules, you would have to bring each repo up to date manually (by merging, in this case). However, you would not need the mother-may-I step of making the commit in the superproject. This technique is not the slam-dunk use case, which is slave projects that share the same branch names in their own development, but it will work.

As jthill said, continuous integration is pretty much orthogonal to the question of how to wrangle the projects.

No, I cannot mandate a common branch/tag in submodules (slave repos). `gits` appears to be more complex than what I am trying to achieve but will give it a try. Thank you :D — betodelrio, Jul 06 '15 at 16:52
