51

I've been looking into rebasing with Git over the past couple of days. Most of the arguments for rebasing say that it cleans up the history and makes it more linear. If you do plain merges (for example), you get a history that shows when the code diverged and when it was brought back together. As far as I can tell, rebasing removes all that history. My question is this: why wouldn't you want the repo history to reflect all the ways the code developed, including where and how it diverged?

ngephart
  • 1
    This is why some people say you shouldn't use `git rebase` very much. http://paul.stadig.name/2010/12/thou-shalt-not-lie-git-rebase-ammend.html might be worthwhile for you to read. – MatrixFrog Mar 10 '11 at 03:00
  • 1
    @MatrixFrog Thanks. I tend to agree with him I think, but I imagine there are also situations where such tools (`git rebase` and friends) are useful. – ngephart Mar 10 '11 at 06:00

5 Answers

60

Imagine you are working on a Secret Project of World Domination. There are three masterminds on this conspiracy:

  • The Genius
  • The General
  • The Computer Hacker

They all agree to meet at their secret base in one week, each bringing one detailed plan.

The Computer Hacker, being a pragmatic programmer, suggests that they use Git to store all the files of the plans. Each of them will fork the initial project repo, and they will merge everything in one week.

They all agree and in the following days the story goes like this:

The Genius

He made a total of 70 commits, 10 each day.

The General

He spied on his comrades' repos and devised a strategy to beat them. He made 3 commits, all on the last day.

The Computer Hacker

This pragmatic programmer used branches. He made 4 different plans, each on its own branch. Each branch was rebased (squashed) down to a single commit.

Seven days passed and the group met again to merge all the plans into one masterpiece. All of them were eager to start, so each of them tried to merge everything on his own.

Here goes the story:

The Genius

He merged the changes from the General's repo and then from the Computer Hacker's. Then, being a lover of logic, he took a look at the log. He expected to see the logical evolution of an idea, with each commit building on the ideas of the previous ones.

But what the log showed was a myriad of commits from different ideas, all interleaved on the timeline. A reader could not follow the evolution or the reasoning behind the commits just by reading the timeline.

So he ended up with a mess that even a genius couldn't understand.

The General

The General thought: Divide and conquer!

And so he merged the Genius's repo into his own. He looked at the log and saw a run of commits from the Genius's idea, which followed an understandable progression, until the last day. On the last day, the commits of the General and the Genius were mixed together.

He had been spying on the Computer Hacker and knew about the rebase solution. So he rebased his own idea and tried the merge again.

Now the log showed a logical progression every day.

The Computer Hacker

This pragmatic programmer created an integration branch for the Genius's idea, another for the General's idea, and another for his own ideas. He rebased each branch and then merged them all into master.

And all of his teammates saw that his log was great. It was simple. It was understandable at first sight.

If an idea introduced a problem, it was clear which commit had introduced it, for there was just one.

They ended up conquering the whole world, and they banished the use of Subversion.

And all were happy.

Nerian
  • 1
    "He did a rebase to each branch. And then he merged all in master." - shouldn't he rebase and then merge them one at a time? I mean, after a merge of one branch, a rebase of the other branches would be required to get the simplest history right? – herman Mar 08 '18 at 15:42
  • And along comes the one man General Evil Hacker Genius and plants a back door and rebases on master, so it looks like the change came from one of the other 3 stooges. Nobody notices and they release the code... Unrestricted access to the database. – Piotr Kula Feb 21 '20 at 09:27
31

As far as I can tell, rebasing removes all that history.

That's not correct. Rebasing, as the name suggests, changes the base of commits. Git replays each commit onto the new base, so usually no change is lost in the process (the commits get new hashes, and you don't get a merge commit). You have a point that the history could record everything about the development process exactly the way it happened, but very often this leads to confusing histories.

This is especially true when several people each work on their own branch but need certain changes from one another to continue (for example, A asks B to implement something so that A can use that feature in their own development). That leads to many merges, for example like this:

     #--#--#--#--*-----*-----------------*---#---\         Branch B
    /           /     /                 /         \
---#-----#-----#-----#-----#-----#-----#-----#-----*       Branch A

In this example we have a branch that is developed separately for a time but constantly pulls in changes from the original branch (# are original commits, * are merges).

Now if we rebase Branch B before merging it back in, we could get the following:

                             #--#--#--#--#---\         Branch B
                            /                 \
---#---#---#---#---#---#---#---#---------------*       Branch A

This represents the same actual changes, but B was rebased onto a later commit of A, so the merges that were previously done on B are no longer needed (those changes are already present in the new base). The only commits now missing are the merges, which usually do not contain any information about the development process. (Note that in this example you could also rebase that last commit on A later on to get a straight line, effectively removing any hint of the second branch.)
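
To make the diagrams concrete, here is a minimal sketch in a throwaway repository. The file names, branch name `branch-b`, the `commit` helper, and the commit messages are all made up for illustration. It only shows that the merge commit on B disappears after the rebase while B's own commits survive.

```shell
set -e
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the initial branch explicitly

# Hypothetical helper: append MESSAGE to FILE and commit it.
commit() {
    echo "$2" >> "$1"
    git add "$1"
    git commit -q -m "$2"
}

commit a.txt "A1"
git checkout -q -b branch-b
commit b.txt "B1"
git checkout -q master
commit a.txt "A2"

# Branch B pulls in A's changes: this creates a merge commit on B.
git checkout -q branch-b
git merge -q --no-edit master
commit b.txt "B2"
git checkout -q master
commit a.txt "A3"

# Count what is unique to B before and after rebasing it onto master.
git checkout -q branch-b
merges_before=$(git rev-list --merges --count master..branch-b)
git rebase -q master
merges_after=$(git rev-list --merges --count master..branch-b)
commits_after=$(git rev-list --count master..branch-b)
echo "merge commits on B: before=$merges_before after=$merges_after"
echo "commits unique to B after rebase: $commits_after"

# Merge B back into A, as in the second diagram.
git checkout -q master
git merge -q --no-ff --no-edit branch-b
git log --graph --oneline
```

The two files never overlap here, so the rebase cannot conflict. In a real repository you may have to resolve conflicts commit by commit during the rebase.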

poke
  • 3
    So, some data *is* lost, but only stuff that isn't really relevant to the development? There's no reason to see those extra merges, because they don't contribute any meaningful information. Is that right? – ngephart Mar 09 '11 at 21:44
  • 2
    Yes exactly, the only information they basically provide is what branches it was based on. And when rebasing, you put that information indirectly into other commits. Apart from that, the commits are just adjusted, but neither deleted nor changed (in content). – poke Mar 09 '11 at 21:53
  • It does update the date of the original commit, and that may be of some value to some people. If people have been committing changes to a branch for some time, after rebasing, all the commits are going to have the rebase date. – Sergio Pulgarin Dec 12 '15 at 03:26
  • 2
    @SergioPulgarin i've never seen that behaviour. the authoring dates are maintained, which are the ones that matter. maybe you're talking about the commit dates? – Hilikus Dec 15 '15 at 15:05
  • 1
    I can't say this answer, although accepted, is entirely correct: with the merge commits in the history you also have a trace of the conflict resolution: for each conflict you can check what the two previous versions were and how it was resolved in the merged version. That information is lost when rebasing. Nevertheless I usually prefer rebase as well for the cleaner history. – herman Mar 08 '18 at 11:26
  • 1
    @herman The conflict resolution is also there with rebasing. It’s just integrated into the commits that originally introduce the change, making it semantically better than having a single merge commit that solves conflicts for a whole set of (possibly unrelated) commits. And what you are losing is the conflict resolution against an older change, which is not really that useful in my opinion. – Anyway, I *do* state in my answer that there is some information loss when rebasing. – poke Mar 08 '18 at 11:43
  • 1
    @poke not true. After rebasing there is no trace that a conflict ever existed. Whether it's useful info depends on the case (and indeed often it is not). – herman Mar 08 '18 at 12:55
  • @herman I mean that by resolving conflicts while rebasing, you are integrating the conflict resolution within the same commit that introduces the change. Of course you won’t see that a conflict ever existed (like you won’t see that there was a branch) but you will still see how the change is integrated. – poke Mar 08 '18 at 13:12
  • 1
    @poke I see what you mean, but that is not what I mean by being able to trace conflict resolution. If the conflict was resolved incorrectly, you can't determine what the intent was in the original commit that got rebased. – herman Mar 08 '18 at 15:33
13

You do a rebase mainly to replay your local commits (the ones you haven't pushed yet) on top of a remote branch (which you have just fetched), in order to resolve any conflicts locally (i.e. before you push them back to the upstream repo).
See "git workflow and rebase vs merge questions" and, in more detail, "git rebase vs git merge".
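
As a rough sketch of that fetch-then-rebase workflow, the script below uses a local bare repository to stand in for the upstream repo. The directory names (`upstream.git`, `teammate`, `mine`) and file names are invented for the example.

```shell
set -e
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

work=$(mktemp -d)
cd "$work"

# A bare repository stands in for the upstream repo.
git init -q --bare upstream.git
git -C upstream.git symbolic-ref HEAD refs/heads/master

# A teammate publishes the first commit.
git clone -q upstream.git teammate 2>/dev/null
cd teammate
git symbolic-ref HEAD refs/heads/master
echo "base" > shared.txt
git add shared.txt
git commit -q -m "Base"
git push -q origin master
cd ..

# You clone, then commit locally without pushing.
git clone -q upstream.git mine
cd mine
echo "my work" > mine.txt
git add mine.txt
git commit -q -m "My local change"
cd ..

# Meanwhile the teammate pushes another commit upstream.
cd teammate
echo "more" >> shared.txt
git commit -q -am "Upstream change"
git push -q origin master
cd ..

# Fetch, replay the unpushed commit on top of origin/master, then push.
cd mine
git fetch -q origin
git rebase -q origin/master
git push -q origin master
merges=$(git rev-list --merges --count HEAD)
total=$(git rev-list --count HEAD)
echo "commits: $total, merge commits: $merges"
git log --oneline
```

Because the local commit is replayed on top of what was fetched, the final push is a plain fast-forward and the history contains no merge commit.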

But rebase isn't limited to that scenario, and combined with "--interactive", it allows for some local re-ordering and cleaning of your history. See also "Trimming GIT Checkins/Squashing GIT History".
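
Here is one way the local cleanup could look, as a sketch only. Normally `--interactive` opens an editor where you reorder or squash commits by hand. To keep the example unattended, it relies on `git commit --fixup`, `--autosquash`, and setting `GIT_SEQUENCE_EDITOR=true` so the generated todo list is accepted as-is; those are my choices for the demo, not something the linked questions prescribe. The branch and file names are made up.

```shell
set -e
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
echo "v0" > plan.txt
git add plan.txt
git commit -q -m "Initial plan"

# Work on a local branch: one real commit plus two follow-up fixes.
git checkout -q -b feature
echo "step 1" >> plan.txt
git commit -q -am "Add feature"
target=$(git rev-parse HEAD)
echo "step 2" >> plan.txt
git commit -q -a --fixup="$target"
echo "step 3" >> plan.txt
git commit -q -a --fixup="$target"

before=$(git rev-list --count master..feature)

# Interactive rebase; the "editor" is `true`, so the autosquash todo list
# (which folds the fixup commits into their target) is used unchanged.
GIT_SEQUENCE_EDITOR=true GIT_EDITOR=true git rebase -q -i --autosquash master

after=$(git rev-list --count master..feature)
echo "commits on feature: before=$before after=$after"
git log --oneline master..feature
```

Only do this on commits you have not published yet, since the squashed commit has a different hash from the originals.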

why wouldn't you want the repo history to reflect all the ways the code developed, including where and how it diverged

  • In a centralized VCS, it is important to never lose the history, and it should indeed reflect "all the ways the code developed".
  • In a distributed VCS, where you can do all kinds of local experiments before publishing some of your branches upstream, it makes less sense to keep everything within the history: not everyone needs to clone and see all of your branches, tests, alternatives, and so on.
VonC
  • 1
    Solving conflicts can be done by you locally by merging in upstream too (usually easier because you solve the conflict for all commits at once, not one by one), so you do the rebase mainly to make the history cleaner as you mention later. – Marius K May 13 '11 at 14:32
  • @Marius: merging in upstream locally? But you don't always have local access to the upstream repo... – VonC May 13 '11 at 14:44
  • 1
    No, but if you want to push/request pull, then you can't be offline. And rebasing a lot of commits giving conflicts might be hard (and you need to remember what the repo should look like after that particular commit), so it might be easier to merge in upstream or squash the commits and then rebase. – Marius K May 30 '11 at 10:58
0

Organizing your history is the point of using rebase over merge, and it's extremely valuable.

What use is a git history which accurately reflects every code change of the past? Do you need such a thing for some kind of certification effort? If you don't, why do you want that? The past as it really happened is messy and difficult to understand. I mean, why not also include every character which you wrote then deleted while editing the file?

The most common way you'll use your git history is reading it. Finding which commit caused an issue and exploring the different versions of a file are probably the two most common use cases. Both become much simpler and more convenient when your git history is straight (and clean!).

Perhaps even more important than using rebase to share changes with the rest of the team, each member should use rebase to shape their changes into a logical collection of self-contained commits. Development doesn't naturally occur in logical steps that directly follow each other. Sometimes you push a commit to your branch just because it's the end of the day and you have to go. Putting this kind of information in your git history is pure noise. I routinely squash a feature that took 20 commits down to just one or two, because there's no point showing anything which didn't end up being part of the finished product.

Even if the development history of your feature was an unholy mess, you can and absolutely should craft a utopian git history. In it, you got everything right the first time and in the correct order: you did feature A on day 1 and feature B on day 2, and there were no bugs or temporary print statements. Why should you do that? Because it's easier to understand for someone reading your changes.

If you combine this idea with git bisect, then curating your master history to only contain commits which pass all the tests defined at the time becomes even more helpful. It will be trivial to find the origin point of a bug, as git bisect will just work. If you use merge and upload the entire development history of each of your branches to master, bisect will keep landing on half-finished commits that may not even pass the tests, which makes it far less helpful.
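
To illustrate the bisect point, here is a small sketch on a linear history of six commits where the fourth one introduces a "bug". The file name `code.txt`, the `BUG` marker, and the one-line `grep` check standing in for a test suite are all assumptions for the demo.

```shell
set -e
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master

# Six commits in a straight line; commit 4 introduces the "bug".
for i in 1 2 3 4 5 6; do
    if [ "$i" -eq 4 ]; then
        echo "BUG" >> code.txt
    else
        echo "line $i" >> code.txt
    fi
    git add code.txt
    git commit -q -m "commit $i"
    if [ "$i" -eq 4 ]; then
        culprit=$(git rev-parse HEAD)
    fi
done

first=$(git rev-list --max-parents=0 HEAD)

# HEAD is known bad, the first commit is known good.
git bisect start HEAD "$first" >/dev/null

# The check exits 0 (good) when the marker is absent, 1 (bad) when present.
git bisect run sh -c '! grep -q BUG code.txt' >/dev/null

# When bisect finishes, refs/bisect/bad points at the first bad commit.
found=$(git rev-parse refs/bisect/bad)
git bisect reset >/dev/null 2>&1

echo "bisect found: $(git log -1 --format=%s "$found")"
```

This works cleanly because every commit can be checked in isolation. If some commits in the range could not be built or tested at all, the check would have to exit with 125 to tell bisect to skip them.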

Kafein
0

If you make a mistake on a public repository and no one has forked/merged/pulled from it yet, you can save face and avoid confusion:

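    # Move the current branch back to the last good commit ([SHAnumber] is its hash)
    git reset --hard [SHAnumber]
    # Replay the branch onto master, forcing new commits even where Git could fast-forward
    git rebase -f master
    # Overwrite the remote master with the rewritten local history
    git push -f origin HEAD:master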

To empty the trash locally (Git only prunes the discarded commits once their reflog entries have expired):

git gc
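
If you want to try this safely first, the following sketch replays the reset and force-push steps against a local bare repository that stands in for the public one. The names `public.git` and `local` and the file contents are invented, and the `git rebase -f master` step is left out to keep the demo minimal.

```shell
set -e
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

work=$(mktemp -d)
cd "$work"

# A bare repository plays the role of the public repo.
git init -q --bare public.git
git -C public.git symbolic-ref HEAD refs/heads/master

git clone -q public.git local 2>/dev/null
cd local
git symbolic-ref HEAD refs/heads/master

echo "good" > file.txt
git add file.txt
git commit -q -m "Good commit"
good=$(git rev-parse HEAD)

# The mistake gets committed and published.
echo "oops" >> file.txt
git commit -q -am "Mistake"
git push -q origin master

# Drop the mistake locally, then overwrite the published branch.
git reset -q --hard "$good"
git push -q -f origin HEAD:master

remote_tip=$(git -C ../public.git rev-parse master)
echo "remote master now at: $(git log -1 --format=%s "$remote_tip")"
```

As the answer says, this is only reasonable while nobody else has fetched the bad commit; anyone who already has it will see their history diverge from the rewritten branch.
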
mda