I'm trying to implement a three-way diff/merge algorithm (in python) between, say, the base version X and two different derivative versions A and B, and I'm having trouble figuring out how to handle some changes.

I have a line-by-line diff from X to A, and from X to B. These diffs give, for each line, an "opcode" which is one of: = if the line didn't change, + if a line was added, - if the line was removed, and c if the line was changed (which is simply a - immediately followed by a +, indicating a line was removed and then replaced, i.e. effectively modified).

Now I'm comparing corresponding opcodes from the A-diff and the B-diff, to try to decide how to merge them. Some of these opcode combos are easy: = and = means neither version changed the line, so we keep the original. + and = means that a line was added on one side and no change was made on the other, so accept the addition and advance to the next line only on the side that added the line. And - and c is a conflict that the user must resolve, because on one side a line was changed, and on the other side the same line was removed.
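As a minimal sketch of just the easy cases above (the function name `merge_decision` and the returned labels are only illustrative, not from any library), every other combination is left undecided:

```python
def merge_decision(op_a, op_b):
    """Classify a pair of opcodes from the A-diff and the B-diff.

    Hypothetical helper: only the three easy cases are encoded,
    anything else is reported as "undecided".
    """
    pair = frozenset((op_a, op_b))  # order of the two sides does not matter
    if pair == {"="}:
        return "keep"             # neither side touched the line
    if pair == {"+", "="}:
        return "accept_addition"  # advance only on the side that added
    if pair == {"-", "c"}:
        return "conflict"         # removed on one side, changed on the other
    return "undecided"            # e.g. + with -, or + with c


print(merge_decision("+", "-"))
```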

However, I'm struggling with what to do with a + and a -, or a + and a c. In the first case for instance, I added a new line on one side, and deleted a subsequent line on the other side. Strictly, I don't think this is a conflict, but what if the addition was relying on that line being there? I guess that applies to the entire thing (something added in one place may rely on something somewhere else to make sense). The second case is similar, I added a line on one side, and on the other side I changed a subsequent line, but the addition may be relying on the original version of the line.

What is the normal approach to handling this?

brianmearns
  • the normal approach is to diff/merge two files each time :-) – Leo Sep 18 '14 at 18:21
  • @Leo Not sure if you're being sarcastic, but no it's not. Three-way merging comes in when you have an edit conflict: two people making changes to the same file at more or less the same time. If you simply compare the two new versions to each other, you can't tell, for instance, if one person deleted a line or the other person added it. By considering how each one changed relative to a common ancestor version, you can know which side was responsible for the change. – brianmearns Sep 18 '14 at 18:24
  • Sorry, my intention was not to be sarcastic. I just remembered how hard conflicts sometimes are to solve even with just 2 files at a time (often needing human intervention), so I thought this 3-way diff/merge would be even worse. But your explanation makes complete sense. Please excuse my comment. – Leo Sep 18 '14 at 18:26
  • Not a problem, I wasn't trying to be a jerk either, just wasn't sure of your intent. Thanks for clarifying =) – brianmearns Sep 18 '14 at 18:28

1 Answer

The usual robust strategy (diff3, git's resolve ...) is that changes (+, -, c) in one file must be separated by at least N context lines (e.g. 3) from competing changes in the other file, unless the two changes are exactly equal. Otherwise it's a conflict that needs manual resolution. This is similar to the clean context that applying a patch requires.
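A rough sketch of that context rule, assuming the diffs come from Python's `difflib.SequenceMatcher` (the helper names `changed_hunks` and `find_conflicts` are made up for illustration, and this is not how diff3 or git implement it):

```python
from difflib import SequenceMatcher


def changed_hunks(base, derived):
    """Non-equal regions as (start, end, replacement), in base line indices."""
    sm = SequenceMatcher(a=base, b=derived, autojunk=False)
    return [(i1, i2, tuple(derived[j1:j2]))
            for tag, i1, i2, j1, j2 in sm.get_opcodes() if tag != "equal"]


def find_conflicts(base, a, b, context=3):
    """Pairs of hunks from a and b with fewer than `context` untouched
    base lines between them, unless both sides made the identical edit."""
    conflicts = []
    for ha in changed_hunks(base, a):
        for hb in changed_hunks(base, b):
            if ha == hb:
                continue  # exactly equal change on both sides
            # untouched base lines between the two ranges (negative if overlapping)
            gap = max(ha[0], hb[0]) - min(ha[1], hb[1])
            if gap < context:
                conflicts.append((ha, hb))
    return conflicts


base = [f"line {i}" for i in range(20)]
a = base[:2] + ["A2"] + base[3:]        # A changes line 2
b_near = base[:4] + ["B4"] + base[5:]   # B changes line 4, only 1 line away
b_far = base[:15] + ["B15"] + base[16:] # B changes line 15, far away
print(find_conflicts(base, a, b_near))
print(find_conflicts(base, a, b_far))
```

With N = 3, the nearby edit is reported as a conflict and the distant one is not. An insertion shows up as an empty base range, so a + right next to a - or c on the other side also lands inside the context window.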

Here is an example where somebody tries some fancy extra strategies, like "# if two Delete actions overlap, take the union of their ranges", to reduce certain conflicts. But that's risky; and conversely, there is no guarantee that concurrent changes won't cause problems even when they are very far apart.
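To make the quoted delete-union idea concrete, here is a small sketch. It is a hypothetical helper, not the linked code, and it assumes deletions are given as half-open `(start, end)` ranges of base line indices:

```python
def union_if_overlap(del_a, del_b):
    """Merge two delete ranges into one if they overlap, else return None.

    Ranges are half-open (start, end) base line indices (an assumption).
    """
    (a1, a2), (b1, b2) = del_a, del_b
    if a1 < b2 and b1 < a2:  # the ranges share at least one line
        return (min(a1, b1), max(a2, b2))
    return None              # disjoint: handle each delete separately


print(union_if_overlap((2, 5), (4, 8)))
```

Note that this silently accepts a deletion that neither side made on its own, which is one reason such shortcuts are risky.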

kxr