No one else has addressed this for a while, so I'll take a stab at it. This is all purely high-level and theoretical, though, since I have not read the paper on the original patience algorithm.
The LCS (longest common subsequence) algorithms are all about reducing the time spent finding a minimal edit-distance solution. The standard (dynamic programming) solution is O(MN), where M is the number of symbols in the original string and N is the number of symbols in the target string. In our case, the "symbols" are lines and the "string" is the sequence of lines, rather than a character string (where the symbols would be, e.g., ASCII codes). We simply fill in an M x N matrix of "edit costs"; when we're done, we produce the actual edit by tracing a minimal path backwards through the resulting matrix. See https://jlordiales.me/2014/03/01/dynamic-programming-edit-distance/ for an example. (Web page found via Google search: it's not something I had anything to do with, other than scanning it quickly for correctness now. It seems correct. :-) )
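As a concrete (if naive) illustration, here is a minimal Python sketch of that O(MN) matrix fill over lines; the function name and the unit edit costs are my own choices, not anything from a real diff implementation:

```python
def edit_distance_matrix(a, b):
    """Fill the (M+1) x (N+1) cost matrix for editing lines `a` into lines `b`."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i          # cost of deleting all of a[:i]
    for j in range(n + 1):
        d[0][j] = j          # cost of inserting all of b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                d[i][j] = d[i - 1][j - 1]            # keep the line unchanged
            else:
                d[i][j] = 1 + min(d[i - 1][j],       # delete a line
                                  d[i][j - 1],       # insert a line
                                  d[i - 1][j - 1])   # replace a line
    return d

old = ["/**", " * foo", " */", "function foo() {}"]
new = ["/**", " * bar", " */", "function foo() {}"]
print(edit_distance_matrix(old, new)[len(old)][len(new)])  # 1: one line replaced
```

The traceback then walks from the bottom-right corner back to the top-left, picking a minimal-cost neighbor at each step; that path spells out the delete/insert/keep script.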
Actually computing this matrix is fairly expensive for big files, since M and N are the number of source lines (usually approximately equal): a ~4k line file results in ~16M entries in the matrix, which must be filled in completely before we can trace the minimal path back. Moreover, comparing "symbols" is no longer as trivial as comparing characters, since each "symbol" is a complete line. (The usual trick is to hash each line and compare hashes instead during the matrix generation, and then re-check during traceback, replacing "keep unchanged symbol" with "delete original and insert new" if the hash misled us. This works fine even in the presence of hash collisions: we may get a very slightly sub-optimal edit sequence, but it will practically never be awful.)
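A toy version of that hashing trick might look like this (zlib.crc32 stands in for whatever hash a real implementation uses; the variable names are mine):

```python
import zlib

def line_hash(line):
    """Cheap stand-in hash for a whole line."""
    return zlib.crc32(line.encode("utf-8"))

old = ["/**", " * Function foo description.", " */"]
new = ["/**", " * Function bar description.", " */"]
old_h = [line_hash(l) for l in old]
new_h = [line_hash(l) for l in new]

# During the matrix fill: hash equality stands in for line equality.
matches = [oh == nh for oh, nh in zip(old_h, new_h)]

# During traceback: any "keep unchanged" step is re-checked against the real
# text; a hash collision just degrades that step to a delete+insert pair.
verified = [m and o == n for m, o, n in zip(matches, old, new)]
print(matches, verified)
```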
LCS modifies the matrix-calculation by observing that keeping long common subsequences ("preserve all these lines") almost always results in a big win. Having found some good LCS-es, we break the problem into "edit the non-common prefix, keep the common sequence, and edit the non-common suffix": now we calculate two dynamic programming matrices, but for smaller problems, so it goes faster. (And, of course, we can recurse on the prefix and suffix. If we had a ~4k-line file and we found ~2k totally-unchanged, in-common lines near the middle, leaving ~0.5k lines at the top and ~1.5k at the bottom, we can check for long common sub-sequences in the ~0.5k "top has a difference" lines, and then again in the ~1.5k "bottom has a difference" lines.)
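The divide-and-conquer step can be sketched like so, using Python's difflib longest-match search as a stand-in for finding a good common block (real LCS-based diff implementations differ in detail):

```python
from difflib import SequenceMatcher

def split_on_anchor(a, b):
    """Split both line lists around their longest common contiguous block."""
    m = SequenceMatcher(None, a, b, autojunk=False).find_longest_match(
        0, len(a), 0, len(b))
    if m.size == 0:
        return None
    prefix = (a[:m.a], b[:m.b])                     # recurse on this smaller problem
    common = a[m.a:m.a + m.size]                    # keep these lines as-is
    suffix = (a[m.a + m.size:], b[m.b + m.size:])   # and recurse on this one
    return prefix, common, suffix

a = ["top A", "shared 1", "shared 2", "shared 3", "bottom A"]
b = ["top B", "shared 1", "shared 2", "shared 3", "bottom B"]
prefix, common, suffix = split_on_anchor(a, b)
print(common)  # ['shared 1', 'shared 2', 'shared 3']
```

Each recursive call works on a strictly smaller pair of sequences, so the quadratic matrix work only has to be paid on the leftover "actually different" regions.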
LCS does poorly, and thus produces terrible diffs, when the "common subsequences" are trivial lines like "}" that have lots of matches but are not really relevant. The patience diff variant simply discards such lines from the initial LCS calculation (more precisely, it anchors only on lines that occur exactly once in each file), so that they're not part of a "common subsequence". This makes the remaining matrices larger, which is why you must be patient. :-)
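In code, that filtering step is tiny. This sketch (my own, not taken from any patience implementation) keeps only the lines that occur exactly once on each side, so noise like "}" can never anchor a common subsequence:

```python
from collections import Counter

def patience_anchors(a, b):
    """Lines eligible to anchor a common subsequence: unique in both files."""
    ca, cb = Counter(a), Counter(b)
    return [line for line in a if ca[line] == 1 and cb[line] == 1]

a = ["int f() {", "  return 1;", "}", "int g() {", "  return 2;", "}"]
b = ["int g() {", "  return 2;", "}", "int f() {", "  return 1;", "}"]
anchors = patience_anchors(a, b)
print(anchors)  # the repeated "}" lines are filtered out
```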
The result is that patience diff is no help here because our problem has nothing to do with common subsequences. In fact, even if we discarded LCSes entirely and just did one big matrix, we'd still get an undesirable result. Our problem is that the cost of deleting:
- * Function foo description.
- */
-function foo() {}
-
-/**
(and inserting nothing) is the same as the cost of deleting:
-/**
- * Function foo description.
- */
-function foo() {}
-
The cost of either one is just "delete 5 symbols". Even if we weight each symbol—make non-empty lines "more expensive" to delete than empty lines—the cost remains the same: we're deleting the same five lines, in the end.
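The tie is easy to verify mechanically. In this sketch the weight function is arbitrary (my own invention); the point is that both candidate deletions cover the same multiset of lines, so any per-line weighting yields the same total:

```python
lines = ["/**", " * Function foo description.", " */", "function foo() {}", ""]

def weight(line):
    return 1 if line.strip() else 0.1   # arbitrary: blank lines cost less

# Candidate 1 deletes the tail of the first block plus the next "/**";
# candidate 2 deletes the first block from its own "/**" onward.
window1 = lines[1:] + ["/**"]
window2 = lines
print(sum(weight(l) for l in window1) == sum(weight(l) for l in window2))  # True
```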
What we need, instead, is some way to weight lines based on "visual clustering": short lines at an edge are cheaper to delete than short lines in the middle. The compaction heuristic added to Git 2.9 attempts to do this after the fact. It's obviously at least slightly flawed (only blank lines count, and they have to actually exist, not just be implied by reaching an edge). It might be better to do the weighting during matrix-filling (assuming what's left, after doing LCS elimination, really is going through the full dynamic programming matrix). This is nontrivial, though.
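A hedged sketch in the spirit of that after-the-fact heuristic (this is my own toy version, not Git's actual code): when a deletion hunk is ambiguous, i.e. it can slide without changing which text is removed, prefer a position where the hunk ends on a blank line.

```python
def slide_hunk(lines, start, length):
    """Return (start, length) for an equivalent, better-clustered deletion hunk."""
    best = start
    s = start
    # The hunk can slide up one line whenever the line just above it equals
    # its current last line; the resulting deletion removes the same text.
    while s > 0 and lines[s - 1] == lines[s + length - 1]:
        s -= 1
        if lines[s + length - 1].strip() == "":
            best = s                 # this position ends on a blank line
    return best, length

lines = ["/**", " * Function foo description.", " */", "function foo() {}", "",
         "/**", " * Function bar description.", " */", "function bar() {}", ""]
# A naive diff deletes lines 1..5 (" * Function foo ..." through the second
# "/**"); sliding the hunk up gives the deletion of lines 0..4 instead.
print(slide_hunk(lines, 1, 5))  # (0, 5)
```

Note how this toy version shares the real heuristic's limitation described above: if no blank line exists inside the slidable range, `best` never moves, even when stopping at the file's edge would look just as good.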