I'm building a software patch using bsdiff.exe and applying it with bspatch.exe and have so far had no trouble with files smaller than 120MB. One binary file I have was previously 21MB and is now 77MB, and bsdiff seems to hang indefinitely on it.

According to the documentation, "bsdiff is quite memory-hungry. It requires max(17*n,9*n+m)+O(1) bytes of memory, where n is the size of the old file and m is the size of the new file." That would explain trouble with very large files, but both of these files are well under the 120MB that has worked so far, so the hang seems to be triggered by the size of the delta rather than the size of the files.
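To put numbers on the quoted formula, here is a minimal sketch in C. The function name bsdiff_mem_estimate is made up for illustration and is not part of bsdiff; the O(1) term is ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Rough working-memory estimate from the formula quoted in the bsdiff
   documentation: max(17*n, 9*n + m), with the O(1) term dropped.
   n = size of the old file, m = size of the new file, in the same unit.
   Hypothetical helper for illustration only. */
static uint64_t bsdiff_mem_estimate(uint64_t n, uint64_t m)
{
    /* Guard against overflow in the multiplications and the sum below. */
    assert(n <= UINT64_MAX / 17);
    assert(m <= UINT64_MAX - 9 * n);

    uint64_t a = 17 * n;      /* first term of the max */
    uint64_t b = 9 * n + m;   /* second term of the max */
    return a > b ? a : b;
}
```

With sizes given in MB, bsdiff_mem_estimate(21, 77) comes out at 357 (17*21 = 357 versus 9*21 + 77 = 266), so by this formula the 21MB to 77MB case should need only a few hundred MB, which is why memory alone does not look like the cause.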

Does anyone have any information regarding this? Anything would be helpful, thanks!

orkutWasNotSoBad

2 Answers

5

I also had a problem with bsdiff crashing when trying to process a file containing just 2MB of DSP executable code.

After some debugging I determined that the issue lies within the qsufsort function which is used to create a suffix array based on the "old" file. qsufsort calls a function called split which calls itself recursively. In the crash case the recursive call happens so many times that the program runs out of stack space and throws an exception.

As suggested in this Debian bug thread (http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=409664), the solution is to replace qsufsort with a different suffix array construction algorithm. The Wikipedia entry for suffix arrays references SA-IS, so I downloaded the source from here: https://sites.google.com/site/yuta256/sais

I then rebuilt bsdiff.c together with sais.c and sais.h, and replaced the call to qsufsort with:

I[0] = oldsize; sais(old, I+1, oldsize);

Now bsdiff works every time and it's quicker too!
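To illustrate the array layout that the replacement call produces, here is a minimal sketch using a naive sort instead of SA-IS. It assumes, as the original qsufsort result does as far as I can tell, that I holds oldsize+1 entries with the empty suffix (position oldsize) in I[0] and the real suffixes in lexicographic order after it. naive_suffix_array and its helpers are hypothetical names, and the sort is far too slow for real files; it is only meant to show what ends up in I.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort has no context argument, so the buffer being sorted is passed
   through file-scope variables. Fine for a sketch; not thread-safe. */
static const unsigned char *g_buf;
static int g_len;

/* Order two suffix start positions by the suffixes they denote. */
static int cmp_suffix(const void *pa, const void *pb)
{
    int a = *(const int *)pa, b = *(const int *)pb;
    int la = g_len - a, lb = g_len - b;
    int c = memcmp(g_buf + a, g_buf + b, (size_t)(la < lb ? la : lb));
    if (c != 0)
        return c;
    return la - lb;  /* a suffix that is a prefix of the other sorts first */
}

/* Fill I[0..n]: I[0] = n stands for the empty suffix, and I[1..n] are the
   n real suffix positions in lexicographic (unsigned byte) order.
   Illustration only; the layout is an assumption stated in the lead-in. */
static void naive_suffix_array(const unsigned char *old, int *I, int n)
{
    int i;

    assert(old != NULL && I != NULL && n >= 0);
    I[0] = n;
    for (i = 0; i < n; i++)
        I[i + 1] = i;
    g_buf = old;
    g_len = n;
    qsort(I + 1, (size_t)n, sizeof I[0], cmp_suffix);
}
```

For the input "banana" (n = 6) this fills I with 6, 5, 3, 1, 0, 4, 2. One thing worth checking in your own build: this sketch uses an int array, so if I is declared with a different element type in your copy of bsdiff.c, it may need to match whatever your copy of sais.h expects.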

  • I had bsdiff crash too in that function. Swapped with sais as you said, and seems to work great! I'm not too familiar with the algorithms; I just hope your substitution of the call is right. – Zorg Apr 05 '15 at 19:58
1

Try one of the other binary diffing programs listed here:

https://stackoverflow.com/questions/688504/binary-diff-tool-for-very-large-files

The differences between the two files require memory above and beyond the memory required to represent both files. So processing two binary files with many differences will require more memory than two identical files.

bsdiff has trouble with the smaller file because of a bug in the software. Colin Percival, who wrote it, has acknowledged the bug and said he doesn't have time to fix it.

https://www.daemonology.net/bsdiff/

cgmb
Eric Leschinski
  • Can you point to the bug? I'd like to investigate if it is fixed in this fork - https://github.com/mendsley/bsdiff – anatoly techtonik Oct 19 '12 at 14:13
  • I don't know exactly where the bug is, but here is where Colin says it could be fixed algorithmically: http://debian.2.n7.nabble.com/Bug-409664-bsdiff-is-extremely-slow-on-some-files-sometimes-hangs-td1738215.html – Eric Leschinski Oct 19 '12 at 14:19