Are there libraries to find the delta between very large files (hundreds of GB) in a reasonable amount of time and RAM?
Try HDiffPatch; it has been used on a 50 GB game (not tested at 100 GB): https://github.com/sisong/HDiffPatch
It runs fast on large files, but the differ is not multi-threaded.
Creating a patch: hdiffz -s-1k -c-zlib old_path new_path out_delta_file
Applying a patch: hpatchz old_path delta_file out_new_path
Diffing with -s-1k on 100 GB inputs requires roughly 100 GB * 16 / 1 KB < 2 GB of memory; diffing with -s-128k instead takes less time and less memory.
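The memory bound above follows from quick arithmetic. A minimal sketch, assuming (as the figure in this answer implies) that the differ keeps about 16 bytes of match-index state per block of the old file, where the block size is the -s-N argument:

```python
# Rough memory estimate for hdiffz -s-N on a large input.
# Assumption: ~16 bytes of index state per N-byte block of the
# old file (the constant used in the estimate above).
def diff_memory_bytes(file_size_bytes, block_size_bytes, bytes_per_block=16):
    n_blocks = file_size_bytes // block_size_bytes
    return n_blocks * bytes_per_block

GB = 1024 ** 3
mem_1k = diff_memory_bytes(100 * GB, 1024)          # -s-1k
mem_128k = diff_memory_bytes(100 * GB, 128 * 1024)  # -s-128k

print(mem_1k / GB)    # 1.5625 GB, under the ~2 GB stated
print(mem_128k / GB)  # ~0.0122 GB, i.e. about 12.5 MB
```

Raising the block size from 1 KB to 128 KB divides the index memory by 128, which is why -s-128k is so much cheaper.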
bsdiff can be changed into a multi-threaded differ:
- the suffix-array sort can be replaced by msufsort, a multi-threaded suffix-array construction algorithm;
- the match function can be changed to a multi-threaded version, splitting the new file by the number of threads;
- the bzip2 compressor can be replaced with a multi-threaded one, such as pbzip2 or lzma2 ...
But this approach needs a very large amount of memory, so it is not suitable for very large files.
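The "split the new file by the number of threads" step can be sketched as follows. This is a minimal illustration, not bsdiff itself: `diff_chunk` is a hypothetical placeholder for the real match function, which would search the old file's suffix array for matches of its slice of the new file.

```python
from concurrent.futures import ThreadPoolExecutor

def chunk_ranges(total_size, n_threads):
    # Split [0, total_size) into n_threads near-equal half-open
    # ranges, one per worker thread.
    base, rem = divmod(total_size, n_threads)
    ranges, start = [], 0
    for i in range(n_threads):
        end = start + base + (1 if i < rem else 0)
        ranges.append((start, end))
        start = end
    return ranges

def diff_chunk(old_data, new_data, start, end):
    # Hypothetical stand-in for the match step: a real differ would
    # find matches of new_data[start:end] against old_data here.
    # This placeholder just reports the range it would cover.
    return (start, end, len(new_data[start:end]))

def parallel_diff(old_data, new_data, n_threads=4):
    # Each thread diffs its own slice of the new file independently;
    # the per-chunk results would then be concatenated into one patch.
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        jobs = [pool.submit(diff_chunk, old_data, new_data, s, e)
                for s, e in chunk_ranges(len(new_data), n_threads)]
        return [j.result() for j in jobs]
```

The memory problem noted above comes from the shared index: every thread needs access to the full suffix array of the old file, so the index cost is paid once but is itself proportional to the old file's size.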