
I have some svg files that I want to be tracked by git. However, most software can transparently deal with svgz (which is basically svg.gz). Therefore, I was considering switching to svgz to save disk space.

What are the pros and cons of having them as svgz instead of svg from a git perspective?

My naive assumption is that Git's diff algorithms are optimized for text files and would work less well on their compressed counterparts. Since the diffs are themselves compressed, I assume the overall approach is very efficient for text files, where the differences take up very little disk space. For compressed data, Git would instead tend to store larger objects internally, so I would expect that at some point the repository ends up taking more space with compressed files than with uncompressed ones.

norok2

1 Answer


Git uses a variant of Xdelta within pack files. This is separate from textual diffs (for which Git uses a variant of XDiff). All of Git's stored objects are also compressed with zlib deflate, which is the same underlying algorithm that gzip uses, so at this particular level, it's pretty much a wash.[1]
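To make the "wash" concrete, here is a minimal sketch that compresses the same bytes once as a gzip stream (what an svgz file is) and once as a zlib stream (the container Git uses for objects). The synthetic SVG and the compression level of 6 are assumptions chosen for illustration, not anything taken from a real repository.

```python
import gzip
import random
import zlib

# Synthetic SVG used purely as sample data (an assumption for this sketch).
rng = random.Random(0)
svg = (
    '<svg xmlns="http://www.w3.org/2000/svg">\n'
    + "".join(
        f'  <circle cx="{rng.uniform(0, 800):.1f}" cy="{rng.uniform(0, 600):.1f}" r="5"/>\n'
        for _ in range(300)
    )
    + "</svg>\n"
).encode()

# gzip container, i.e. what an .svgz file holds.
as_svgz = gzip.compress(svg, compresslevel=6, mtime=0)
# zlib container around the same DEFLATE algorithm.
as_zlib = zlib.compress(svg, 6)

print("raw :", len(svg))
print("gzip:", len(as_svgz))
print("zlib:", len(as_zlib))
```

The two compressed sizes should differ only by a few bytes of container framing, since both wrap DEFLATE output.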

If your images are enormous, or you tag them as "do not attempt to delta-compress" via .gitattributes with -delta (e.g., a line such as *.svgz -delta), you might want to pre-compress them, i.e., store them as svgz files, just for speed.[2] If they are smaller, and if some svg file will delta-compress nicely against some other svg file, you would generally want to avoid pre-compressing them, as that largely defeats Git's delta compression: a small change to the svg text typically changes most of the bytes in the compressed stream, leaving little for a delta to reuse.
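The effect of pre-compression on deltas can be approximated outside Git. The sketch below is not Git's pack algorithm: it uses zlib's preset-dictionary feature as a crude stand-in for "compress the new version against the old one". The synthetic SVG, the three edited radii, and the helper names make_svg and delta_size are all hypothetical.

```python
import gzip
import random
import zlib


def make_svg(radii):
    """Build a synthetic SVG (hypothetical test data), one circle per radius."""
    rng = random.Random(0)  # fixed seed: every version gets the same coordinates
    body = "".join(
        f'  <circle cx="{rng.uniform(0, 800):.1f}" cy="{rng.uniform(0, 600):.1f}" r="{r}"/>\n'
        for r in radii
    )
    return f'<svg xmlns="http://www.w3.org/2000/svg">\n{body}</svg>\n'.encode()


def delta_size(base, target):
    """Size of target when deflated with base as a preset dictionary.

    A rough proxy for delta compression, not what Git actually does.
    """
    comp = zlib.compressobj(level=9, zdict=base)
    return len(comp.compress(target) + comp.flush())


radii = [5] * 300
old_svg = make_svg(radii)
for i in (10, 150, 290):  # a handful of small edits
    radii[i] = 17
new_svg = make_svg(radii)

# The same two versions stored pre-compressed, as svgz would be.
old_svgz = gzip.compress(old_svg, mtime=0)
new_svgz = gzip.compress(new_svg, mtime=0)

plain_delta = delta_size(old_svg, new_svg)
svgz_delta = delta_size(old_svgz, new_svgz)
standalone = len(zlib.compress(new_svg, 9))

print("new svg compressed on its own :", standalone)
print("new svg against old svg       :", plain_delta)
print("new svgz against old svgz     :", svgz_delta)
```

On this kind of input, the plain-text pair should come out far smaller than the pre-compressed pair, which is the behaviour described above. Real SVG files and Git's own delta search will give different numbers.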

See also "Is repacking a repository useful for large binaries?" and "Are Git's pack files deltas rather than snapshots?" There is a bit more in my answer to "What does git do when we do : git gc - git prune". The bottom line, as it were, is that you will have to do some test trials to see what works best for your particular situation. Note that fetch and push use "thin packs" (in which objects are compressed against base objects known to exist in the other Git, but left out of the pack file) to speed network transmission, and this could be more important than saving disk space, so consider that as well.
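One way to run such a trial is to commit the same history into two scratch repositories, one holding svg files and one holding svgz files, and compare their sizes after repacking. The helper below is a minimal sketch under that assumption: the function names are hypothetical, and it relies only on git gc and git count-objects -v, whose "size" and "size-pack" fields are reported in KiB.

```python
import subprocess


def parse_count_objects(text):
    """Parse `git count-objects -v` output into a dict of integer fields."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        # Skip non-numeric lines (for example, paths to alternates).
        if sep and value.strip().isdigit():
            stats[key.strip()] = int(value)
    return stats


def packed_size_kib(repo_dir):
    """Repack repo_dir with `git gc`, then return its object storage in KiB."""
    subprocess.run(["git", "-C", repo_dir, "gc", "--quiet"], check=True)
    out = subprocess.run(
        ["git", "-C", repo_dir, "count-objects", "-v"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    stats = parse_count_objects(out)
    # "size" covers loose objects, "size-pack" covers pack files.
    return stats.get("size", 0) + stats.get("size-pack", 0)
```

Calling packed_size_kib on each of the two scratch repositories (say, hypothetical directories repo-svg and repo-svgz) gives two numbers to compare. This measures disk usage only; it says nothing about fetch and push transfer sizes.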


[1] Or six of one, half a dozen of the other.

[2] Even if the time it takes for Git and other tools to compress them is the same, marking them -delta will keep Git from trying to delta-compress them during packing, which will save git repack time.

torek