
Is there a best practice for the types of files stored in Git LFS, specifically regarding a minimum size?

For instance, a 10 MB music file would be an obvious fit, but what about a 25 KB PNG? Is it worth putting in LFS, or is it better to just let Git handle it?

My concern is performance degradation from checking too many small files into an LFS repo. Is there any data on how the LFS extension holds up against a large number of smaller binary files? Is it advisable to only store files over a certain size threshold?
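For context, Git LFS tracking is configured per file pattern (via git lfs track and .gitattributes), not per file size, so the question boils down to which patterns to hand over to LFS. A minimal sketch, with placeholder extensions standing in for whatever asset types apply:

    # Tell LFS which file types to track (example patterns only)
    git lfs install
    git lfs track "*.mp3"
    git lfs track "*.png"

    # 'git lfs track' records the patterns in .gitattributes, e.g.
    #   *.png filter=lfs diff=lfs merge=lfs -text
    git add .gitattributes

    # List the files LFS is currently managing in this checkout
    git lfs ls-files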

James McMahon
  • +1 I too would like to know the answer to this; for example, UE4 has many binary uasset files. Many are small (10–100 KB) and some are large (50 MB+). I'd like to just track "*.uasset" if git-lfs works well enough. – Chad Apr 03 '16 at 16:13
  • The actual size of the PNG should be compared to the size of the file LFS replaces it with. In my case, a Unix bash script is only 2 KB, but if I track it under LFS it is replaced by a file with the same name and a size of 129 KB, so it does not make any sense in this case. – Eklavyaa Mar 04 '19 at 08:35

1 Answer


I would not expect an exact threshold value to be given.

LFS saves on the amount of data that needs to be exchanged when synchronizing with a remote repository. However, the saving only applies as long as the large file itself is not changing. For a changed file you actually need a second round trip to process the change on an LFS object.
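To make that round trip concrete: for an LFS-tracked file, what Git itself stores and synchronizes is only a small text pointer, roughly like the sketch below (the oid and size are placeholders); the actual content lives in the LFS object store and is transferred separately whenever it changes.

    version https://git-lfs.github.com/spec/v1
    oid sha256:<64-character hash of the content>
    size 10485760

The pointer is only a couple of hundred bytes, so the saving comes from not shipping the object content itself during ordinary Git transfers.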

So you may include smaller files in LFS if, in your use case, they are not changing (frequently). The exact break-even point will depend on the I/O speed of the server and, mostly, on the latency and throughput between repository and client.

In your example, I'd still expect improvements as long as the PNGs hardly ever change. As soon as they change on (almost) every commit, even larger files might not benefit from being put into LFS.

Also, the extra cost of the second round trip becomes less and less important the larger the typical files are. Especially when the size of a file class (suffix) varies over a broad range and/or the change frequency within a file class covers a wide spectrum, there might not be a clear answer to your question.

rpy
  • I was under the impression that the benefit of LFS was that often-changing binary files do not inflate repo size. But it sounds like you are saying that it actually doesn't help when files change often; so why use it ever? – Chad Apr 08 '16 at 04:18
  • Should have been more precise. Repo size in the sense of the object blob (pack file) is smaller. I was referring to the amount of data that needs to be transferred between client and server (that is, on push and pull operations). As local operations are usually not of major importance and changes require comparisons anyway, I focused on the major cost aspects with large files. LFS will save on any operation that needs to process the object data. – rpy Apr 08 '16 at 06:20
  • As long as only index/metadata is involved you should not experience a difference. With LFS the overall size of all files holding information (the full repository) will not be smaller (larger, to be honest, since all versions still need to be stored), but saving on index/metadata will speed up determining which objects changed (locally or between local and remote instances). So those later operations get sped up. And this will mostly benefit cases where the LFS-stored object is not the one that changed. – rpy Apr 08 '16 at 06:20
  • 2
  • Another factor for LFS push/pull speed: downloads from LFS appear to be synchronous (at least on GitLab), so downloading one small file after another can be tedious. If they were downloaded in parallel, many small files would download considerably faster. – James McMahon Apr 15 '16 at 15:09
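On the parallel-download point in the last comment: the LFS client has a configuration setting for concurrent transfers, so fetching many small objects does not have to be strictly one-after-another. A rough sketch (the lfs.concurrenttransfers setting comes from the git-lfs configuration; the effective default and server-side behaviour depend on your client and host versions):

    # Let the LFS client transfer several objects in parallel
    git config lfs.concurrenttransfers 8

    # Re-fetch LFS content for the current checkout
    git lfs pull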