
I am running this command (from "Why is my git repository so big?") on a very large git repository, https://github.com/python/cpython:

```
git rev-list --all --objects \
  | sed -n $(git rev-list --objects --all \
      | cut -f1 -d' ' \
      | git cat-file --batch-check \
      | grep blob \
      | sort -n -k 3 \
      | tail -n800 \
      | while read hash type size; do
          size_in_kibibytes=$(echo $size | awk '{ foo = $1 / 1024 ; print foo "KiB" }')
          echo -n "-e s/$hash/$size_in_kibibytes/p "
        done) \
  | sort -n -k1
```

It works fine if I replace `tail -n800` with `tail -n40`:

1160.94KiB Lib/ensurepip/_bundled/pip-8.0.2-py2.py3-none-any.whl
1169.59KiB Lib/ensurepip/_bundled/pip-8.1.1-py2.py3-none-any.whl
1170.86KiB Lib/ensurepip/_bundled/pip-8.1.2-py2.py3-none-any.whl
1225.24KiB Lib/ensurepip/_bundled/pip-9.0.0-py2.py3-none-any.whl
...

I found this question, Bash : sed -n arguments, which says I could use awk instead of sed.

Do you know how to fix this `sed: Argument list too long` error when tail is `-n800` instead of `-n40`?

user
  • If you post your data and what you like to get out of it, we may be able to help to shorten this long line of commands. All that are done with `cut`, `grep`, `sed`, `sort`, `tail`, `print` and `awk`, may be done with just one `awk` – Jotne May 31 '19 at 05:17
  • You can see the error just by cloning the cpython repository pointed at the question and running the command on the cloned repository. The question also has a sample output when tail `-n800` is replaced with tail `-n40` – user May 31 '19 at 05:20
  • The reason you get this error is that you're trying to pass to `sed` more arguments than your system supports. – torek May 31 '19 at 08:29
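The limit torek mentions is enforced per `exec()` call: the command substitution expands into two extra arguments (`-e` and `s/<hash>/<size>/p`) per blob, and the kernel caps the combined size of the argument list and environment. As a quick check (the exact value varies by system):

```shell
# With tail -n800 the substitution expands to ~1600 arguments; if their
# combined size exceeds the kernel limit below, exec() fails with E2BIG,
# which the shell reports as "Argument list too long".
getconf ARG_MAX   # maximum bytes for argv + environment in one exec() call
```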

2 Answers


As an alternative, check whether git sizer works on your repository: that would help you isolate what takes up space in your repository.

If not, there are other commands in "How to find/identify large commits in git history?" which loop over each object and avoid the `sed -n xx` part.

The alternative would be to redirect your result/command to a file, then run sed on that file, as shown here.
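A minimal sketch of that file-based variant (the `/tmp/sed-script` path is just illustrative): the substitutions are written to a file instead of being expanded onto the command line, and `sed -n -f` reads them from there, so no argument-list limit applies.

```shell
# Build the sed script on disk instead of expanding it into argv.
git rev-list --objects --all | cut -f1 -d' ' \
  | git cat-file --batch-check | grep blob \
  | sort -n -k3 | tail -n800 \
  | awk '{ printf "s/%s/%.2fKiB/p\n", $1, $3/1024 }' > /tmp/sed-script

# sed reads the 800 substitutions from the file; argv stays tiny.
git rev-list --all --objects | sed -n -f /tmp/sed-script | sort -n -k1
```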

VonC

It seems you have used this answer from the linked question, "Some scripts I use:...". There is a telling comment on that answer:

This function is great, but it's unimaginably slow. It can't even finish on my computer if I remove the 40 line limit. FYI, I just added an answer with a more efficient version of this function. Check it out if you want to use this logic on a big repository, or if you want to see the sizes summed per file or per folder. – piojo Jul 28 '17 at 7:59

And luckily piojo has written another answer addressing this. Just use his code.
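For reference, the efficient approach boils down to computing each blob's size once and joining it to the path listing on the object hash, rather than generating one sed expression per blob. A rough sketch of that idea (this is not piojo's exact code; it uses bash process substitution):

```shell
# Left input:  "<hash> <path>" for every object rev-list knows a path for.
# Right input: "<hash> <size>" for blobs only.
# join matches the two on the hash, so sed is not needed at all.
join <(git rev-list --objects --all | sort) \
     <(git rev-list --objects --all | cut -f1 -d' ' \
         | git cat-file --batch-check | awk '$2 == "blob" { print $1, $3 }' \
         | sort) \
  | awk '{ printf "%.2fKiB %s\n", $3/1024, $2 }' \
  | sort -n | tail -n40
```

Paths containing spaces would need extra care, but for a quick size survey this is usually good enough.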

A.H.