How to display file sizes with thousands separators in this git script?

Question

In the question "Why is my git repository so big?", I found this script to list the largest files in a repository:

git rev-list --all --objects | \
    sed -n $(git rev-list --objects --all | \
    cut -f1 -d' ' | \
    git cat-file --batch-check | \
    grep blob | \
    sort -n -k 3 | \
    tail -n40 | \
    while read hash type size; do 
         echo -n "-e s/$hash/$size/p ";
    done) | \
    sort -n -k1

But it outputs the file sizes in a hard-to-read way:

89076 images/screenshots/properties.png
103472 images/screenshots/signals.png
9434202 video/parasite-intro.avi

I would like it to display the sizes in a more readable way, such as:

   89.076 KB - images/screenshots/properties.png
  103.472 KB - images/screenshots/signals.png
9,434.202 KB - video/parasite-intro.avi

For example, the last one, 9,434.202 KB, corresponds to 9.434 MB or 0.009434 GB. I am not sure whether it would be better to use 9.434,202 KB instead, i.e., swapping the comma and the dot.

My first idea was to generate the whole list and process it afterwards. But I think it would be nicer to do this while the list is being generated. In that case the width needed for right justification cannot be known in advance, so it would be fine to print the list without it, like this:

89.076 KB - images/screenshots/properties.png
103.472 KB - images/screenshots/signals.png
9,434.202 KB - video/parasite-intro.avi

I think the printing is being performed by this line:

echo -n "-e s/$hash/$size/p ";

However, I do not understand how to format the $size parameter.

score 1 · Accepted Answer · answered May 02 '17 at 02:37

It sounds like you want to convert the bytes to kibibytes (1024-based kilobytes). Here's an awk hack for that that I stole from here.

size_in_kibibytes=$(echo $size | awk '{ foo = $1 / 1024 ; print foo "KiB" }')

In context:

git rev-list --all --objects | \
    sed -n $(git rev-list --objects --all | \
    cut -f1 -d' ' | \
    git cat-file --batch-check | \
    grep blob | \
    sort -n -k 3 | \
    tail -n40 | \
    while read hash type size; do
         size_in_kibibytes=$(echo $size | awk '{ foo = $1 / 1024 ; print foo "KiB" }')
         echo -n "-e s/$hash/$size_in_kibibytes/p ";
    done) | \
    sort -n -k1

I'm sure you could layer other hacks on top of that one to add commas, add a space before the KiB, play with the justification, etc. Hopefully this gets you started.
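
As a rough sketch of one such layer, assuming you are fine with formatting the list after it has been generated: the function below reads the "size path" lines that the script prints and rewrites each one as decimal kilobytes (1000-based, as in the question's example, not KiB) with comma separators. format_sizes is a hypothetical helper name, not a git command, and it uses only plain awk string handling rather than locale-dependent printf flags.

```shell
# Hypothetical helper (not part of git): reads "<bytes> <path>" lines on stdin
# and prints "<size> KB - <path>" with comma thousands separators.
format_sizes() {
    awk '{
        bytes = $1
        path = $0
        sub(/^[0-9]+ /, "", path)      # drop the size, keep the path (spaces included)

        # Whole kilobytes (1000 bytes each) as a string of digits.
        whole = sprintf("%d", int(bytes / 1000))

        # Copy the digits, adding a comma wherever a multiple of three remain.
        grouped = ""
        n = length(whole)
        for (i = 1; i <= n; i++) {
            grouped = grouped substr(whole, i, 1)
            if (i < n && (n - i) % 3 == 0)
                grouped = grouped ","
        }

        # The remaining bytes become the three digits after the dot.
        printf "%s.%03d KB - %s\n", grouped, bytes % 1000, path
    }'
}
```

To try it, append `| format_sizes` after the final `sort -n -k1` of the original script, so the numeric sort still sees the raw byte counts. Formatting inside the while loop is harder: the unquoted $(...) is split on whitespace, so a value such as "9,434.202 KB" would break the sed expression at the space.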
