3

Folks, I am generating an md5sum of a gzip file. Technically, each time its compressing the same file, but the resulting md5sum is different. How do I tell it to use the -n flag to omit the original filename and timestamp?

f_in = open(tmpFile, 'rb')
f_out = gzip.open(uploadFile, 'wb')
f_out.writelines(f_in)
f_out.close()
f_in.close()

Thanks!

Cmag
  • 12,570
  • 23
  • 75
  • 129
  • 1
    How do you calculate the `md5` value? Do you invoke `md5` or `md5sum` via shell? This might be related: http://stackoverflow.com/questions/1131220/get-md5-hash-of-big-files-in-python – 9000 Sep 08 '14 at 16:12
  • I calculate like this: `md5hash = hashlib.md5(open(file).read()).hexdigest()` – Cmag Sep 08 '14 at 17:02
  • The value of `open(file).read()` most definitely only depends on file's contents and not name. Are you sure you upload files with the same content? same EOL characters? Also, try `open(file, 'rb')` for binary mode. – 9000 Sep 08 '14 at 17:11
  • See https://stackoverflow.com/a/14004771/1180620 – Mark Adler Oct 20 '17 at 06:32

2 Answers2

5

The GzipFile class allows you to explicitly provide the filename and the timestamp for the header.

E.g.:

#!/usr/bin/python
import sys
import gzip

f = open('out.gz', 'wb')
gz = gzip.GzipFile('', 'wb', 9, f, 0.)
gz.write(str.encode('this is a test'))
gz.close()
f.close()

This will produce a gzip header with no filename and a modification time of zero, meaning no modification time per the RFC 1952 standard for gzip.

Mark Adler
  • 79,438
  • 12
  • 96
  • 137
  • im not sure how to have it be blank... and where to insert it in the code snip above – Cmag Sep 08 '14 at 17:47
  • Im not sure how to translate ` if discernible; otherwise, it defaults to the empty string, and in this case the original filename is not included in the header` from the docs... https://docs.python.org/2.6/library/gzip.html – Cmag Sep 08 '14 at 18:07
  • 2
    If you provide the empty string as the filename, then the header will not include a filename at all. As opposed to having a filename in the header that is an empty string. Subtle difference. – Mark Adler Sep 08 '14 at 18:38
  • does not work for me - gz still changes every run. at least the time seams to be now() – A. Binzxxxxxx Mar 10 '17 at 15:12
  • Just tried it again, and made the code portable to python 2 and python 3. Still writes the exact same gzip file every time, with a timestamp of zero. Not a timestamp of now. Copy exactly the code from my answer and test it. – Mark Adler Mar 10 '17 at 16:25
0

If you would like to write utf-8 text to a gz file without a filename in the header, here's a way to do this:

import gzip, io

ofile = open("./stuff.txt.gz", 'wb')
ogzfile = gzip.GzipFile('', 'w', 9, ofile, 0.)
ogztextfile = io.TextIOWrapper(ogzfile, 'utf-8')

ogztextfile.write("Зарегистрируйтесь сейчас на\nДесятую Международную\nКонференцию")

ogztextfile.close()
ogzfile.close()
ofile.close()
JustAC0der
  • 2,074
  • 3
  • 24
  • 33