
I want to compress big text files with Python (I am talking about files larger than 20 GB). I am by no means an expert, so I gathered what information I could find, and the following seems to work:

import bz2

with open('bigInputfile.txt', 'rb') as input:
    with bz2.BZ2File('bigInputfile.txt.bz2', 'wb', compresslevel = 9) as output:
        while True:
            block = input.read(900000)
            if not block:
                break
            output.write(block)

input.close()
output.close()

I am wondering whether this syntax is correct and whether there is a way to optimize it. I have the impression that I am missing something here.

Many thanks.

user1242959
  • What is the problem you're having? Is the file you output correct? – Daenyth Mar 01 '12 at 15:15
  • Why did you choose to read by 900000? – n1r3 Mar 01 '12 at 15:28
  • Yes, it seems so: the uncompressed size matches and the format looks OK. I am not confident in everything I code, since I am learning Python (and IT in general) by myself. Thanks. And yes, I chose 900000 with the block size used by bzip2 in mind; I thought it would be better to compress one block at a time. Is that wrong? – user1242959 Mar 01 '12 at 15:28

2 Answers


Your script seems correct, but can be abbreviated:

import bz2
from shutil import copyfileobj

with open('bigInputfile.txt', 'rb') as input:
    with bz2.BZ2File('bigInputfile.txt.bz2', 'wb', compresslevel=9) as output:
        copyfileobj(input, output)
Fred Foo
  • Thank you! So you mean that defining the chunk size is not necessary, then? – user1242959 Mar 01 '12 at 16:04
  • Yep. `copyfileobj` copies in blocks of 16kB by default; you can still set the chunk size if you want by adding a third argument. – Fred Foo Mar 02 '12 at 12:43
  • Re the chunk size: it probably won't make a big difference to either processing time or compression ratio, so unless you really need to eke out that last little bit, it is easiest to just leave it at the default. – fantabolous Aug 08 '14 at 13:32
  • An elegant Pythonic example. – Adam Matan Aug 23 '16 at 14:14
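
The third argument mentioned in the comments above can be sketched as follows. This is only a minimal illustration: the helper name `compress_file` and the 1 MiB chunk size are assumptions made for this example and are not part of the original answer. Note also that the default buffer size of `copyfileobj` has varied between Python versions, so the 16 kB figure may not hold on newer releases.

```python
import bz2
from shutil import copyfileobj


def compress_file(src_path, dst_path, chunk_size=1024 * 1024):
    """Compress src_path into dst_path with bzip2.

    compress_file and its default chunk_size are illustrative choices,
    not something taken from the original answer.
    """
    with open(src_path, 'rb') as src:
        with bz2.BZ2File(dst_path, 'wb', compresslevel=9) as dst:
            # The optional third argument of copyfileobj is the size,
            # in bytes, of each chunk read from src and written to dst.
            copyfileobj(src, dst, chunk_size)
```

Calling `compress_file('bigInputfile.txt', 'bigInputfile.txt.bz2')` would then behave like the shorter version in the answer, just with an explicit chunk size.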

Why are you calling the .close() methods? They are not needed as you use the with: statement.
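
A small sketch of why the explicit calls are redundant: file objects expose a `closed` attribute, and it is already true once the `with` block has exited. The path `demo.txt.bz2` in a temporary directory is a placeholder chosen for this example only.

```python
import bz2
import os
import tempfile

# Placeholder path in a temporary directory, used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), 'demo.txt.bz2')

with bz2.BZ2File(path, 'wb') as output:
    output.write(b'some text\n')
    closed_inside = output.closed  # the file is still open inside the block

closed_after = output.closed  # the with statement has already closed it

# Calling close() again does no harm, it is simply unnecessary.
output.close()
```

The same holds for the plain file returned by `open()`, so both `close()` calls at the end of the question's script can be dropped.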

n1r3