
I have code like this:

chunk_size = 512 * 1024  # 512 KB
big_file = open(file, 'rb')
while True:
    data = big_file.read(chunk_size)
    if not data:
        break

If I want to read only every 10th or every 5th chunk, something like this, how can I do it?

chunk_size = 512 * 1024  # 512 KB
big_file = open(file, 'rb')
counter = 0
while True:
    counter += 1
    if counter % 5 != 0:
        big_file.next(chunk_size)  # Just skip it, don't read it... HOW TO DO THIS LINE?
        continue  # I want to skip this chunk and, in the next loop iteration, read the next chunk.
    data = big_file.read(chunk_size)
    if not data:
        break

Speed is very important to me in this case: I will be doing this for millions of files, for block hashing.
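For context, the sequential version I run today looks roughly like the sketch below. `hash_blocks`, `path` and `hashlib.sha256` are only stand-ins for my real per-block processing, not the exact code:

import hashlib

chunk_size = 512 * 1024  # 512 KB

def hash_blocks(path):
    # Read the file chunk by chunk and hash every block separately.
    digests = []
    with open(path, 'rb') as big_file:
        while True:
            data = big_file.read(chunk_size)
            if not data:
                break
            digests.append(hashlib.sha256(data).hexdigest())
    return digests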

Zubad Ibrahim
  • I would look at the `seek()` function. It should do what you want. Just keep up with the offset. See [this link](https://stackoverflow.com/questions/11696472/seek-function) – gnodab Apr 10 '20 at 14:00

1 Answer


You can use the file's .seek() method for that. I track the current position in the file with pos; data is only read with .read(chunk_size) every 5th time.

Seeking beyond the end of the file is not a problem: data will just be empty in that case, so we break when nothing was read.

chunk_size = 512 * 1024  # 512 KB
big_file = open("filename", 'rb')
counter = 0
pos = 0

while True:
    counter += 1
    if counter % 5 == 0:
        big_file.seek(pos)
        data = big_file.read(chunk_size)
        if not data:
            break
        print(data.decode("utf-8")) # here do your processing

    pos += chunk_size
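If you would rather not track pos yourself, a relative seek should work as well. This is only a sketch (I have not benchmarked it against the version above): it uses os.SEEK_CUR to skip forward from the current position before each read, and every_nth is a name I introduce here:

import os

chunk_size = 512 * 1024  # 512 KB
every_nth = 5

with open("filename", "rb") as big_file:
    while True:
        # jump over the chunks we do not want, relative to the current position
        big_file.seek(chunk_size * (every_nth - 1), os.SEEK_CUR)
        data = big_file.read(chunk_size)
        if not data:
            break
        # do your processing here, e.g. hash the block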
Lydia van Dyke