
I have code like this:

chunk_size = 512 * 1024  # 512 KB
big_file = open(file, 'rb')
while True:
    data = big_file.read(chunk_size)
    if not data:
        break

If I want to read only every 10th or every 5th chunk, something like this, how can I do it?

chunk_size = 512 * 1024  # 512 KB
big_file = open(file, 'rb')
counter = 0
while True:
    counter += 1
    if counter % 5 != 0:
        big_file.next(chunk_size)  # Just skip it, don't read it... HOW TO DO THIS LINE?
        continue  # I want to skip this chunk and, in the next loop iteration, read the next chunk.
    data = big_file.read(chunk_size)
    if not data:
        break

Speed is very important to me in this case: I will be doing this for millions of files, for block hashing.
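For context, the sequential version I run today looks roughly like the sketch below. `hash_blocks`, `path` and `hashlib.sha256` are only stand-ins for my real per-block processing, not the exact code:

import hashlib

chunk_size = 512 * 1024  # 512 KB

def hash_blocks(path):
    # Read the file chunk by chunk and hash every block separately.
    digests = []
    with open(path, 'rb') as big_file:
        while True:
            data = big_file.read(chunk_size)
            if not data:
                break
            digests.append(hashlib.sha256(data).hexdigest())
    return digests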

Zubad Ibrahim
  • I would look at the `seek()` function. It should do what you want. Just keep up with the offset. See [this link](https://stackoverflow.com/questions/11696472/seek-function) – gnodab Apr 10 '20 at 14:00

1 Answer


You can use the file's .seek() method for that. I track the current position in the file with pos; data is only read with .read(chunk_size) every 5th time.

Seeking beyond the end of the file is not a problem: data will just be empty in that case, so we break when nothing was read.

chunk_size = 512 * 1024  # 512 KB
big_file = open("filename", 'rb')
counter = 0
pos = 0

while True:
    counter += 1
    if counter % 5 == 0:
        big_file.seek(pos)
        data = big_file.read(chunk_size)
        if not data:
            break
        print(data.decode("utf-8")) # here do your processing

    pos += chunk_size
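If you would rather not track pos yourself, a relative seek should work as well. This is only a sketch (I have not benchmarked it against the version above): it uses os.SEEK_CUR to skip forward from the current position before each read, and every_nth is a name I introduce here:

import os

chunk_size = 512 * 1024  # 512 KB
every_nth = 5

with open("filename", "rb") as big_file:
    while True:
        # jump over the chunks we do not want, relative to the current position
        big_file.seek(chunk_size * (every_nth - 1), os.SEEK_CUR)
        data = big_file.read(chunk_size)
        if not data:
            break
        # do your processing here, e.g. hash the block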
Lydia van Dyke