Read large txt file in chunks and process data

Question

I need to read a large text file in chunks, and every word in each chunk has to be processed. But some words can be cut into pieces. For instance:

import io
text_in_file = io.StringIO('some text in file to be processed')  # stand-in for the open file
text_in_file.read(15)

Successive reads will return 'some text in fi', 'le to be proces', and so on.

Is there a way to find out whether a word has been cut, and to join the end of the previous chunk with the beginning of the next one?

Read one character at a time into a buffer of some sort, and process the buffer when you hit a word boundary. The problem you are usually trying to solve is having the whole file in memory at once, which is why you are reading it in chunks. – Tony Hopkinson, Jan 24 '15 at 15:50
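
A minimal sketch of the buffering idea from the comment above, adapted to keep fixed-size chunk reads instead of reading one character at a time. It assumes words are separated by whitespace. The names `iter_words` and `chunk_size` are hypothetical, and `io.StringIO` stands in for the real file object. The trailing fragment of each chunk is held back and prepended to the next chunk, so a cut word is only emitted once it is complete.

```python
import io


def iter_words(stream, chunk_size=15):
    """Yield whole whitespace-separated words from a text stream read in chunks."""
    tail = ""  # possibly incomplete word carried over from the previous chunk
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # empty string means end of stream
            break
        buffer = tail + chunk
        words = buffer.split()
        # If the buffer does not end in whitespace, its last word may continue
        # in the next chunk, so hold it back instead of yielding it now.
        if words and not buffer[-1].isspace():
            tail = words.pop()
        else:
            tail = ""
        yield from words
    if tail:  # whatever is left at end of stream is a complete word
        yield tail


# Assumption: StringIO replaces the real file purely for illustration.
text_in_file = io.StringIO("some text in file to be processed")
print(list(iter_words(text_in_file)))
# -> ['some', 'text', 'in', 'file', 'to', 'be', 'processed']
```

With a real file, the same generator could be used as `with open(path) as f: for word in iter_words(f, 4096): ...`, where `path` is whatever file you are reading. Only one chunk plus one partial word is kept in memory at a time, unless the file contains a single word longer than the chunk size.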

score 0 · Answer 1 · edited May 23 '17 at 12:28


This way you (probably) won't get "split" words, though it depends on your file.

answered Jan 24 '15 at 20:19 by NoamG
edited May 23 '17 at 12:28 by Community

Thanks, but this is not exactly what I needed. – Sergii Tronko Jan 28 '15 at 13:03
