I need to read a large text file in chunks and process every word in each chunk. But some words can be cut into pieces at a chunk boundary. For instance:

import io
text_in_file = io.StringIO('some text in file to be processed')  # stands in for the opened file
text_in_file.read(15)

Successive reads return 'some text in fi', 'le to be proces', and so on.

Is there a way to find out whether a word has been cut, and to join the end of the previous chunk with the beginning of the next one?

  • Read one character at a time into a buffer of some sort, and process the buffer when you hit a word boundary. The problem you are usually trying to avoid is having the whole file in memory at once, which is why you are reading it in chunks. – Tony Hopkinson Jan 24 '15 at 15:50
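
A chunk-based variant of the buffering idea above, as a rough sketch rather than a definitive implementation: instead of reading one character at a time, it keeps reading fixed-size chunks and holds back the last word of a chunk whenever the chunk does not end in whitespace, then prepends it to the next chunk. The names `words_from_chunks` and `chunk_size` are made up for illustration, and words are assumed to be separated by whitespace.

```python
import io


def words_from_chunks(stream, chunk_size=15):
    """Yield whole words from a text stream that is read in fixed-size chunks."""
    leftover = ""  # possibly unfinished word carried over from the previous chunk
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        chunk = leftover + chunk
        words = chunk.split()
        # If the chunk does not end in whitespace, its last word may continue
        # in the next chunk, so hold it back instead of yielding it now.
        if words and not chunk[-1].isspace():
            leftover = words.pop()
        else:
            leftover = ""
        yield from words
    # Whatever is still held back at end of input is a complete word.
    if leftover:
        yield leftover


sample = io.StringIO('some text in file to be processed')
print(list(words_from_chunks(sample, 15)))
# ['some', 'text', 'in', 'file', 'to', 'be', 'processed']
```

Only one chunk plus one partial word is held at a time, unless the input contains a very long run with no whitespace at all.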

1 Answer

Just read the file line by line. Here's how: https://stackoverflow.com/a/8010133/3997052

This way you (probably) won't get split words, though that depends on your file.
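
A minimal sketch of the line-by-line approach, assuming that no word spans a line break and that each individual line fits comfortably in memory. The function name `words_by_line` is made up for illustration, and `io.StringIO` stands in for a real file opened with `open(...)`.

```python
import io


def words_by_line(stream):
    """Yield words one at a time, reading the stream one line at a time."""
    for line in stream:  # iterating a text file object yields one line per step
        yield from line.split()


sample = io.StringIO('some text in\nfile to be processed\n')
print(list(words_by_line(sample)))
# ['some', 'text', 'in', 'file', 'to', 'be', 'processed']
```

If the file is one huge line, this reads the whole thing at once, so it only helps when the file actually has line breaks.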

Community
NoamG