2

How can I read n lines from a file, instead of just one, when iterating over it? I have a file with a well-defined structure and I would like to do something like this:

for line1, line2, line3 in file:
    do_something(line1)
    do_something_different(line2)
    do_something_else(line3)

but it doesn't work:

ValueError: too many values to unpack

For now I am doing this:

for line in file:
    do_something(line)
    newline = file.readline()
    do_something_else(newline)
    newline = file.readline()
    do_something_different(newline)
... etc.

which sucks because I am writing endless `newline = file.readline()` lines that clutter the code. Is there any smart way to do this? (I really want to avoid reading the whole file at once because it is huge.)

Piotr Lopusiewicz

11 Answers

4

Basically, your file is an iterator which yields your file one line at a time. This turns your problem into how to yield several items at a time from an iterator. A solution to that is given in this question. Note that the function islice is in the itertools module, so you will have to import it from there.
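
For instance, a minimal sketch of that islice approach (the filename is a placeholder and do_something etc. are the functions from the question; any incomplete final group is simply dropped here):

from itertools import islice

with open("data.xml") as f:
    while True:
        chunk = list(islice(f, 3))   # pull the next three lines (fewer at end of file)
        if len(chunk) < 3:
            break                    # stop once a full group is no longer available
        line1, line2, line3 = chunk
        do_something(line1)
        do_something_different(line2)
        do_something_else(line3)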

neil
3

If it's XML, why not just use lxml?
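
A rough sketch of what that could look like with lxml.etree.iterparse, which streams the document instead of loading it all at once (the "record" tag is an invented name; use whatever element wraps each group in your file):

from lxml import etree

# iterparse streams the file, so the whole document never sits in memory
for event, elem in etree.iterparse("data.xml", events=("end",), tag="record"):
    do_something(elem)   # work with the element instead of raw lines
    elem.clear()         # discard parts of the tree that are already processed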

Tyler Eaves
  • Because I am doing really simple operations on strings and I don't need a parser which will read all the tags etc.; I just want to do a for the first line, b for the 2nd line, ..., j for the 10th line, and then repeat for 11, 12, 13, ..., 20, etc. – Piotr Lopusiewicz Dec 03 '10 at 03:14
  • Is it semantic data or not? Please don't treat xml data as text. It makes pandas cry. Plus lxml is BLINDING fast. – Tyler Eaves Dec 03 '10 at 03:17
2

You could use a helper function like this:

def readnlines(f, n):
    lines = []
    for x in range(0, n):
        lines.append(f.readline())
    return lines

Then you can do something like what you want:

while True:
    line1, line2, line3 = readnlines(file, 3)
    do_stuff(line1)
    do_stuff(line2)
    do_stuff(line3)

That being said, if you are using xml files, you will probably be happier in the long run if you use a real xml parser...

Mats Ekberg
  • Good idea, but you can't iterate over a function like that. You need to use the `yield` keyword to make a generator. Also, it's a great place to use a list comprehension. – Thomas K Dec 03 '10 at 11:22
  • @ThomasK Thanks for the feedback Thomas. I have updated the example. – Mats Ekberg Dec 03 '10 at 12:16
  • Not bad, but your while loop would never stop. I've posted a version below that uses a generator. – Thomas K Dec 03 '10 at 12:23
2

itertools to the rescue:

import itertools
def grouper(n, iterable, fillvalue=None):
    "grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
    args = [iter(iterable)] * n
    return itertools.izip_longest(fillvalue=fillvalue, *args)


fobj = open(yourfile, "r")
for line1, line2, line3 in grouper(3, fobj):
    pass
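
For what it's worth, a minimal Python 3 equivalent (izip_longest was renamed zip_longest there); slots in an incomplete final group come back as the fillvalue, None by default:

from itertools import zip_longest

def grouper(n, iterable, fillvalue=None):
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

with open(yourfile) as fobj:
    for line1, line2, line3 in grouper(3, fobj):
        pass
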
tzot
1

for i in file produces a str, so you can't just do for i, j, k in file and read it in batches of three (try a, b, c = 'bar' and a, b, c = 'too many characters' and look at the values of a, b and c to work out why you get the "too many values to unpack" error).
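
To illustrate, a quick interpreter experiment (the comments show what you would observe):

a, b, c = 'bar'                    # works: a == 'b', b == 'a', c == 'r'
a, b, c = 'too many characters'    # ValueError: too many values to unpack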

It's not entirely clear what you mean, but if you're doing the same thing for each line and just want to stop at some point, then do it like this:

for line in file_handle:
    do_something(line)
    if some_condition:
        break  # Don't want to read anything else

(Also, don't use file as a variable name, you're shadowing a builtin.)

Chris Morgan
0

If you're doing the same thing, why do you need to process multiple lines per iteration?

`for line in file` is your friend. It is in general much more efficient than manually reading the file, both in terms of I/O performance and memory.

Tyler Eaves
  • sorry, edited, I want to do different things to every line in a batch of n lines and then do the same things to the lines from another batch of n lines – Piotr Lopusiewicz Dec 03 '10 at 03:06
0

Do you know something about the length of the lines/format of the data? If so, you could read in the first n bytes (say 80*3) and use f.read(240).split("\n")[0:3].
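
A quick sketch of that idea, assuming every line is at most about 80 bytes (if a line is longer, the 240-byte slice won't contain three complete lines):

with open("data.xml") as f:                      # filename is a placeholder
    chunk = f.read(240)                          # roughly three lines' worth of bytes
    line1, line2, line3 = chunk.split("\n")[0:3]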

Paul Schreiber
  • Unfortunately it's a huge xml-like file and some values there could have varying lengths – Piotr Lopusiewicz Dec 03 '10 at 03:05
  • How big is the largest file? 10K? 1MB? 100MB? I assume it's too big to read the whole file in, but even reading in 100K would be cheap/fast. Unless you have to do this a million times in a tight loop. – Paul Schreiber Dec 03 '10 at 03:10
  • The file I have right now is 80 MB, and there could be bigger ones in the future; I don't want to work around the problem by loading the whole thing into memory, as this particular problem comes up quite often (at least for me :) ) – Piotr Lopusiewicz Dec 03 '10 at 03:16
  • Would you consider posting a, possibly sanitized, sample of this file. Depending on the structure there are some things you might consider. – Tyler Eaves Dec 03 '10 at 03:33
  • I just want a simple way to read a few lines at once; the structure of the file doesn't matter here, I need a language construct for what seems to be a basic task. The code I gave (with newline = readline()) is doing the job, it's just ugly and long. I am wondering what the 'pythonic' way to write this is. – Piotr Lopusiewicz Dec 03 '10 at 08:28
0

If you want to be able to use this data over and over again, one approach might be to do this:

lines = []
for line in file_handle:
    lines.append(line)

This will give you a list of the lines, which you can then access by index. Also, when you say a HUGE file, the size is most likely trivial, because Python can process thousands of lines very quickly.
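
For reference, the append loop is equivalent to either of these one-liners; all three read the whole file into memory:

lines = file_handle.readlines()
# or
lines = list(file_handle)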

Eric Pauley
  • I don't want to use this data over and over again; I want to read say 10 lines; do 10 different things to them, then read another 10 lines and repeat, etc. – Piotr Lopusiewicz Dec 03 '10 at 03:09
  • This would still be the method of choice in my opinion. If you do your data manipulation in a method, and have the list in there, it will be cleared by the garbage collector, so this won't use an excessive amount of memory, and you can access any index you'd like in any order. – Eric Pauley Dec 03 '10 at 03:12
  • This method isn't possible, as I don't want to write code which will crash if the file is bigger than the memory size. If we could load files into a list every time, there wouldn't be a file datatype or readline() at all; there would just be a readall_and_put_into_list(file) method. This is not the way to do stuff. – Piotr Lopusiewicz Dec 03 '10 at 08:31
0

Why can't you just do:

ctr = 0
for line in file:
    if ctr == 0:
        ....
    elif ctr == 1:
        ....
    ctr = ctr + 1

If you find the if/elif construct ugly, you could just create a hash table or list of function pointers and then do:

for line in file:
    function_list[ctr]()

or something similar.
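
A fuller sketch of that dispatch-list idea, using the placeholder functions from the question (f stands for an open file object); the modulo keeps the counter cycling through each group of three lines:

handlers = [do_something, do_something_different, do_something_else]

ctr = 0
for line in f:
    handlers[ctr](line)                  # call the handler for this line's position in the group
    ctr = (ctr + 1) % len(handlers)      # wrap around after every group of three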

atcuno
0

It sounds like you are trying to read from disk in parallel... that is really hard to do. All the solutions given to you are realistic and legitimate. You shouldn't let something put you off just because the code "looks ugly". The most important thing is how efficient and effective it is; if the code is messy, you can tidy it up later, but don't look for a whole new method of doing something just because you don't like how one way of doing it looks in code.

As for running out of memory, you may want to check out pickle.

Stunner
0

It's possible to do it with a clever use of the zip function. It's short, but a bit voodoo-ish for my tastes (hard to see how it works: each tuple is built by pulling three times from the same underlying iterator, so consecutive lines end up grouped together). It cuts off any lines at the end that don't fill a group, which may be good or bad depending on what you're doing. If you need the final lines, itertools.izip_longest might do the trick.

zip(*[iter(inputfile)] * 3)

Doing it more explicitly and flexibly, this is a modification of Mats Ekberg's solution:

def groupsoflines(f, n):
    while True:
        group = []
        for i in range(n):
            try:
                group.append(next(f))
            except StopIteration:
                if group:
                    tofill = n - len(group)
                    yield group + [None] * tofill
                return
        yield group

for line1, line2, line3 in groupsoflines(inputfile, 3):
    ...

N.B. If this runs out of lines halfway through a group, it will fill in the gaps with None, so that you can still unpack it. So, if the number of lines in your file might not be a multiple of three, you'll need to check whether line2 and line3 are None.

Thomas K