Reading files by line location

Question

I have a data file with 3 columns. I am trying to write a function that takes a start and an end line number as input. Something like:

from datetime import datetime

def read_lines(start_line_number, end_line_number):
    # Currently reads every line; the two arguments are not used yet.
    with open("data.txt", 'r') as f:
        for line in f:
            splitted_line = line.strip().split(",")
            date1 = datetime.strptime(splitted_line[0], '%Y%m%d:%H:%M:%S.%f')
            price = float(splitted_line[1])
            volume = int(splitted_line[2])
            my_tuple = (date1, price, volume)

@J Doe you should accept an answer if one worked for you. – Galen Long, Mar 15 '17 at 01:13

score 1 · Accepted Answer · answered Mar 09 '17 at 01:54

from datetime import datetime

def func(start, end):
    # Line indices are 0-based; processes lines start .. end - 1.
    with open("data.txt", 'r') as f:
        for idx, line in enumerate(f):
            if idx == end:
                break  # reached the end of the range
            if idx < start:
                continue  # not yet in the range

            splitted_line = line.strip().split(",")
            date1 = datetime.strptime(splitted_line[0], '%Y%m%d:%H:%M:%S.%f')
            price = float(splitted_line[1])
            volume = int(splitted_line[2])
            my_tuple = (date1, price, volume)
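
The same half-open range (0-based, from start up to but not including end) can also be taken with itertools.islice from the standard library. This is a sketch of an alternative, not the answer's own code: the name read_rows and its lines parameter are made up for illustration, and the sample rows are invented to match the format in the question.

```python
from datetime import datetime
from itertools import islice


def read_rows(lines, start, end):
    """Parse lines[start:end] (0-based, end exclusive) into tuples.

    `lines` is any iterable of text lines, e.g. an open file object.
    """
    rows = []
    for line in islice(lines, start, end):
        fields = line.strip().split(",")
        date1 = datetime.strptime(fields[0], "%Y%m%d:%H:%M:%S.%f")
        rows.append((date1, float(fields[1]), int(fields[2])))
    return rows


# Hypothetical sample data in the question's three-column format.
sample = [
    "20170309:01:54:00.000001,10.5,100\n",
    "20170309:01:54:01.000002,10.6,200\n",
    "20170309:01:54:02.000003,10.7,300\n",
    "20170309:01:54:03.000004,10.8,400\n",
]
rows = read_rows(sample, 1, 3)  # the second and third sample lines
```

With a real file, the open file object can be passed directly in place of the sample list.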

score 0 · Answer 2 · edited May 23 '17 at 12:17

If I'm reading this correctly, this function should only read the rows which are numbered in the range [start_line, end_line] (I'm assuming this is an inclusive range, i.e. you want to read both the start and end lines as well). Why not write your for loop with enumeration and simply skip rows that are out of the passed range?

def read_line_range_inclusive(start_line, end_line):
    filename = "data.txt"
    with open(filename) as f:
        for i, line in enumerate(f):
            if i < start_line: # will read the start line itself
                continue # keep going...
            if i > end_line: # will read the end line itself
                break # we're done

            # ... perform operations on lines ...

Also, be careful when splitting by commas; this works fine for simple rows like 1,2,3, but what about 1,2,"a,b,c",3, where "a,b,c" is a single field and shouldn't be split into separate columns? I recommend using the built-in csv module, which handles these edge cases automatically:

import csv

def read_line_range_inclusive(start_line, end_line):
    filename = "data.txt"
    with open(filename, newline="") as f:
        for i, row in enumerate(csv.reader(f)):
            if i < start_line:
                continue
            if i > end_line:
                break
            # row is already split into a list of column strings
            # ... proceed as before ...

Note that you can only use the with statement on the file object itself, not on the csv.reader parsed file, so this wouldn't work: with csv.reader(open(filename)) as f:.
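
To make the quoting issue concrete, here is a small comparison of str.split and csv.reader on the example row from above. The input string is just that illustrative row, not data from the question.

```python
import csv
import io

# Illustrative one-row input with a quoted field that contains commas.
text = '1,2,"a,b,c",3\n'

# Naive splitting cuts the quoted field into pieces.
naive = text.strip().split(",")

# csv.reader treats the quoted field as a single column.
parsed = next(csv.reader(io.StringIO(text)))

print(naive)   # 6 items, quotes left in place
print(parsed)  # 4 items
```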

score 0 · Answer 3 · answered Mar 09 '17 at 02:23

If you use csv.reader, the reader object exposes the current line number:

csvreader.line_num

From the documentation: The number of lines read from the source iterator. This is not the same as the number of records returned, as records can span multiple lines.
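
A short sketch of how line_num differs from the record count. The input is hypothetical: its first record has a quoted field containing a newline, so it occupies two physical lines.

```python
import csv
import io

# Hypothetical input: record 1 spans two physical lines, record 2 spans one.
text = 'a,"multi\nline",c\nd,e,f\n'

reader = csv.reader(io.StringIO(text))
seen = []
for row in reader:
    # line_num is the number of physical lines consumed so far.
    seen.append((reader.line_num, row))

for line_num, row in seen:
    print(line_num, row)
```

Here the first record is reported at line_num 2 and the second at line_num 3, so filtering on line_num selects by physical line, while enumerate over the reader selects by record.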

宏杰李


score 0 · Answer 4 · answered Mar 09 '17 at 02:24

We can combine the linecache module and csv to get the job done:

import csv
import linecache


def get_lines(filename, start_line_number, end_line_number):
    """
    Given a file name, start line and end line numbers,
    return those lines in the file
    """
    for line_number in range(start_line_number, end_line_number + 1):
        yield linecache.getline(filename, line_number)


if __name__ == '__main__':
    # Get lines 4-6 inclusive from the file
    lines = get_lines('data.txt', 4, 6)
    reader = csv.reader(lines)

    for row in reader:
        print(row)

Consider the data file, data.txt:

# this is line 1
# line 2

501,john
502,karen
503,alice

# skip this line
# and this, too

The above code will produce the following output:

['501', 'john']
['502', 'karen']
['503', 'alice']

Discussion

linecache is a lesser-known standard-library module that lets you quickly retrieve lines from a text file by line number
csv is a standard-library module for reading and writing comma-separated values
By combining them, we can get the job done with little effort
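
Two details of linecache are worth keeping in mind: line numbers start at 1, and asking for a line that does not exist returns an empty string instead of raising. The module also reads and caches the whole file on first access, so it suits repeated lookups better than a single pass over a very large file. A minimal sketch using a throwaway temporary file (the file contents are made up):

```python
import linecache
import os
import tempfile

# Write a small throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first\nsecond\nthird\n")
    path = tmp.name

second = linecache.getline(path, 2)    # line numbers start at 1
missing = linecache.getline(path, 99)  # past the end: empty string, no exception

linecache.clearcache()  # drop the cached copy of the file
os.remove(path)
```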
