
Is there an efficient way in C or C++ to read the last row of a CSV file? The naive approach reads through the entire file just to reach the end. Is there a quicker way to do this, particularly when the CSV files are large?

effeffe
user788171
  • You only added "C++". What about the C tag? After all, you stated "C/C++", and the two languages have different functions for reading files, e.g. `fgets` for C and `std::getline` for C++. – Thomas Matthews May 26 '13 at 18:50

5 Answers


What you can do is guess the line length, then jump back 2-3 lines' worth of bytes from the end of the file and read the remaining lines. The last line you read is the last line of the file, as long as you read at least one line before it, because that guarantees the last one started at a line boundary (otherwise, start again with a bigger offset).
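A minimal C++ sketch of this idea, not taken from the answer itself: `last_line_seek_guess` and its `guess_len` parameter (the guessed average line length in bytes) are hypothetical names, and doubling the offset on each retry is just one reasonable choice. The first line read after seeking is discarded because it may start mid-row.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Hypothetical helper: guess_len is an estimate of the average line length
// in bytes. Seeks back about 3 lines from the end, discards the first
// (possibly partial) line, and returns the last line read. If no line
// followed the discarded one, the offset is doubled and the read retried.
std::string last_line_seek_guess(const std::string& path,
                                 std::streamoff guess_len = 128) {
    assert(guess_len > 0);  // otherwise the offset would never grow
    std::ifstream in(path, std::ios::binary);
    if (!in) return "";
    in.seekg(0, std::ios::end);
    const std::streamoff size = in.tellg();

    for (std::streamoff back = 3 * guess_len;; back *= 2) {
        const std::streamoff start = back < size ? size - back : 0;
        in.clear();  // reset eof/fail flags left by the previous attempt
        in.seekg(start);

        std::string line, last;
        bool found = false;
        if (start > 0) std::getline(in, line);  // may begin mid-row: discard
        while (std::getline(in, line)) {
            last = line;
            found = true;
        }
        // Done if a full line was read, or if we already read from byte 0.
        if (found || start == 0) {
            if (!last.empty() && last.back() == '\r') last.pop_back();
            return last;
        }
    }
}
```

If the rows turn out to be longer than guessed, the loop keeps doubling the offset until it either reads a complete line or reaches the start of the file, so a bad guess costs extra reads but not correctness.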

I posted some sample code for doing a similar thing (reading the last N lines) in this answer (in PHP, but it serves as an illustration).

For implementations in a variety of languages, see

Paul Dixon

You can try working backwards. Read a fixed-size block of bytes from the end of the file and look for a newline. If there is no newline in that block, read the previous block, and so on.
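One way this backwards scan might look in C++, as a sketch under assumptions: `last_line_backwards` and the 4096-byte default `block_size` are illustrative choices, not part of the answer. Blocks are prepended to a buffer until a newline preceding the last line turns up; `\n`/`\r` bytes at the very end of the file are ignored so a trailing newline does not produce an empty "last row".

```cpp
#include <algorithm>
#include <cassert>
#include <fstream>
#include <string>

// Hypothetical helper: scans backwards in fixed-size blocks and returns the
// last non-empty line of the file ("" for an empty or unreadable file).
std::string last_line_backwards(const std::string& path,
                                std::streamoff block_size = 4096) {
    assert(block_size > 0);
    std::ifstream in(path, std::ios::binary);
    if (!in) return "";
    in.seekg(0, std::ios::end);
    std::streamoff pos = in.tellg();

    std::string tail;  // bytes collected so far, in file order
    while (pos > 0) {
        const std::streamoff n = std::min(block_size, pos);
        pos -= n;
        std::string block(static_cast<std::size_t>(n), '\0');
        in.seekg(pos);
        in.read(&block[0], n);
        tail.insert(0, block);  // prepend the previous block

        // Ignore line terminators at the very end of the file.
        const std::size_t end = tail.find_last_not_of("\r\n");
        if (end == std::string::npos) continue;  // only newlines so far
        const std::size_t nl = tail.rfind('\n', end);
        if (nl != std::string::npos) {
            return tail.substr(nl + 1, end - nl);
        }
    }
    // Reached the start of the file: everything read is one line.
    const std::size_t end = tail.find_last_not_of("\r\n");
    return end == std::string::npos ? "" : tail.substr(0, end + 1);
}
```

Prepending each block keeps the sketch short; if the last line can span many blocks, collecting the blocks first and joining them once would avoid the repeated copying.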

Note that if a row is large relative to the size of the file, this may perform worse, because most file caching schemes assume that files are read forward.

Billy ONeal
  • One issue is that one could back up two text lines or more. So the file would have to be read until the last text line caused EOF. One can't assume that the next text line is the last line of the file. :-) – Thomas Matthews May 26 '13 at 18:54

You can use the Perl module File::ReadBackwards.

mvp

Read with what and on what? On a Unix system, if you want the last line, it is as simple as

tail -n1 file.csv

If you want this from within your C++ application, a quick and dirty way is something like

system("tail -n1 file.csv");

Note that `system` only lets `tail` print the line to standard output; it does not hand the line back to your program.
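If the line is needed as a string inside the program, one option on POSIX systems is `popen`, which captures the command's output. The sketch below assumes a Unix-like system with `tail` on the `PATH`; `last_line_via_tail` is a hypothetical helper name. The path is spliced into a shell command, so it should never come from untrusted input.

```cpp
#include <cassert>
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical helper: runs `tail -n1 '<path>'` through the shell and returns
// its output without the trailing line terminator. POSIX only (popen/pclose).
std::string last_line_via_tail(const std::string& path) {
    // Precondition (checked only in debug builds): the path is wrapped in
    // single quotes below, so it must not contain one itself.
    assert(path.find('\'') == std::string::npos);
    const std::string cmd = "tail -n1 '" + path + "'";

    std::FILE* pipe = popen(cmd.c_str(), "r");
    if (pipe == nullptr) throw std::runtime_error("popen failed");

    std::string out;
    char buf[256];
    while (std::fgets(buf, sizeof buf, pipe) != nullptr) {
        out += buf;  // accumulate tail's output chunk by chunk
    }
    pclose(pipe);

    // tail copies the file's final newline (if any); strip it.
    while (!out.empty() && (out.back() == '\n' || out.back() == '\r')) {
        out.pop_back();
    }
    return out;
}
```

This still spawns a shell and a process per call, so it fits the "quick and dirty" framing rather than a hot loop.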

Dmitri

Your problem falls into the same domain as searching for a string within a file. As you rightly point out, it's not always a great idea to read the entire file into memory and then search for your string. But you can do the next best thing: memory-map your file, then use your string-searching functions to search backwards from the end of the mapped region for the newline.

This is typically very efficient: the OS only reads in the pages you actually touch near the end of the file, so the memory footprint and disk I/O stay small.
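A rough POSIX sketch of the memory-mapping approach (it will not build on Windows, which uses `CreateFileMapping`/`MapViewOfFile` instead). `last_line_mmap` is a hypothetical name, and a plain backwards loop stands in for whatever string-searching function you prefer.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <fcntl.h>     // open
#include <sys/mman.h>  // mmap, munmap
#include <sys/stat.h>  // fstat
#include <unistd.h>    // close

// Hypothetical helper: returns the last non-empty line of the file, or ""
// on error. POSIX only.
std::string last_line_mmap(const char* path) {
    assert(path != nullptr);
    const int fd = open(path, O_RDONLY);
    if (fd < 0) return "";

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) {  // mmap rejects length 0
        close(fd);
        return "";
    }
    const std::size_t size = static_cast<std::size_t>(st.st_size);
    void* map = mmap(nullptr, size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    if (map == MAP_FAILED) return "";

    const char* data = static_cast<const char*>(map);
    std::size_t end = size;
    // Skip line terminators at the very end of the file.
    while (end > 0 && (data[end - 1] == '\n' || data[end - 1] == '\r')) --end;
    // Walk back to the byte just after the previous newline.
    std::size_t begin = end;
    while (begin > 0 && data[begin - 1] != '\n') --begin;

    std::string line(data + begin, end - begin);
    munmap(map, size);
    return line;
}
```

Only the pages touched by the two backwards loops are faulted in, which is what keeps this cheap for large files with short last lines.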

John Sheridan
  • Can you provide more information on how to memory-map the file in question? This is an approach I haven't heard of before, so I'm interested in knowing more details. – user788171 May 26 '13 at 18:49
  • Using a memory-mapped file is the same as reading the file into memory (in pieces), except that the run-time library or the OS does the reading. – Thomas Matthews May 26 '13 at 18:52