8

I've tried something like this

file_in <- file("myfile.log","r")
x <- readLines(file_in, n=-100)

but I'm still waiting...

Any help would be greatly appreciated

George Dontas
  • I can imagine you have to wait pretty long. Negative n values indicate to read to the end of the file. If that file is 7.5 GB, well... – Joris Meys Apr 08 '11 at 14:52

6 Answers

11

I'd use scan for this, in case you know how many lines the log has:

scan("foo.txt",sep="\n",what=character(),skip=100)

If you have no clue how many lines you need to skip, you have no choice but to move towards either

  • reading in everything and taking the last n lines (in case that's feasible),
  • using scan("foo.txt",sep="\n",what=list(NULL)) to figure out how many records there are (see the sketch after the function below), or
  • using some algorithm to go through the file, keeping only the last n lines every time

The last option could look like:

ReadLastLines <- function(x,n,...){
  con <- file(x)
  open(con)
  # read the first n lines (extra arguments such as skip are passed on)
  out <- scan(con,nmax=n,what=character(),sep="\n",quiet=TRUE,...)

  # then read one line at a time, always keeping only the last n seen
  while(TRUE){
    tmp <- scan(con,nmax=1,what=character(),sep="\n",quiet=TRUE)
    if(length(tmp)==0) {close(con) ; break }
    out <- c(out[-1],tmp)
  }
  out
}

allowing:

ReadLastLines("foo.txt",100)

or

ReadLastLines("foo.txt",100,skip=1e+7)

in case you know you have more than 10 million lines. This can save on the reading time when you start having extremely big logs.
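A rough sketch of the second option above (count the records first, then skip), purely for illustration: instead of the scan(...,what=list(NULL)) call, the lines are counted chunk by chunk with readLines() so the whole file never sits in memory, and only the last 100 lines are then read. CountLines and "foo.txt" are just placeholder names:

# count the lines of a file chunk by chunk, without keeping them
CountLines <- function(x, chunk=1e5){
  con <- file(x,"r")
  on.exit(close(con))
  n <- 0L
  repeat {
    got <- length(readLines(con, n=chunk))
    if(got == 0L) break
    n <- n + got
  }
  n
}

n_total <- CountLines("foo.txt")
# read only the last 100 lines (guard against files shorter than 100 lines)
last100 <- scan("foo.txt",sep="\n",what=character(),skip=max(n_total-100,0),quiet=TRUE)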


EDIT: In fact, I'd not even use R for this, given the size of your file. On Unix, you can use the tail command. There is a Windows version of it as well, somewhere in a toolkit, but I haven't tried it out myself yet.

Joris Meys
  • Really nice summary (+1)! I would add that counting the lines outside of R (e.g.: `wc` in Linux) could be a lot easier/faster. Other: I do not get the point in `while(1) {...}`. Will this loop ever end? – daroczig Apr 08 '11 at 15:39
  • @daroczig : indeed, it never ends. Ugly hack, I know, but I use it more often if I have to check a condition somewhere in the middle of a loop. And indeed, I'd resort to other tools than R for this question, but this is how I'd do it in R. – Joris Meys Apr 08 '11 at 15:41
  • @Joris Meys: thanks for your kind answer. Now I see the `break` part in your loop, somehow I overlooked it :) Sorry for bothering. – daroczig Apr 08 '11 at 15:48
  • Do you know the windows version of `tail`? – George Dontas Apr 09 '11 at 10:09
  • @gd047 : Here you can find some info : http://stackoverflow.com/questions/187587/looking-for-a-windows-equivalent-of-the-unix-tail-command. But as said, I never tried it myself – Joris Meys Apr 09 '11 at 10:19
  • `while(TRUE)` would be easier to understand - doesn't rely on internal coercion to logical. – hadley Apr 09 '11 at 23:25
  • @hadley : indeed, very right. It's the old Perl way I still use. Corrected and remembered. – Joris Meys Apr 10 '11 at 01:08
4

You could do this with read.table by specifying the skip parameter. If your lines are not meant to be parsed into separate variables, set the separator to '\n' as @Joris Meys pointed out below, and also set as.is=TRUE to get character vectors instead of factors.

Small example (skipping the first 2000 lines):

df <- read.table('foo.txt', sep='\n', as.is=TRUE, skip=2000)
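If you do not know how many lines to skip, one option (also suggested in the comments below) is to count them outside of R first, e.g. with wc on Linux, and derive skip from that. A minimal sketch, assuming a Unix-like system where wc is available ("foo.txt" again stands in for your log):

# count the lines with wc (fast; the file is never loaded into R),
# then skip everything except the last 100
n_lines <- as.integer(system("wc -l < foo.txt", intern=TRUE))
df <- read.table("foo.txt", sep="\n", as.is=TRUE, skip=n_lines-100)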
daroczig
  • Nice hack, but don't forget to use as.is=T. It works if you set the sep="\n", eg : `read.table("foo.txt",sep="\n",as.is=T,skip=100)` – Joris Meys Apr 08 '11 at 14:12
  • Thank you @Joris Meys, I have updated my answer based on your really helpful comment. – daroczig Apr 08 '11 at 14:18
  • What if I don't know how many lines to skip? The file is about 7.5 GB. I was mistakenly thinking that with a negative n, readLines returns the last n lines. It reads everything and that's why it took so long. – George Dontas Apr 08 '11 at 14:52
  • @gd047: I would not let R count the number of lines of a 7.5 Gb textfile, I would use a fast dedicated software for this. I suppose counting the lines in R would take a lot of time, but using e.g. `wc` in Linux could parse the file in a few seconds. You could also call `wc` from R (see: `?system`) if using Linux. – daroczig Apr 08 '11 at 15:34
  • @daroczig : for what it's worth, I'd use the `tail` command in linux : http://www.computerhope.com/unix/utail.htm – Joris Meys Apr 08 '11 at 15:45
  • @Joris Meys: yes, very good point! I was not enough creative to diverge from R totally, just concentrating in finding out the number of lines fast and then do the rest in R :) – daroczig Apr 08 '11 at 16:17
2

You can read the last n lines with the following method.

Step 1 - Read your file however you wish, e.g.:

df <- read.csv("hw1_data.csv")

Step 2 - Use the tail function to read the last n lines (here the last 2):

tail(df, 2)
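Applied to a log file like the one in the question, this would look roughly as follows. Note that it still reads the entire file into memory, so it is only practical when the file fits (the read.table call with sep="\n" mirrors the other answers; "myfile.log" is the name from the question):

# read the whole log as one column of character lines, then keep the last 100
df <- read.table("myfile.log", sep="\n", as.is=TRUE)
tail(df, 100)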

0

As @JorisMeys already mentioned, the Unix command tail would be the easiest way to solve this problem. However, I want to propose a seek-based R solution that starts reading the file from the end:

tailfile <- function(file, n) {
  bufferSize <- 1024L
  size <- file.info(file)$size

  if (size < bufferSize) {
    bufferSize <- size
  }

  pos <- size - bufferSize   # start with the last chunk of the file
  text <- character()
  k <- 0L                    # number of newlines seen so far

  f <- file(file, "rb")
  on.exit(close(f))

  while(TRUE) {
    seek(f, where=pos)
    chars <- readChar(f, nchars=bufferSize)
    k <- k + length(gregexpr(pattern="\\n", text=chars)[[1L]])
    text <- paste0(chars, text)   # chunks arrive back to front, so prepend

    if (k > n || pos == 0L) {
      break
    }

    # step towards the start of the file, shrinking the chunk near the
    # beginning so it does not overlap what has already been read
    bufferSize <- min(bufferSize, pos)
    pos <- pos - bufferSize
  }

  tail(strsplit(text, "\\n")[[1L]], n)
}

tailfile("myfile.log", n=100)
sgibb
0

Some folks have said it already, but if you have a large log it is most efficient to read in only the lines you need, rather than reading the whole file into memory and then subsetting.

For this, we use R's system() to run the Linux tail command.

Read the last 10 lines of the log:

system("tail path/to/my_file.log")

Read the last 2 lines of the log:

system("tail -n 2 path/to/my_file.log")

Read the last 2 lines of the log and capture the output in a character vector:

last_2_lines <- system("tail -n 2 path/to/my_file.log", intern = TRUE)
Rich Pauloo
-1

To see the last few lines:

tail(file_in,100) 
biruk1230
misrak