This is how I am planning to build my utilities for a project:

  • logdump dumps log results to the file log. If the file already exists, new results are appended to it (for example, if a new file is created every month, all results for that month are appended to the same file).

  • extract reads the log result file to extract relevant results depending on the arguments provided.

  • The thing is, I do not want to wait for logdump to finish writing to log before I begin processing it. Nor do I want to keep track of how far into log I have already read so that I can resume extracting from there; that is exactly the bookkeeping I want to avoid.

  • I need live results: whenever something is added to the log results file, extract should pick up the new results.

  • The processing that extract does will be generic (it will depend on command-line arguments to it), but it will certainly be on a line-by-line basis.

This boils down to reading a file while it is still being written, and continuing to monitor it for new data even after reaching the end of the file.

How can I do this using C or C++ or shell scripting or Perl?

Lazer
  • In these cases, I try to modify the logging to go to a database. Then it's really easy to get the records you haven't processed yet. If you haven't designed the logging part yet, that could be the way to go. – brian d foy Sep 09 '10 at 22:28
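To illustrate that suggestion: a minimal sketch in Perl with DBI and DBD::SQLite, where the log.db file, the messages table, and its schema are all assumptions made up for the example. The extractor then only has to remember the highest id it has processed:

#! /usr/bin/perl

use warnings;
use strict;
use DBI;

# assumed schema: CREATE TABLE messages (id INTEGER PRIMARY KEY, line TEXT)
my $dbh = DBI->connect("dbi:SQLite:dbname=log.db", "", "", { RaiseError => 1 });

my $last_id = 0;
for (;;) {
  # fetch only the records we have not processed yet
  my $rows = $dbh->selectall_arrayref(
    "SELECT id, line FROM messages WHERE id > ? ORDER BY id",
    undef, $last_id);
  for my $row (@$rows) {
    my ($id, $line) = @$row;
    print "extracted [$line]\n";
    $last_id = $id;
  }
  sleep 1;  # poll for new rows
}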

3 Answers


tail -f reads from a file and, when it reaches EOF, keeps monitoring it for updates instead of quitting outright. It's an easy way to read a log file "live". It could be as simple as:

tail -f log.file | extract

Or use tail -n 0 -f so that it prints only new lines, not existing ones. Or tail -n +0 -f to display the entire file and then keep printing updates as they arrive.

John Kugelman
  • While this serves my need, is there any way to do the same using C or C++? – Lazer Sep 09 '10 at 21:03
  • @Lazer: you can always "cheat" and look at the "Hacker's Man Page"--the source code to tail. IIRC, it is really simple C code. Look here: http://stackoverflow.com/questions/1439799/how-can-i-get-the-source-code-for-the-linux-utility-tail – Harold Bamford Sep 09 '10 at 21:15

The traditional unix tool for this is tail -f, which keeps reading data appended to its argument until you kill it. So you can do

tail -c +1 -f log | extract

In the unix world, reading from continuously appended-to files has come to be known as “tailing”. In Perl, the File::Tail module performs the same task.

use strict;
use warnings;
use File::Tail;

my $log_file = File::Tail->new("log");
# read blocks until a new line is appended, so this loop follows the file forever
while (defined (my $log_line = $log_file->read)) {
    process_line($log_line);  # process_line is whatever per-line extraction you need
}
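Note that read blocks until a new line arrives, and that by default File::Tail starts reading a little before the end of the file rather than at the beginning. If the existing contents matter, the constructor also has a named-parameter form with a tail option (as I recall, a negative value means return the whole file first):

my $log_file = File::Tail->new(name => "log", tail => -1);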
Gilles 'SO- stop being evil'

Use this simple stand-in for logdump

#! /usr/bin/perl

use warnings;
use strict;

open my $fh, ">", "log" or die "$0: open: $!";
select $fh;  # make $fh the default output handle...
$| = 1;      # ...so that $| enables autoflush: each message is written out immediately

for (1 .. 10) {
  print $fh "message $_\n" or warn "$0: print: $!";
  sleep rand 5;  # pause up to four seconds between messages
}

and the skeleton for extract below to get the processing you want. When logfile hits end-of-file, logfile.eof() is true; calling logfile.clear() resets the stream's error state, and then we sleep and try reading again.

#include <iostream>
#include <fstream>
#include <string>
#include <cerrno>
#include <cstring>
#include <unistd.h>

int main(int argc, char *argv[])
{
  const char *path;
  if      (argc == 2) path = argv[1];
  else if (argc == 1) path = "log";
  else {
    std::cerr << "Usage: " << argv[0] << " [ log-file ]\n";
    return 1;
  }

  std::ifstream logfile(path);
  std::string line;
  next_line: while (std::getline(logfile, line))
    std::cout << argv[0] << ": extracted [" << line << "]\n";

  if (logfile.eof()) {
    sleep(3);         // give the writer time to append more
    logfile.clear();  // reset the EOF state so getline will try again
    goto next_line;
  }
  else {
    std::cerr << argv[0] << ": " << path << ": " << std::strerror(errno) << '\n';
    return 1;
  }

  return 0;
}

It's not as interesting as watching it live, but the output is

./extract: extracted [message 1]
./extract: extracted [message 2]
./extract: extracted [message 3]
./extract: extracted [message 4]
./extract: extracted [message 5]
./extract: extracted [message 6]
./extract: extracted [message 7]
./extract: extracted [message 8]
./extract: extracted [message 9]
./extract: extracted [message 10]
^C

I left the interrupt in the output to emphasize that this is an infinite loop.

Use Perl as a glue language to make extract get lines from the log by way of tail:

#! /usr/bin/perl

use warnings;
use strict;

die "Usage: $0 [ log-file ]\n" if @ARGV > 1;
my $path = @ARGV ? shift : "log";

# run tail as a child process and read its output through a pipe;
# the multi-argument form of open avoids the shell
open my $fh, "-|", "tail", "-c", "+1", "-f", $path
  or die "$0: could not start tail: $!";

while (<$fh>) {
  chomp;
  print "$0: extracted [$_]\n";
}
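To watch it work, run the logdump stand-in above in one shell and this script in another; each message shows up in the extractor as soon as logdump writes it.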

Finally, if you insist on doing the heavy lifting yourself, there's a related Perl FAQ:

How do I do a tail -f in perl?

First try

seek(GWFILE, 0, 1);

The statement seek(GWFILE, 0, 1) doesn't change the current position, but it does clear the end-of-file condition on the handle, so that the next <GWFILE> makes Perl try again to read something.

If that doesn't work (it relies on features of your stdio implementation), then you need something more like this:

for (;;) {
  for ($curpos = tell(GWFILE); <GWFILE>; $curpos = tell(GWFILE)) {
    # search for some stuff and put it into files
  }
  # sleep for a while
  seek(GWFILE, $curpos, 0);  # seek to where we had been
}

If this still doesn't work, look into the clearerr method from IO::Handle, which resets the error and end-of-file states on the handle.
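As a sketch of that approach (the file name and the three-second sleep here are just placeholders):

#! /usr/bin/perl

use warnings;
use strict;
use IO::Handle;

open my $fh, "<", "log" or die "$0: open: $!";
for (;;) {
  while (defined(my $line = <$fh>)) {
    print "extracted [$line]";  # $line still ends with its newline
  }
  sleep 3;        # wait for the writer to append more
  $fh->clearerr;  # reset the end-of-file state so <$fh> will read again
}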

There's also a File::Tail module from CPAN.

Greg Bacon