
What's the best way to read a fixed-length record in Perl? I know that to read a file like:

ABCDE 302
DEFGC 876

I can do

while (<FILE>) {
   $key = substr($_, 0, 5);
   $value = substr($_, 6, 3);   # value starts at offset 6: 5-char key plus 1 blank
}

but isn't there a way to do this with read/unpack?

brian d foy
David Nehme

5 Answers

my($key, $value) = unpack "A5 A3";    # Original, but slightly dubious

We both need to check out the options at the unpack manual page (and, more particularly, the pack manual page).

Since the A pack operator removes trailing blanks, your example can be encoded as:

my($key, $value) = unpack "A6A3";

Alternatively (this is Perl, so TMTOWTDI):

my($key, $blank, $value) = unpack "A5A1A3";

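The 1 is optional but systematic and symmetric. One advantage of this is that you can validate the separator column. Because A strips trailing spaces, $blank comes back as "" with A1; use a1 for that field if you want to test that $blank eq " ".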
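
To make the separator check concrete, here is a minimal sketch. The helper name parse_record is made up for illustration, and the sketch uses a1 rather than A1 for the separator, because A strips trailing spaces and would hand back an empty string for a blank column.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the answer above): split one record into
# its key and value, insisting that the separator column is a single space.
# 'A' strips trailing spaces; 'a' returns the bytes untouched.
sub parse_record {
    my ($line) = @_;
    my ($key, $blank, $value) = unpack "A5 a1 A3", $line;
    die "malformed record: '$line'\n"
        unless defined $blank && $blank eq " ";
    return ($key, $value);
}

my ($key, $value) = parse_record("ABCDE 302");
print "$key:$value\n";    # prints "ABCDE:302"

# For comparison: with A1 the blank separator is stripped to "".
my @fields = unpack "A5 A1 A3", "ABCDE 302";
print "separator via A1 is empty\n" if $fields[1] eq "";
```

Whether a bad separator should die, warn, or be skipped depends on the data; die is used here only to keep the sketch short.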

Jonathan Leffler

Update: For the definitive answer, see Jonathan Leffler's answer.

I wouldn't use this for just two fields (I'd use pack/unpack directly), but for 20 or 50 or so fields I like to use Parse::FixedLength (but I'm biased). For your example (Update: you can also set $/ and use <> as an alternative to read($fh, $buf, $buf_length); see below):

use Parse::FixedLength;

my $pfl = Parse::FixedLength->new([qw(
  key:5
  blank:1
  value:3
)]);
# Assuming trailing newline
# (or add newline to format above and remove "+ 1" below)
my $data_length = $pfl->length() + 1;

{
  local $/ = \$data_length;
  while(<FILE>) {
    my $data = $pfl->parse($_);
    print "$data->{key}:$data->{value}\n";
    # or
    print $data->key(), ":", $data->value(), "\n";
  }
}

There are some similar modules that make pack/unpack more "friendly" (See the "See Also" section of Parse::FixedLength).

Update: Wow, this was meant to be an alternative answer, not the official answer. Since it is what it is, I should include some of Jonathan Leffler's more straightforward code, which is likely how you should usually do it (see the pack/unpack docs and Jonathan Leffler's answer):

$_ = "ABCDE 302";
my($key, $blank, $value) = unpack "A5A1A3";
runrig

Assume 10-character records, each made of two five-character fields:

open(my $fh, "<", $filename) or die $!;
while (read($fh, my $buf, 10)) {
  my ($field1, $field2) = unpack("A5 A5", $buf);
  # ... do something with data ...
}
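
As a sketch of how the same read/unpack loop could be applied to the data in the question, the version below assumes each record is 10 bytes: a 5-character key, one blank, a 3-character value, and a trailing newline. The helper name read_records and the in-memory filehandle are illustrative assumptions, not part of the answer above.

```perl
use strict;
use warnings;

my $RECORD_LEN = 10;    # assumption: 5 + 1 + 3 data bytes plus one "\n"

# Hypothetical helper: read every fixed-length record from an open handle
# and return a list of [key, value] pairs.
sub read_records {
    my ($fh) = @_;
    my @records;
    while (my $got = read($fh, my $buf, $RECORD_LEN)) {
        # read() returns the number of bytes actually read.
        die "short record ($got bytes)\n" if $got != $RECORD_LEN;
        # 'x' skips the blank column; the trailing newline is never unpacked.
        my ($key, $value) = unpack "A5 x A3", $buf;
        push @records, [$key, $value];
    }
    return @records;
}

# An in-memory filehandle stands in for the real file.
my $data = "ABCDE 302\nDEFGC 876\n";
open(my $fh, "<", \$data) or die "cannot open in-memory file: $!";
print "$_->[0]:$_->[1]\n" for read_records($fh);
close $fh;
```

If the file has no line terminators, or uses "\r\n", the record length would need adjusting accordingly.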
Michael Cramer

Here's yet another way to do it:

while (<FILE>)
{
    chomp;
    if (/^([A-Z]{5}) ([0-9]{3})$/)
    {
        $key = $1;
        $value = $2;
    }
}
ruben2020

Regardless of whether your records and fields are fixed-length, if the fields are separated by uniform delimiters (such as a space or comma), you can use the split function more easily than unpack.

my ($field1, $field2) = split / /;

Look up the documentation for split. There are useful variations on the argument list and on the format of the delimiter pattern.

Barry Brown

Comments:

  • If any field value is shorter than the fixed width (although this isn't the case in his example), the string will be split on the padding spaces as well, which is wrong. If the field value lengths are all identical, then you are correct: there is no difference between delimited and fixed-width. – Adam Bellaire Jan 02 '09 at 21:00
  • It's not a matter of field length. If fields can have significant whitespace, you can't split on whitespace. That's one of the points of fixed-length fields. :) – brian d foy Jan 03 '09 at 00:32
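
The point about significant whitespace can be seen in a small sketch. The record below is made up for illustration: its 5-character key is "AB  E", which contains two spaces.

```perl
use strict;
use warnings;

# Made-up record: key "AB  E" (with embedded spaces), one blank, value "302".
my $record = "AB  E 302";

# Splitting on a single space breaks the key apart.
my @by_split = split / /, $record;            # ("AB", "", "E", "302")

# Unpacking by column position keeps the key intact.
my @by_unpack = unpack "A5 x A3", $record;    # ("AB  E", "302")

print scalar(@by_split),  " fields from split\n";     # prints "4 fields from split"
print scalar(@by_unpack), " fields from unpack\n";    # prints "2 fields from unpack"
```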