8

I'm quite new to Perl and I'm trying to build a hash recursively and getting nowhere. I searched for tutorials on building hashes dynamically, but all I could find were introductory articles about hashes. I would be grateful if you could point me in the right direction or suggest a good article or tutorial.

I'm trying to read from a file which has paths in the form of

one/two/three
four
five/six/seven/eight

and I want to build a hash like

VAR = {
    one : {
        two : {
            three : ""
        }
    }
    four : ""
    five : {
        six : {
            seven : {
                 eight : ""
            }
        }
    }
}

The script I'm using currently is :

my $finalhash = {}; 
my @input = <>;

sub constructHash {
    my ($hashrf, $line) = @_; 
    @elements = split(/\//, $line);
    if(@elements > 1) {
        $hashrf->{shift @elements} = constructHash($hashrf->{$elements[0]}, @elements ); 
    } else {
        $hashrf->{shift @elements} = ""; 
    }
    return $hashrf;
}

foreach $lines (@input) {
    $finalhash = constructHash($finalhash, $lines);
}
Axeman
Gaurav Dadhania
  • Your `my ($hashrf, $line) = $_;` line should probably read `my ($hashrf, $line) = @_;` instead. – JB. Dec 30 '10 at 00:15
  • Fixed :) But it still doesn't add up to anything. If I print the value of `hashrf` in the loop, it is always `''` – Gaurav Dadhania Dec 30 '10 at 00:22

5 Answers

7

Data::Diver covers this niche so well that people shouldn't reinvent the wheel.

use strict;
use warnings;
use Data::Diver 'DiveVal';
use Data::Dumper;

my $root = {};
while ( my $line = <DATA> ) {
    chomp($line);
    DiveVal( $root, split m!/!, $line ) = '';
}
print Dumper $root;
__DATA__
one/two/three
four
five/six/seven/eight
ysth
  • Thank you for suggesting this module. I tried to search CPAN but couldn't find this (maybe including 'hash' in the search wasn't a good idea on my part). I gotta say though, I learnt a lot from re-inventing the wheel (actually, seeing the wheel re-invented by the experts) :) – Gaurav Dadhania Dec 30 '10 at 02:56
6

This is a bit far-fetched, but it works:

sub insert {
  my ($ref, $head, @tail) = @_;
  if ( @tail ) { insert( \%{$ref->{$head}}, @tail ) }
  else         {            $ref->{$head} = ''      }
}

my %hash;
chomp and insert \%hash, split( '/', $_ ) while <>;

It relies on autovivification, which is admittedly a bit advanced for a beginner.

What would probably make any answer to your question a bit twisted is that you ask for empty strings in the leaves, which is of a different "type" than the hashes of the nodes, and requires a different dereferencing operation.
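As a rough sketch of the same idea with the loop and the `chomp` written out on their own lines: the `\%{ $ref->{$head} }` expression is where autovivification happens, because taking a reference to the dereferenced (still undefined) element makes Perl create an empty hash there. The fixed `@lines` list is an assumption standing in for `<>`, so that the sketch is self-contained.

```perl
use strict;
use warnings;

sub insert {
    my ($ref, $head, @tail) = @_;
    # \%{ ... } autovivifies $ref->{$head} as an empty hash if it is undefined.
    if (@tail) { insert(\%{ $ref->{$head} }, @tail) }
    else       { $ref->{$head} = '' }
}

my %hash;
# Stand-in for lines read from <>; the last line deliberately has no newline.
my @lines = ("one/two/three\n", "one/two/four\n", "four");
for my $line (@lines) {
    chomp $line;    # keep the newline out of the leaf key
    insert(\%hash, split m{/}, $line);
}
```

Because `chomp` is a separate statement here, its return value no longer decides whether `insert` runs, so a final line without a trailing newline is still processed.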

JB.
  • Just make the `while` loop and `chomp` explicit on their own lines, and it will be far more readable _for a beginner_. – J-16 SDiZ Dec 30 '10 at 00:39
  • I wish I could just wholly do away with the `chomp`. It's needed to not have ugly newlines in the leaf keys, but really adds nothing to the problem. As for the `while`, opinions vary, but you're welcome to edit me if you really deeply think so. – JB. Dec 30 '10 at 00:48
  • ++ because your solution doesn't leave empty hashrefs like mine : ) – Hugmeir Dec 30 '10 at 00:54
  • @JB, I like your concise approach a lot, but it looped forever. I could only get it to return by rewriting the while loop: `foreach (<>) {chomp; insert \%hash, split( '/', $_ )}` – Nathan Dec 30 '10 at 00:56
  • @Nathan I'd be curious to find out why--that `while` works fine as is on my system. I replaced the comma with an `and` as a readability enhancer, but AFAICT it can only make the snippet more fragile wrt line endings, not loop forever. – JB. Dec 30 '10 at 01:06
  • Hmm... I get the same result with v5.8.8 and v5.10.1. I don't honestly understand why. – Nathan Dec 30 '10 at 01:18
  • You want `chomp,` not `chomp and`; otherwise you discard the last line if it is missing its newline. – ysth Dec 30 '10 at 02:49
  • @Nathan: doesn't hang for me on 5.8.9 or 5.10.1. I suspect you are doing something slightly different than posted? – ysth Dec 30 '10 at 02:52
  • @ysth @JB I was doing something different, I think using an array or `__DATA__` instead of `<>`, but I don't remember now. – Nathan Jan 06 '11 at 19:19
4

I've never done something like this, so this approach is likely to be wrong, but well, here's my shot:

use 5.013;
use warnings;
use Data::Dumper;

sub construct {
   my $hash = shift;
   return unless @_;

   return construct($hash->{shift()} //= {}, @_);
}

my %hash;

while (<DATA>) {
   chomp;
   construct(\%hash, split m!/!);
}

say Dumper \%hash;

__DATA__
one/two/three
four
five/six/seven/eight

EDIT: Fixed!

EDIT2: A (I think) tail-call optimized version, because!

sub construct {
   my $hash = shift;
   return unless @_;
   unshift @_, $hash->{shift()} //=  @_ ? {} : '';

   goto &construct;
}
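
For a Perl without the defined-or operator (before 5.10), the same behaviour can be sketched with an explicit `defined` test. The name `construct_compat` is hypothetical, and the sample paths are only there to show two lines sharing the `one/two` prefix. The important detail is that a newly created node is stored in the parent hash before recursing, so that a later path reuses it instead of replacing it.

```perl
use strict;
use warnings;

# Hypothetical name; same idea as construct() above, written without //=.
sub construct_compat {
    my ($hash, @keys) = @_;
    return unless @keys;
    my $key = shift @keys;
    if (!defined $hash->{$key}) {
        # Store the new node in the parent: a hash if more keys follow, '' for a leaf.
        $hash->{$key} = @keys ? {} : '';
    }
    # Descend only while keys remain and there is a hash to descend into.
    return unless @keys && ref $hash->{$key} eq 'HASH';
    return construct_compat($hash->{$key}, @keys);
}

my %hash;
construct_compat(\%hash, split m{/}, $_)
    for ('one/two/three', 'one/two/four', 'five');
```

With these inputs, `three` and `four` both end up under `$hash{one}{two}`. If an existing leaf later appears as an intermediate component, this sketch simply stops at that leaf rather than converting it to a hash.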
Hugmeir
  • Ah, that works! :D however it has a small problem, suppose I had `one/two/three` and `one/two/four` it will overwrite `one/two/three` with `one/two/four` instead of having both. I hope this makes sense. – Gaurav Dadhania Dec 30 '10 at 00:32
  • There you go, fixed! It makes use of the defined-or operator, which works only on a 5.10+ Perl -- in older Perls, you'd have to do something like this: my $temp = $hash->{shift()}; construct((defined($temp) ? $temp : {}), @_) http://perldoc.perl.org/perlop.html#C-style-Logical-Defined-Or – Hugmeir Dec 30 '10 at 00:39
  • If you have to deal with duplicate keys (`one` for `one/two/three` and also for `one/two/four`) you can't use hashes like you're trying. You'll have to use an array or a different key, like the full line itself – Nathan Dec 30 '10 at 00:44
  • yeah, I didn't know it could be fixed by adding two characters :) I added ` $tempkey = shift; $temphash = $hash->{$tempkey}; if(ref($temphash) eq 'HASH') { return construct($temphash, @_); } else { return construct($hash->{$tempkey} = {}, @_); }` and it worked. :) – Gaurav Dadhania Dec 30 '10 at 00:45
  • @Nathan - I didn't mean duplicate keys. It would overwrite `{one}->{two}->{three}` with `{one}->{two}->{four}` because of creating a new hash once we get to `{one}->{two}` the second time around in the recursion. At least, I think that's what caused the problem. – Gaurav Dadhania Dec 30 '10 at 01:06
  • @Gaurav, are you using this to build a map of a hierarchical file structure? There might be a better way. – Nathan Dec 30 '10 at 01:13
  • @Nathan, I'm building this for paths to elements within a file. But I'm curious, what other way do you suggest? Thanks :) – Gaurav Dadhania Dec 30 '10 at 01:20
  • @Gaurav, I don't know, `File::Find` or an XML module or something. Did you look on CPAN? – Nathan Dec 30 '10 at 01:23
  • @Nathan, I searched CPAN and while I did find modules to convert between XML documents and Hashes, I didn't find one for paths or dynamic hash generation (is that what this is called?). – Gaurav Dadhania Dec 30 '10 at 01:26
3

I ran your code and found a few problems:

  • You haven't scoped `@elements` properly: it is assigned without `my`, so it is a package global shared by every recursive call.
  • With that recursion you're creating a hash that references itself, which is not what you want.
  • In your outermost call, the second argument to `constructHash()` is a string, but in the recursive call inside, you pass the `@elements` array.
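
As an illustration only, the three points above could be applied to the original sub while keeping its two-argument shape (hash reference plus one line), roughly as follows. The variable names `$first` and `$rest` and the hard-coded input lines are assumptions made for this sketch.

```perl
use strict;
use warnings;

sub constructHash {
    my ($hashrf, $line) = @_;
    # Point 1: lexical variables; splitting into at most two parts keeps
    # the remainder as a single string.
    my ($first, $rest) = split m{/}, $line, 2;
    if (defined $rest && length $rest) {
        # Point 2: descend into a child hash (reused or newly created)
        # rather than storing the sub's return value under the key.
        $hashrf->{$first} = {} unless ref $hashrf->{$first} eq 'HASH';
        # Point 3: the recursive call passes a string, like the outermost call.
        constructHash($hashrf->{$first}, $rest);
    } else {
        $hashrf->{$first} = '';
    }
    return $hashrf;
}

my $finalhash = {};
for my $line ('one/two/three', 'four', 'five/six/seven/eight') {
    constructHash($finalhash, $line);
}
```

Since the sub modifies the hash it is given, the caller does not need to assign the return value back to `$finalhash`.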

Try this.

use Data::Dumper;

my $finalhash = {}; 
my @input = split "\n", <<INPUT;
one/two/three
four
five/six/seven/eight
INPUT

sub constructHash {
    my $line = shift; 
    my ($first, $remainder) = split(/\//, $line,2);

    if (defined $remainder && length $remainder) {   # a remainder of "0" is still a path component
        return { $first => constructHash($remainder) } ; 
    } else {
        return { $first , "" }; 
    }
}

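foreach my $lines (@input) {
    my $linehash = constructHash($lines);
    my $firstkey = (keys %$linehash)[0];
#    print Dumper $linehash;
    # Note: this replaces any existing subtree under $firstkey, so lines
    # that share a first component overwrite one another.
    $finalhash->{$firstkey} = $linehash->{$firstkey};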
} 


print Dumper $finalhash;

It produces

$VAR1 = {
          'five' => {
                      'six' => {
                                 'seven' => {
                                              'eight' => ''
                                            }
                               }
                    },
          'one' => {
                     'two' => {
                                'three' => ''
                              }
                   },
          'four' => ''
        };

Remember, Perl hashes aren't ordered.
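
If a stable order is wanted when printing, one option is to let `Data::Dumper` sort the keys. The sketch below hard-codes the finished structure (an assumption, to keep it self-contained) and sets `$Data::Dumper::Sortkeys`, so the keys come out in string order at every level, with `five` before `four` before `one` at the top.

```perl
use strict;
use warnings;
use Data::Dumper;

# Emit hash keys in sorted order at every nesting level.
$Data::Dumper::Sortkeys = 1;
# Optional: a more compact, fixed-width indentation style.
$Data::Dumper::Indent = 1;

my $finalhash = {
    one  => { two => { three => '' } },
    four => '',
    five => { six => { seven => { eight => '' } } },
};

my $dump = Dumper($finalhash);
print $dump;
```

This only affects how the structure is printed; the hash itself remains unordered.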

Nathan
  • Thank you for the corrections: 1) Yup, beginners mistake I guess. 2) After I posted the question, I tried `join('/', @elements)` for the in the recursive call, but it didn't seem to work. 3) I guess this was the main problem. I still don't understand why it didn't work. Can you point me to a article/tutorial about hash references that would make this clearer? Again, thank you very much. :) – Gaurav Dadhania Dec 30 '10 at 00:50
  • For 3, the problem isn't really hash references but parameter passing. Currently your sub is a function of two arguments (ref and line), and that's the way you invoke it from the toplevel. But the recursive call invokes it as a function of many arguments: the ref and multiple elements. In effect, you'd lose all elements but the first each time you recursed. – JB. Dec 30 '10 at 00:56
  • @Gaurav, I learned this stuff by staring at [`perldoc perlref`](http://perldoc.perl.org/perlref.html) and its relatives (perldsc, perllol, perltoot), and [Effective Perl Programming](http://en.wikipedia.org/wiki/Effective_Perl_Programming) was also instrumental. – Nathan Dec 30 '10 at 01:05
  • @JB Ah, that makes sense. Thank you :) – Gaurav Dadhania Dec 30 '10 at 01:37
1

The basics:

JB.
  • Neither of those links deals with recursively/dynamically generating a data structure (is there a better name for this?), though. Still, gotta love perldsc. – Hugmeir Dec 30 '10 at 01:08
  • I just didn't feel like linking to wikipedia for recursion. perldsc explains what hashes of hashes are made of (and in Perl, it's not trivial). That's the hard part of it. Not recursion, not the dynamic aspect. In my opinion, anyway :-) – JB. Dec 30 '10 at 01:12
  • Good enough for me : ) Also, I forgot, thanks for the perlglossary link! I didn't know that existed. – Hugmeir Dec 30 '10 at 01:16