
I have a string that I would like to "unflatten" or "tree-ify"; that is, I want to go from this:

F=8|A_C=3|A_B=2|D_G_H=11|D_B=2|E=5  

to this:

{
  A => {
    B => 2,
    C => 3,
  },
  D => {
    B => 2,
    G => {
      H => 11,
    },
  },
  E => 5,
  F => 8,
}

My strategy was to process each pipe-delimited field separately and split by the = sign into a key / value pair:

sub unflatten {
    my ($data) = @_;
    my @fields = split /\|/, $data;
    my $result = {};

    for my $datum (@fields) {
            my ($key, $value) = split /=/, $datum;
            $result->{&processline($key)} = $value;
    }
    return $result;
}

I was attempting some recursive magic in the processline function:

sub processline {
    my ($key) = @_;

    my ($first, $rest) = split /_/, $key, 2; # split key into at most 2 parts
    if($rest) {
            return { $first => &processline($rest) }; 
            # if the key is nested, there will be something in $rest
            # so recursively process the smaller $rest, and build up the result hashref
    }
    else {
            return $first;
    }
}

Unfortunately, this doesn't work:

my $header = "F=8|A_C=3|A_B=2|D_G_H=11|D_B=2|E=5";
use Data::Dumper;
print Dumper &unflatten($header);

When I do this, I get:

$VAR1 = {
      'F' => '8',
      'HASH(0xe9af60)' => '2',
      'HASH(0xe9ae28)' => '11',
      'E' => '5',
      'HASH(0xe9af90)' => '3',
      'HASH(0xe9ae40)' => '2'
    };

Could someone explain the thought process behind a recursive solution, or suggest where my Perl has gone so wrong? Frustratingly, I was able to come up with the inverse of this function (flatten) pretty easily.

Borodin
galvatron
  • The key you are returning from processline is not correct, see this: http://ideone.com/28apW3 – Chankey Pathak Jun 07 '14 at 04:33
  • As a side note, the `&` on your `&sub()` calls is not needed. It tells Perl to ignore prototyping, which you do not have. You can safely drop it. See http://stackoverflow.com/questions/1347396/when-should-i-use-the-to-call-a-perl-subroutine for more details. – simbabque Jun 07 '14 at 09:28

1 Answer


I believe this is more straightforward with a simple for loop than with recursion. The method you have chosen cannot work because it uses the single value returned by processline as the key. For a nested key that value is a hash reference, which Perl stringifies into a key such as HASH(0xe9af60), so a multilevel hash is never created.
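As a minimal illustration of where the HASH(0x...) keys in the question's output come from, the sketch below uses a hash reference as a hash key. The variable names are made up for this example, and the exact address printed will differ from run to run.

```perl
use strict;
use warnings;

# A hash key is always a string, so a reference used as a key is
# stringified to something like "HASH(0x55d0c8a1e2a0)".
my $inner = { B => 2 };
my %outer = ( $inner => 'value' );

my ($key) = keys %outer;

# The key is no longer a reference, only its string form survives
print "key is a plain string: $key\n" if !ref $key;
print "key has the HASH(0x...) form\n" if $key =~ /^HASH\(0x[0-9a-f]+\)$/;
```

Because only the string survives, the nested structure built by processline is unreachable from the result.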

The way a recursive solution would work is by taking a hash reference, a list of keys, and a value, and defining

unflatten($hash, 'key1_key2_key3_key4', 'value')

as

unflatten($hash->{key1}, 'key2_key3_key4', 'value')
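The recursive definition above could be sketched roughly as follows. The name insert_path is hypothetical (chosen here to avoid clashing with the other subs on this page), and the sketch assumes that keys never contain = and that every intermediate level is either missing or already a hash.

```perl
use strict;
use warnings;

# Hypothetical helper: store $value in $hash under the '_'-separated path $keys
sub insert_path {
    my ($hash, $keys, $value) = @_;

    # Limit of 2: the first key, plus everything after the first underscore
    my ($first, $rest) = split /_/, $keys, 2;

    if (defined $rest) {
        # More keys remain: create the next level if needed and recurse into it
        insert_path($hash->{$first} ||= {}, $rest, $value);
    }
    else {
        # Base case: a single key is left, so assign the value here
        $hash->{$first} = $value;
    }
    return;
}

my $result = {};
for my $item (split /\|/, 'F=8|A_C=3|A_B=2|D_G_H=11|D_B=2|E=5') {
    my ($keys, $value) = split /=/, $item, 2;
    insert_path($result, $keys, $value);
}
```

Each call consumes one key and passes the sub-hash for that key down, so the recursion depth equals the number of underscores in the path.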

This program demonstrates a plain loop solution. It uses a pointer $hash that starts at the root of the resultant hash and moves forward a level after each key in the list.

sub unflatten {

  my $result = {};

  for my $item (split /\|/, $_[0]) {

    my ($keys, $value) = split /=/, $item;
    my @keys = split /_/, $keys;
    my $hash = $result;

    while (@keys > 1) {
      my $key = shift @keys;
      $hash->{$key} ||= {};
      $hash = $hash->{$key};
    }

    $hash->{$keys[0]} = $value;
  }

  return $result;
}

output

$VAR1 = {
      'A' => {
               'C' => '3',
               'B' => '2'
             },
      'F' => '8',
      'D' => {
               'G' => {
                        'H' => '11'
                      },
               'B' => '2'
             },
      'E' => '5'
    };

Update

Now that I'm back at a keyboard, here's a recursive solution. It produces the same hash as the loop version above

use strict;
use warnings;

use Data::Dumper;

my $data = 'F=8|A_C=3|A_B=2|D_G_H=11|D_B=2|E=5';

my $result = {};
unflatten2($result, $_) for split /\|/, $data;
print Dumper $result;

sub unflatten2 {
  my ($hash, $data) = @_;

  if ($data =~ /_/) {
    # Limit of 2 keeps everything after the first underscore in $rest
    my ($key, $rest) = split /_/, $data, 2;
    unflatten2($hash->{$key} ||= {}, $rest);
  }
  else {
    my ($key, $val) = split /=/, $data;
    $hash->{$key} = $val;
  }
}

Update

You may also be interested in the Data::Diver module, which is intended for situations like this, although the documentation is a little clumsy

Here's how a solution using it would look

use strict;
use warnings;

use Data::Diver qw/ DiveVal /;
use Data::Dumper;

my $data = 'F=8|A_C=3|A_B=2|D_G_H=11|D_B=2|E=5';

my $result = {};    

for (split /\|/, $data) {
  my ($keys, $val) = split /=/;
  DiveVal($result, split /_/, $keys) = $val;
}

print Dumper $result;
Borodin
  • Thank you for the very detailed answer. I had tried something like your unflatten2, but again was returning a key for each line and so never got there. Awesome! – galvatron Jun 07 '14 at 05:50
  • @galvatron: I'm pleased it was useful. I've written another update that may interest you, using the `Data::Diver` module – Borodin Jun 07 '14 at 06:19
  • @Miller: Thankfully, Perl's optimisation removes the need to explicitly specify `$_` and a value for *LIMIT* when assigning to a fixed-length list. Take a look at `perl -MO=Deparse -e'($x) = split'` for proof! But note that something like `$x = (split)[2]` is still better written as `$x = (split $_, 3)[2]` – Borodin Jun 08 '14 at 21:39
  • Yes, that is true. However, I was actually encouraging a limit not for efficiency, but because what if the user would hypothetically want an equal sign in the value. Saying what one actually means there makes the code more robust by default, which is why I would lean away from the 2nd solution here because by changing the order of operations makes it so a _ can't be in the value. – Miller Jun 08 '14 at 21:43
  • @Miller: Ah I see what you mean. I sympathise, but I would lean towards sanitising the data with, for instance, `die if /=.*_/ or tr/=// > 1` as there is no way to do this properly and allow for equals signs in the keys – Borodin Jun 08 '14 at 22:04
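
The sanitising check suggested in the last comment might be wrapped up as below. This is only a sketch: field_is_clean is a hypothetical name, and it is slightly stricter than the one-liner in the comment because it also rejects fields with no equals sign at all.

```perl
use strict;
use warnings;

# Hypothetical helper: true if a field has exactly one '=' and no '_' after it
sub field_is_clean {
    my ($item) = @_;

    # tr/=// with an empty replacement list only counts, it does not modify
    return 0 if ($item =~ tr/=//) != 1;

    # An underscore after the equals sign would be mistaken for a key separator
    return 0 if $item =~ /=.*_/;

    return 1;
}

for my $item (split /\|/, 'D_G_H=11|A=b_c|A=1=2') {
    printf "%-10s %s\n", $item, field_is_clean($item) ? 'ok' : 'rejected';
}
```

Rejecting such fields up front means either of the splitting orders discussed above can be used without silently misplacing values.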