How can I merge several hashes into one hash in Perl?

Question

In Perl, how do I get this:

$VAR1 = { '999' => { '998' => [ '908', '906', '0', '998', '907' ] } }; 
$VAR1 = { '999' => { '991' => [ '913', '920', '918', '998', '916', '919', '917', '915', '912', '914' ] } }; 
$VAR1 = { '999' => { '996' => [] } }; 
$VAR1 = { '999' => { '995' => [] } }; 
$VAR1 = { '999' => { '994' => [] } }; 
$VAR1 = { '999' => { '993' => [] } }; 
$VAR1 = { '999' => { '997' => [ '986', '987', '990', '984', '989', '988' ] } }; 
$VAR1 = { '995' => { '101' => [] } }; 
$VAR1 = { '995' => { '102' => [] } }; 
$VAR1 = { '995' => { '103' => [] } }; 
$VAR1 = { '995' => { '104' => [] } }; 
$VAR1 = { '995' => { '105' => [] } }; 
$VAR1 = { '995' => { '106' => [] } }; 
$VAR1 = { '995' => { '107' => [] } }; 
$VAR1 = { '994' => { '910' => [] } }; 
$VAR1 = { '993' => { '909' => [] } }; 
$VAR1 = { '993' => { '904' => [] } }; 
$VAR1 = { '994' => { '985' => [] } }; 
$VAR1 = { '994' => { '983' => [] } }; 
$VAR1 = { '993' => { '902' => [] } }; 
$VAR1 = { '999' => { '992' => [ '905' ] } };

to this:

$VAR1 = { '999:' => [
 { '992' => [ '905' ] },
 { '993' => [
  { '909' => [] },
  { '904' => [] },
  { '902' => [] }
 ] },
 { '994' => [
  { '910' => [] },
  { '985' => [] },
  { '983' => [] }
 ] },
 { '995' => [
  { '101' => [] },
  { '102' => [] },
  { '103' => [] },
  { '104' => [] },
  { '105' => [] },
  { '106' => [] },
  { '107' => [] }
 ] },
 { '996' => [] },
 { '997' => [ '986', '987', '990', '984', '989', '988' ] },
 { '998' => [ '908', '906', '0', '998', '907' ] },
 { '991' => [ '913', '920', '918', '998', '916', '919', '917', '915', '912', '914' ] }
]};

We need to see the code that's generating the initial output. More specifically, we need to know all the variables that Data::Dumper is calling `$VAR1`. — Michael Carman, May 04 '10 at 17:46
What specifically about data structure syntax are you having difficulty with? Have you read http://perldoc.perl.org/perldsc.html? Have you tried writing out the problem in pseudocode? Once you have an algorithm, we can help you with the syntax, but those numbers have no meaning to anyone else but you, as we don't know the context of your application. — Ether, May 04 '10 at 17:53
Your destination format doesn't look *that* useful. You have '999' mapped to an array of separate hashes. And you have additional keys mapped that same way as well. I'm not sure that buys you what you might think it does. — Axeman, May 04 '10 at 18:14

score 4 · Accepted Answer · answered May 04 '10 at 20:32

I think this is closer than anybody else has gotten:

This does most of what you want. I did not store things in arrays of single-pair hashes, as I don't think that is useful.

Your scenario is not a regular one. I've tried to genericize this to some extent, but it was not possible to overcome the peculiarities of this data.

First, because it appears you want to collapse everything with the same ID into a single merged entity (with exceptions), you have to descend through the structure, pulling out the definitions of the entities and keeping track of their levels, because you want them in the form of a tree.
Next, you assemble the ID table, merging entities where possible. Note that 995 is defined as an empty array in one place and as a level in another; given your desired output, I overwrite the empty list with the hash.
After that, we move the root into the result structure and descend through it, assigning the canonical entity to the identifier at each level.

Like I said, this is not a very regular problem. Of course, if you still want a list of hashes that are no more than pairs, that's an exercise left to you.
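To make those three steps concrete, here is a much-simplified sketch of the same idea; it is not this answer's code. It assumes every input record has exactly one parent key mapping to `{ child => array ref }` (as in the question) and that the data contains no cycles, and it uses a hypothetical `build` helper on a trimmed sample of the question's input. An empty-array placeholder such as 999's `995 => []` is replaced by 995's own definition, mirroring the 995 rule described above.

```perl
use strict;
use warnings;
use Data::Dumper;

# Trimmed sample of the question's input records.
my @records = (
    { '999' => { '998' => [ '908', '906', '0', '998', '907' ] } },
    { '999' => { '995' => [] } },
    { '995' => { '101' => [] } },
    { '995' => { '102' => [] } },
    { '999' => { '992' => ['905'] } },
);

# Build the ID table: parent => { child => payload }, and note which IDs appear as children.
my ( %children, %is_child );
for my $rec (@records) {
    for my $parent ( keys %$rec ) {
        for my $child ( keys %{ $rec->{$parent} } ) {
            $children{$parent}{$child} = $rec->{$parent}{$child};
            $is_child{$child} = 1;
        }
    }
}

# Hypothetical helper: rebuild the subtree under $id. An empty-array placeholder is
# replaced by the child's own definition when one exists (the "995" case).
# Assumes the data has no cycles.
sub build {
    my ($id) = @_;
    my %node;
    for my $child ( keys %{ $children{$id} } ) {
        my $payload        = $children{$id}{$child};
        my $is_placeholder = ref $payload eq 'ARRAY' && !@$payload;
        $node{$child} = ( $is_placeholder && exists $children{$child} )
            ? build($child)
            : $payload;
    }
    return \%node;
}

# Roots are parents that never appear as anyone's child (here only 999).
my %tree = map { $_ => build($_) } grep { !$is_child{$_} } keys %children;

$Data::Dumper::Sortkeys = 1;
print Dumper( \%tree );
```

With this sample, `%tree` has the single root 999, and its 995 entry becomes `{ 101 => [], 102 => [] }` instead of an empty array.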

use strict;
use warnings;

# subroutine to identify all elements
sub descend_identify {
    my ( $level, $hash_ref ) = @_;
    # return a flat list of (key, level, value) triples, expanding as we descend
    return map {
        my $item = $hash_ref->{$_};
        $_ => ( $level, $item )
            , ( ref( $item ) eq 'HASH' ? descend_identify( $level + 1, $item ) 
              :                          ()
              )
           ;
    } keys %$hash_ref
    ;
}

# subroutine to refit all nested elements
sub descend_restore { 
    my ( $hash, $ident_hash ) = @_;

    my @keys        = keys %$hash;
    @$hash{ @keys } = @$ident_hash{ @keys };
    foreach my $h ( grep { ref() eq 'HASH' } values %$hash ) {
        descend_restore( $h, $ident_hash );
    }
    return;
}

# merge hashes, descending down the hash structures.
sub merge_hashes {
    my ( $dest_hash, $src_hash ) = @_;
    foreach my $key ( keys %$src_hash ) {
        if ( exists $dest_hash->{$key} ) {
            my $ref = $dest_hash->{$key};
            my $typ = ref( $ref );
            if ( $typ eq 'HASH' ) {
                merge_hashes( $ref, $src_hash->{$key} );
            }
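            else {
                # append array contents instead of nesting the source array ref
                push @$ref, ref( $src_hash->{$key} ) eq 'ARRAY'
                    ? @{ $src_hash->{$key} }
                    : $src_hash->{$key};
            }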
        }
        else {
            $dest_hash->{$key} = $src_hash->{$key};
        }
    }
    return;
}

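my ( %levels, %ident_map, %result );

# @hash_list is assumed to hold the hashrefs you Dumper-ed.
# Fill it in; it is declared empty here so the script compiles under strict.
my @hash_list;

# descend through every level of every hash in the list
my @pairs = map { descend_identify( 0, $_ ); } @hash_list;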

while ( @pairs ) {
    my ( $key, $level, $ref ) = splice( @pairs, 0, 3 );
    $levels{$key} |= $level;

    # if we already have an identity for this key, merge the two
    if ( exists $ident_map{$key} ) {
        my $oref = $ident_map{$key};
        my $otyp = ref( $oref );
        if ( $otyp ne ref( $ref )) {
            # empty arrays can be overwritten by hashrefs -- per 995
            if ( $otyp eq 'ARRAY' && @$oref == 0 && ref( $ref ) eq 'HASH' ) {
                $ident_map{$key} = $ref;
            }
            else { 
                die "Uncertain merge for '$key'!";
            }
        }
        elsif ( $otyp eq 'HASH' ) {
            merge_hashes( $oref, $ref );
        }
        else {
            # union of both lists with duplicates removed
            my %seen;
            @$oref = sort { $a <=> $b || $a cmp $b }
                     grep { !$seen{$_}++ } @$oref, @$ref;
        }
    }
    else {
        $ident_map{$key} = $ref;
    }
}

# Copy only the keys that do not appear at higher levels to the 
# result hash
my @top_keys = grep { !$levels{$_} } keys %ident_map;
@result{ @top_keys } = @ident_map{ @top_keys };

# then step through the hash to make sure that the entries at
# all levels are equal to the identity
descend_restore( \%result, \%ident_map );
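For the exercise left to the reader (getting the question's list of single-pair hashes back), one possible sketch converts a nested-hash result like `%result` into that shape. `to_pairs` is a hypothetical helper, not part of the answer; it assumes the leaves are array refs and sorts keys numerically, so the order can differ from the question's listing, which puts 991 last.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical helper: turn { k1 => v1, k2 => v2 } into [ { k1 => v1 }, { k2 => v2 } ],
# sorted numerically by key and recursing into nested hashes.
# Anything that is not a hash ref (e.g. the leaf array refs) is returned unchanged.
sub to_pairs {
    my ($node) = @_;
    return $node unless ref $node eq 'HASH';
    return [ map { +{ $_ => to_pairs( $node->{$_} ) } } sort { $a <=> $b } keys %$node ];
}

# A small merged tree in the nested-hash shape this answer produces.
my $tree = {
    '999' => {
        '992' => ['905'],
        '993' => { '909' => [], '904' => [] },
        '996' => [],
    },
};

# Keep the root itself as a hash, as in the question's desired output.
my %wanted = map { $_ => to_pairs( $tree->{$_} ) } keys %$tree;

$Data::Dumper::Indent = 1;
print Dumper( \%wanted );
```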

Wow, thanks so much! Sorry for the lack of code or clarification. I was trying to generate a tree, and did try Hash::Merge, but could not for the life of me resolve the coined-995 problem of replacing the empty 995 with the non-empty 995. This works beautifully and I really appreciate the help! (I also tried the others; they either did the same thing as Hash::Merge or actually got rid of some branches.) — Nick, May 05 '10 at 13:32

user157251 · Answer 2 · 2010-05-04T23:11:19.317

score 2

Use CPAN! Try Hash::Merge:

use Hash::Merge;

# OO interface.
# LEFT_PRECEDENT: on conflicts, the left hash (%a) takes precedence.
my $merge = Hash::Merge->new( 'LEFT_PRECEDENT' );
my %c = %{ $merge->merge( \%a, \%b ) };

See CPAN for more info; it does pretty much everything you would want, and it is fully customizable.

edited May 04 '10 at 23:11

answered May 04 '10 at 19:56

score 1 · Answer 3 · answered May 04 '10 at 18:30

Give this recursive solution a try:

use List::Util qw(all max uniq);    # all/uniq need List::Util 1.45+; List::MoreUtils also provides them

#   XXX: doesn't handle circular structures...
sub deepmerge {
    my (@structs) = @_;
    my $new;

    # filter out nonexistent structs
    @structs = grep {defined($_)} @structs;

    my $ref = ref($structs[0]);
    if (not all { ref($_) eq $ref } @structs) {
        warn("deepmerge: all structs are not $ref\n");
    }

    my @tomerge = grep {ref($_) eq $ref} @structs;
    return qr/$tomerge[0]/ if scalar(@tomerge) == 1 and $ref eq 'Regexp';
    return $tomerge[0] if scalar(@tomerge) == 1;

    if ($ref eq '') { 
        $new = pop(@tomerge); # prefer farthest right
    } 
    elsif ($ref eq 'Regexp') { 
        $new = qr/$tomerge[$#tomerge]/;
    } 
    elsif ($ref eq 'ARRAY') { 
        $new = [];
        for my $i (0 .. max(map {scalar(@$_) - 1} @tomerge)) { 
            $new->[$i] = deepmerge(map {$_->[$i]} @tomerge);
        }
    } 
    elsif ($ref eq 'HASH') { 
        $new = {};
        for my $key (uniq(map {keys %$_} @tomerge)) { 
            $new->{$key} = deepmerge(map {$_->{$key}} @tomerge);
        }
    }
    else {
        # ignore all other structures...
        $new = '';
    }

    return $new;
}

Modify it to your heart's content to achieve the desired result.

On closer inspection, I noticed you're merging them differently from the algorithm above, so maybe just use this as an example. Mine does this:

deepmerge({k => 'v'}, {k2 => 'v2'});
# returns {k => 'v', k2 => 'v2'}

Arrays are merged the same way, index by index.

Wow, I hope you work as hard for your employer as you do for random strangers on the internet. :) — Ether, May 04 '10 at 18:38
@Ether - maybe he's just Jon Skeet's sock puppet account. (And I'm just Jon Skeet's bot... before you ask.) — DVK, May 04 '10 at 18:44

Christoffer Hammarström · Answer 4 · 2010-05-04T17:41:41.773

I indented your desired output because it was hard to read, for the benefit of other people who want to answer. I'm still thinking of an answer.

$VAR1 = { '999:' => [
                      { '992' => [ '905' ] },
                      { '993' => [
                                   { '909' => [] },
                                   { '904' => [] },
                                   { '902' => [] }
                                 ]
                      },
                      { '994' => [
                                   { '910' => [] },
                                   { '985' => [] },
                                   { '983' => [] }
                                 ]
                      },
                      { '995' => [
                                   { '101' => [] },
                                   { '102' => [] },
                                   { '103' => [] },
                                   { '104' => [] },
                                   { '105' => [] },
                                   { '106' => [] },
                                   { '107' => [] }
                                 ]
                      },
                      { '996' => [] },
                      { '997' => [ '986', '987', '990', '984', '989', '988' ] },
                      { '998' => [ '908', '906', '0', '998', '907' ] },
                      { '991' => [ '913', '920', '918', '998', '916', '919', '917', '915', '912', '914' ] }
                    ]
        };

I don't see the point of all those single-entry hashes, though. Wouldn't the following be better?

$VAR1 = { '999:' => {
                      '992' => [ '905' ],
                      '993' => {
                                 '909' => [],
                                 '904' => [],
                                 '902' => []
                               },
                      '994' => {
                                 '910' => [],
                                 '985' => [],
                                 '983' => []
                               },
                      '995' => {
                                 '101' => [],
                                 '102' => [],
                                 '103' => [],
                                 '104' => [],
                                 '105' => [],
                                 '106' => [],
                                 '107' => []
                               },
                      '996' => [],
                      '997' => [ '986', '987', '990', '984', '989', '988' ],
                      '998' => [ '908', '906', '0', '998', '907' ],
                      '991' => [ '913', '920', '918', '998', '916', '919', '917', '915', '912', '914' ]
                    }
        };
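Going from the question's array-of-pairs shape to this plainer one is a short recursive walk. A minimal sketch, assuming every array either holds only single-key hashrefs (a level to flatten) or holds plain values (a leaf to keep); `from_pairs` is a hypothetical name:

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical helper: [ { a => 1 }, { b => 2 } ] becomes { a => 1, b => 2 }, recursively.
# Arrays of plain values (the leaves) and empty arrays are returned unchanged.
sub from_pairs {
    my ($node) = @_;
    if ( ref $node eq 'HASH' ) {
        return +{ map { $_ => from_pairs( $node->{$_} ) } keys %$node };
    }
    if ( ref $node eq 'ARRAY' && @$node && !grep { ref $_ ne 'HASH' } @$node ) {
        my %merged;
        for my $pair (@$node) {
            $merged{$_} = from_pairs( $pair->{$_} ) for keys %$pair;
        }
        return \%merged;
    }
    return $node;
}

# An excerpt of the question's desired output, in its array-of-pairs form.
my $wanted = {
    '999' => [
        { '992' => ['905'] },
        { '993' => [ { '909' => [] }, { '904' => [] } ] },
        { '996' => [] },
    ],
};

my $plain = from_pairs($wanted);

$Data::Dumper::Sortkeys = 1;
print Dumper($plain);
```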

DVK · Answer 5 · 2010-05-04T18:39:24.060

score 0

Assuming the above data is in a file dump.txt, you can eval it piece by piece.

Updated code:

use strict;
use warnings;
use File::Slurp;    # CPAN module; provides read_file
use Data::Dumper;

my $final_data = {};
# each match is the right-hand side of one "$VAR1 = ...;"; eval turns it back into a hashref
my @data = map { eval $_ } ( read_file("dump.txt") =~ /\$VAR1 = ([^;]+);/gs );
foreach my $element (@data) {
    my $key = ( keys %$element )[0];
    $final_data->{$key} ||= [];
    push @{ $final_data->{$key} }, $element->{$key};
}
print Data::Dumper->Dump( [$final_data] );
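File::Slurp is a CPAN module. If it isn't installed, the same parse-and-eval step can be done with core Perl. A minimal sketch, assuming the dump looks like the question's (one `$VAR1 = ...;` per record, no semicolons inside the data); two of the question's lines are inlined as a heredoc in place of dump.txt so the example is self-contained:

```perl
use strict;
use warnings;
use Data::Dumper;

# Two lines of the question's Dumper output, inlined instead of read from dump.txt.
my $text = <<'END';
$VAR1 = { '999' => { '998' => [ '908', '906', '0', '998', '907' ] } };
$VAR1 = { '995' => { '101' => [] } };
END

# To read the real file instead, with core Perl only:
#   open my $fh, '<', 'dump.txt' or die "dump.txt: $!";
#   my $text = do { local $/; <$fh> };

# Each match is the right-hand side of one "$VAR1 = ...;"; eval turns it back into a hashref.
my @data = map {
    my $ref = eval $_;
    die "Could not eval dump chunk: $@" if $@;
    $ref;
} $text =~ /\$VAR1 = ([^;]+);/g;

# Group the records by their single top-level key.
my %final_data;
for my $element (@data) {
    my ($key) = keys %$element;
    push @{ $final_data{$key} }, $element->{$key};
}
print Dumper( \%final_data );
```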

If you want a complete deep merge, you can pass each value of $final_data (each one is an array of hashes) through this (untested!) deep merger at the end:

# Merge an array of hashes as follows:
# IN:  [ { 1 => 11 }, { 1 => 12 },{ 2 => 22 } ]
# OUT: { 1 => [ 11, 12 ], 2 => [ 22 ] }
# This is recursive - if array [11,12] was an array of hashrefs, we merge those too
sub merge_hashes {
    my $hashes = $_[0];
    return $hashes unless ref $hashes eq ref [];    # Hat tip to brian d foy
    # only merge a non-empty array whose elements are all hashrefs
    return $hashes unless @$hashes && !grep { ref $_ ne 'HASH' } @$hashes;
    my $final_hashref = {};
    foreach my $element (@$hashes) {
        foreach my $key (keys %$element) {
            $final_hashref->{$key} ||= [];
            push @{ $final_hashref->{$key} }, $element->{$key};
        }
    }
    foreach my $key (keys %$final_hashref) {
        $final_hashref->{$key} = merge_hashes($final_hashref->{$key});
    }
    return $final_hashref;
}

edited May 04 '10 at 18:39

answered May 04 '10 at 18:05

NOTE: I am assuming the only merging happens at the topmost key. If not, it's a bit harder to do, though not too hard... left as an exercise for the reader for now :) – DVK May 04 '10 at 18:09
Well, it kind of looks as if he wants a tree out of the whole thing as well, with what would be "top-level" keys being merged with their matching second-level keys. – Axeman May 04 '10 at 18:18
@Axeman - OK... I don't think I could parse that comment at all... need more sleep :) Do you mean he wants to merge second-level keys as well? – DVK May 04 '10 at 18:22
Updated my original code to account for the fact that real life Data::Dumper output would not necessarily be nicely lined up on 1 line per call :) – DVK May 04 '10 at 18:24
@DVK: Sorry, I use shortcut verbs a lot nowadays. If you take the "wanted" result into a text editor and check for matching braces you'll see that the only top level key is '999' so it serves as the root of a tree-like structure. – Axeman May 04 '10 at 19:35

Greg Bacon · Answer 6 · 2010-05-04T18:54:59.887

Use push and autovivification.

Start with the usual front matter:

#! /usr/bin/perl

use warnings;
use strict;

Read your sample input from the DATA filehandle and create a data structure similar to the one you dumped:

my @hashes;
while (<DATA>) {
  my $VAR1;
  $VAR1 = eval $_;
  die $@ if $@;
  push @hashes => $VAR1;
}

Your input has two cases:

1. A reference to an array that contains data to be merged with its cousins that have the same "key path."
2. Otherwise, it's a reference to a hash that contains a reference to an array from case 1 at some depth, so we strip off the outermost layer and keep digging.

Note the use of $_[0]. The semantics of Perl subroutines are such that the values in @_ are aliases rather than copies. This lets us call merge directly without having to first create a bunch of scaffolding to hold the merged contents. The code will break if you copy the value instead.
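A small demonstration of the difference, using two hypothetical subs. perlsub notes that a hash element that did not exist when the sub was called is created only if it is modified or a reference to it is taken, so pushing through `$_[0]` creates the caller's element, while pushing through a copied lexical only autovivifies the copy:

```perl
use strict;
use warnings;

# Hypothetical: push through the alias in $_[0]; the caller's hash element is autovivified.
sub push_via_alias {
    my ( undef, @values ) = @_;
    push @{ $_[0] }, @values;
}

# Hypothetical: copy $_[0] first; only the local copy is autovivified.
sub push_via_copy {
    my ( $target, @values ) = @_;
    push @{$target}, @values;
}

my %merged;
push_via_alias( $merged{a}, 1, 2 );
push_via_copy( $merged{b}, 3 );

print exists $merged{a} ? "a => [@{ $merged{a} }]\n" : "a missing\n";
print exists $merged{b} ? "b => [@{ $merged{b} }]\n" : "b missing\n";
```

This prints `a => [1 2]` and `b missing`, which is why merge below must push through `$_[0]` rather than a copy.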

sub merge {
  my $data = shift;

  if (ref($data) eq "ARRAY") {
    push @{ $_[0] } => @$data;
  }
  else {
    foreach my $k (keys %$data) {
      merge($data->{$k} => $_[0]{$k});
    }
  }
}

Now we walk @hashes and incrementally merge their contents into %merged.

my %merged;    
foreach my $h (@hashes) {
  foreach my $k (keys %$h) {
    merge $h->{$k} => $merged{$k};
  }
}

We don't know in what order the values arrived, so run a final cleanup pass to sort the arrays:

sub sort_arrays {
  my($root) = @_;
  if (ref($root) eq "ARRAY") {
    @$root = sort { $a <=> $b } @$root;
  }
  else {
    sort_arrays($root->{$_}) for keys %$root;
  }
}

sort_arrays \%merged;

The Data::Dumper module is great for quick debugging!

use Data::Dumper;
$Data::Dumper::Indent = 1;
print Dumper \%merged;
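Perl's hash order is unpredictable, so the keys in the dump come out in no particular order (the sample output at the end of this answer lists 993 before 992, for example). Setting `$Data::Dumper::Sortkeys` makes the dump deterministic. A small sketch on a hand-made fragment of the merged structure, not the real output:

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Indent   = 1;    # same compact indentation as in the answer
$Data::Dumper::Sortkeys = 1;    # emit hash keys in sorted (string) order

# A hand-made fragment of the merged structure, for illustration only.
my %merged = (
    '999' => { '997' => ['984'], '992' => ['905'], '993' => [] },
    '994' => { '985' => [], '910' => [], '983' => [] },
);

my $dump = Dumper( \%merged );
print $dump;
```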

Place a copy of the input from your question into the special DATA filehandle:

__DATA__
$VAR1 = { '999' => { '998' => [ '908', '906', '0', '998', '907' ] } };
$VAR1 = { '999' => { '991' => [ '913', '920', '918', '998', '916', '919', '917', '915', '912', '914' ] } };
$VAR1 = { '999' => { '996' => [] } };
$VAR1 = { '999' => { '995' => [] } };
$VAR1 = { '999' => { '994' => [] } };
$VAR1 = { '999' => { '993' => [] } };
$VAR1 = { '999' => { '997' => [ '986', '987', '990', '984', '989', '988' ] } };
$VAR1 = { '995' => { '101' => [] } };
$VAR1 = { '995' => { '102' => [] } };
$VAR1 = { '995' => { '103' => [] } };
$VAR1 = { '995' => { '104' => [] } };
$VAR1 = { '995' => { '105' => [] } };
$VAR1 = { '995' => { '106' => [] } };
$VAR1 = { '995' => { '107' => [] } };
$VAR1 = { '994' => { '910' => [] } };
$VAR1 = { '993' => { '909' => [] } };
$VAR1 = { '993' => { '904' => [] } };
$VAR1 = { '994' => { '985' => [] } };
$VAR1 = { '994' => { '983' => [] } };
$VAR1 = { '993' => { '902' => [] } };
$VAR1 = { '999' => { '992' => [ '905' ] } };

A sample of the output is below:

  '994' => {
    '910' => [],
    '985' => [],
    '983' => []
  },
  '999' => {
    '993' => [],
    '992' => [
      '905'
    ],
    '997' => [
      '984',
      '986',
      '987',
      '988',
      '989',
      '990'
    ],

score 0 · Answer 7 · answered May 05 '10 at 13:57

Wow, thanks so much everyone (especially Axeman)! Sorry for the lack of code or clarification. I was trying to generate a tree, and did try Hash::Merge, but could not for the life of me resolve the coined-995 problem of replacing the empty 995 with the non-empty 995. Axeman's solution works beautifully and I really appreciate the help/collaboration! (I also tried the others; they either did the same thing as Hash::Merge or actually got rid of some branches.)

Some background on the input: I had a set of hashes, each with keys at the same level. Two of those keys defined a) a parent of another node and b) the node itself (the rest were children). Since that forms a tree, I figured a hash was perfect, so I came up with a set of new hashes of the form {a}->{b}->[c], and here we are...

Again, thanks everyone, and Axeman!
