How can I extract a string between matching braces in Perl?

Question

My input file looks like this:

HEADER 
{ABC|*|DEF {GHI 0 1 0} {{Points {}}}}

{ABC|*|DEF {GHI 0 2 0} {{Points {}}}}

{ABC|*|XYZ:abc:def {GHI 0 22 0} {{Points {{F1 1.1} {F2 1.2} {F3 1.3} {F4 1.4}}}}}

{ABC|*|XYZ:ghi:jkl {JKL 0 372 0} {{Points {}}}}

{ABC|*|XYZ:mno:pqr {GHI 0 34 0} {{Points {}}}}

{
    ABC|*|XYZ:abc:pqr {GHI 0 68 0}
        {{Points {{F1 11.11} {F2 12.10} {F3 14.11} {F4 16.23}}}}
        }
TRAILER

I want to extract the blocks from the file into an array, like this:

$array[0] = "{ABC|*|DEF {GHI 0 1 0} {{Points {}}}}"

$array[1] = "{ABC|*|DEF {GHI 0 2 0} {{Points {}}}}"

$array[2] = "{ABC|*|XYZ:abc:def {GHI 0 22 0} {{Points {{F1 1.1} {F2 1.2} {F3 1.3} {F4 1.4}}}}}"

..
..

$array[5] = "{
    ABC|*|XYZ:abc:pqr {GHI 0 68 0}
        {{Points {{F1 11.11} {F2 12.10} {F3 14.11} {F4 16.23}}}}
        }"

In other words, I need to match each top-level opening curly brace with its closing curly brace and extract the string between them, braces included.

I have checked the question linked below, but it doesn't apply to my case: Regex to get string between curly braces "{I want what's between the curly braces}"

I have been trying, but it would really help if someone could lend their expertise.

Thanks, Sri

score 15 · Answer 1 · answered Apr 23 '10 at 17:35

Use Text::Balanced

answered Apr 23 '10 at 17:35

ysth


Thanks ysth, this is the best solution !! – Srilesh Apr 23 '10 at 18:30
@Srilesh: if you like this answer best, please click the outlined checkmark to the left of the answer. – Ether Apr 23 '10 at 18:34

score 15 · Accepted Answer · Eric Strom · 2010-04-23T20:03:12.343

This can certainly be done with a regex, at least in modern versions of Perl:

my @array = $str =~ /( \{ (?: [^{}]* | (?0) )* \} )/xg;

print join "\n" => @array;

The regex matches a brace-delimited block whose contents are either runs of non-brace characters or a recursion into the whole pattern via (?0), which is what handles nested braces.
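For readers who find the one-liner dense, here is a commented sketch of the same idea. It is an illustration rather than the answer's exact code: the sample data is abridged from the question, the variable names $block and @blocks are made up here, it recurses into capture group 1 with (?1) instead of the whole pattern, and it uses a possessive [^{}]++ to cut down on backtracking. Like the original, it needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# Sample data abridged from the question (for illustration only).
my $str = <<'END';
HEADER
{ABC|*|DEF {GHI 0 1 0} {{Points {}}}}
{
    ABC|*|XYZ:abc:pqr {GHI 0 68 0}
        {{Points {{F1 11.11} {F2 12.10}}}}
        }
TRAILER
END

# One balanced block, captured as group 1.
my $block = qr/
    (
        \{
        (?:
            [^{}]++     # a run of non-brace characters (possessive)
          |
            (?1)        # or a nested block: recurse into group 1
        )*
        \}
    )
/x;

# In list context with the g flag, the match returns every group 1 capture.
my @blocks = $str =~ /$block/g;

printf "%d blocks found\n", scalar @blocks;
print "[$_]\n" for @blocks;
```

Text outside the braces (HEADER, TRAILER) is skipped simply because the pattern cannot match there. An unbalanced top-level brace is not reported as an error; the regex just fails to match at that position and moves on.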

Edit: the above code works in Perl 5.10+. For earlier versions, the recursion is a bit more verbose:

my $re; $re = qr/ \{ (?: [^{}]* | (??{$re}) )* \} /x;

my @array = $str =~ /$re/xg;

edited Apr 23 '10 at 20:03

answered Apr 23 '10 at 17:44

Eric Strom


Tried this, but I get the error Sequence (?0...) not recognized in regex; marked by – Srilesh Apr 23 '10 at 18:17
@Srilesh => the code I posted requires Perl 5.10+; I have edited my answer to include a version that works in older perls. – Eric Strom Apr 23 '10 at 20:04

The solutions provided by @ysth, @Zaid and @leonbloy work fine for me, but @eric's solution has very good performance. I am applying the recursion to a 10MB file and it is really fast compared to the others. Choosing your answer as the best solution here. Thank you very much. – Srilesh Apr 23 '10 at 20:26

score 4 · Answer 3 · edited Jun 20 '20 at 09:12

I second ysth's suggestion to use the Text::Balanced module. A few lines will get you on your way.

use strict;
use warnings;
use Text::Balanced qw/extract_multiple extract_bracketed/;

my $file;
open my $fileHandle, '<', 'file.txt' or die "Cannot open file.txt: $!";

{ 
  local $/ = undef; # or use File::Slurp
  $file = <$fileHandle>;
}

close $fileHandle;

my @array = extract_multiple(
                               $file,
                               [ sub{extract_bracketed($_[0], '{}')},],
                               undef,
                               1
                            );

print $_,"\n" foreach @array;

OUTPUT

{ABC|*|DEF {GHI 0 1 0} {{Points {}}}}
{ABC|*|DEF {GHI 0 2 0} {{Points {}}}}
{ABC|*|XYZ:abc:def {GHI 0 22 0} {{Points {{F1 1.1} {F2 1.2} {F3 1.3} {F4 1.4}}}}}
{ABC|*|XYZ:ghi:jkl {JKL 0 372 0} {{Points {}}}}
{ABC|*|XYZ:mno:pqr {GHI 0 34 0} {{Points {}}}}
{
    ABC|*|XYZ:abc:pqr {GHI 0 68 0}
        {{Points {{F1 11.11} {F2 12.10} {F3 14.11} {F4 16.23}}}}
        }

Based on ysth's suggestion I used Text::Balanced, but I was getting only the first match. Thanks for the help; I needed the extract_multiple sub as well. Thank you. – Srilesh, Apr 23 '10 at 18:29

score 2 · Answer 4 · edited May 22 '17 at 12:41

I don't think pure regular expressions are what you want to use here (IMHO this might not even be parsable using regex).

Instead, build a small parser, similar to what's shown here: http://www.perlmonks.org/?node_id=308039 (see the answer by shotgunefx (Parson) on Nov 18, 2003 at 18:29 UTC)

UPDATE: It seems this may be doable with a regex after all. I saw a reference to matching nested parentheses in Mastering Regular Expressions (it is searchable on Google Books if you don't have the book; see Chapter 5, section "Matching balanced sets of parentheses").

score 2 · Answer 5 · edited Apr 23 '10 at 21:07

You can always count braces:

my $depth = 0;
my $out = "";
my @list=();
foreach my $fr (split(/([{}])/,$data)) {
    $out .= $fr;
    if($fr eq '{') {
        $depth ++;
    }
    elsif($fr eq '}') {
        $depth --;
        if($depth ==0) {
            $out =~ s/^.*?(\{.*\}).*$/$1/s; # trim text before the first '{'; braces escaped for newer perls
            push @list, $out;
            $out = "";
        }
    }
}
print join("\n==================\n",@list);

This is old, plain Perl style (and ugly, probably).

score 0 · Answer 6 · answered Apr 23 '10 at 17:29

You're much better off using a state machine than a regex for this type of parsing.
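A minimal sketch of what such a state machine could look like is below. The answer itself gives no code, so everything here is an assumption: the helper name top_level_blocks is hypothetical, and the only state kept is the current nesting depth plus the offset where the current top-level block started.

```perl
use strict;
use warnings;

# Hypothetical helper: returns every top-level {...} block found in $text.
sub top_level_blocks {
    my ($text) = @_;
    my ($depth, $start, @blocks) = (0);

    # Visit each brace in order; everything else is skipped by the regex.
    while ($text =~ /([{}])/g) {
        my $pos = pos($text) - 1;           # offset of the brace just matched
        if ($1 eq '{') {
            $start = $pos if $depth == 0;   # entering a top-level block
            $depth++;
        }
        elsif ($depth > 0) {                # a stray '}' at depth 0 is ignored
            $depth--;
            push @blocks, substr($text, $start, $pos - $start + 1)
                if $depth == 0;             # the top-level block just closed
        }
    }
    return @blocks;
}

my @blocks = top_level_blocks("HEADER {a {b} {}} junk } {c}\nTRAILER");
print "$_\n" for @blocks;
```

In this sketch a block that is still open at the end of the input is silently dropped; a real parser would probably want to report that as an error instead.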

answered Apr 23 '10 at 17:29

Bob


score 0 · Answer 7 · edited Apr 30 '14 at 17:24

Regular expressions are actually pretty bad at matching braces. Depending on how deep you want to go, you could write a full grammar (which is a lot easier than it sounds!) for Parse::RecDescent. Or, if you just want to get the blocks, scan through the text for opening '{' and closing '}' marks and keep count of how many are open at any given time.

edited Apr 30 '14 at 17:24

szabgab


answered Apr 23 '10 at 17:34

zigdon


Thanks zig, your response is very helpful. – Srilesh Apr 23 '10 at 18:32
