How to extract fields which can also contain their delimiters in awk

Question

I'm trying to extract field values from a text file which is formatted as follows:

{fieldvalue1} {fieldvalue2} {fieldvalue3}

However, the field values can contain subfields that are themselves delimited by curly braces, for example:

{abc} {xyz} {efg {123} {pqx}}

So in the above case the desired output is:

* fieldvalue1 = abc
* fieldvalue2 = xyz
* fieldvalue3 = efg {123} {pqx}

I tried the following filter:

sed 's/^{//g;s/}$//g' | awk -F"} {"

However, this obviously fails to parse fieldvalue3 correctly.
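
To see concretely where the attempt breaks, here is a minimal Python sketch that mimics the same strip-then-split logic. Python is used only for illustration, and `naive_split` is a made-up name, not part of the original pipeline. The nested `} {` inside the third field is treated as a separator, so that field is cut in two:

```python
import re


def naive_split(line):
    # Roughly what the sed step does: drop one leading "{" and one trailing "}".
    stripped = re.sub(r"^\{", "", line)
    stripped = re.sub(r"\}$", "", stripped)
    # Roughly what awk -F"} {" does: split on every literal "} {".
    return stripped.split("} {")


print(naive_split("{abc} {xyz} {efg {123} {pqx}}"))
# prints ['abc', 'xyz', 'efg {123', 'pqx}']
```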

Why are people down-voting this very clear question? I realize there is ambiguity in it, but we don't need to just be dismissive of the effort to be clear, do we? And no, it isn't JSON. — Steve Harris, Mar 27 '17 at 23:03
It's not JSON. It's the output from a proprietary shell that needs to be parsed. The field names themselves can contain curly braces and they are delimited by curly braces. — user3855422, Mar 27 '17 at 23:03
Duplicate: http://stackoverflow.com/questions/546433/regular-expression-to-match-outer-brackets — Steve Harris, Mar 27 '17 at 23:08
You need a parser that understands depth (nested structure) and supports recursion; implementing this in `awk` will be painful. — karakfa, Mar 27 '17 at 23:42
If there is at most one level of nesting, as shown in the sample, try `sed -E 's/ *\{(([^{}]*\{[^}]+\})+)\} *| *\{([^{}]+)\} */&\n/g'` or `perl -pe 's/ *\{(([^{}]|(?R))+)\} */$1\n/g'` — Sundeep, Mar 28 '17 at 05:20

Ed Morton · Answer 1 · 2017-03-28T04:50:38.670

You could just brute-force it by walking each line character by character and tracking the brace nesting depth:

$ cat tst.awk
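{
    numFlds = 0
    cnt = 0        # brace nesting depth, reset for every input line
    delete flds
    for (i=1; i<=length($0); i++) {
        char = substr($0,i,1)
        if ( (char == "{") && (++cnt == 1) ) {
            # outermost opening brace: start a new field
            numFlds++
        }
        else if ( (char == "}") && (--cnt == 0) ) {
            # outermost closing brace: skip it
        }
        else if ( cnt != 0 ) {
            # inside a field: keep the character, nested braces included
            flds[numFlds] = flds[numFlds] char
        }
    }
    for (fldNr=1; fldNr<=numFlds; fldNr++) {
        print fldNr, flds[fldNr]
    }
}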

$ awk -f tst.awk file
1 abc
2 xyz
3 efg {123} {pqx}
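
For readers who want to try the same depth-counting idea outside awk, here is a rough Python sketch. It is an illustration only, not part of the answer's script: the function name `split_top_level` is hypothetical, and the sketch assumes the braces on each line are balanced.

```python
def split_top_level(line):
    """Return the contents of each top-level {...} group in line."""
    fields = []
    current = []
    depth = 0  # current brace nesting depth
    for ch in line:
        if ch == "{":
            depth += 1
            if depth == 1:
                current = []  # outermost opening brace: start a new field
                continue
        elif ch == "}":
            depth -= 1
            if depth == 0:
                fields.append("".join(current))  # outermost closing brace
                continue
        if depth > 0:
            current.append(ch)  # inside a field: keep nested braces verbatim
    return fields


for nr, value in enumerate(split_top_level("{abc} {xyz} {efg {123} {pqx}}"), 1):
    print(nr, value)
```

Characters seen at depth 0 (the spaces between fields) are simply dropped, which matches what the awk version does.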

score 0 · Answer 2 · answered Mar 28 '17 at 07:50

The input looks like a Tcl list of lists :) Tcl handles this pretty well.

Here is an example that reads the file in.txt line by line and prints each field in the desired format.

#!/bin/sh
# the next line restarts using tclsh \
    exec tclsh "$0" "$@"

# open file in.txt
set fd [open in.txt]

# loop till end of file
while {![eof $fd]} {
    # read line
    set line [gets $fd]

    set i 0
    # iterate over all elements
    foreach elm $line {
        incr i
        puts "* fieldvalue$i = $elm"
    }
}
close $fd

Or, as a one-liner that handles a single line of data. It uses expect because expect lets you pass a Tcl command on the command line with -c:

 echo '{abc} {xyz} {efg {123} {pqx}}' | expect -c 'puts [join [lmap _ [gets stdin] {incr i; set _ "* fieldvalue$i = $_"}] \n]'

score 0 · Answer 3 · answered Mar 28 '17 at 09:25

Another quick awk:

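#!/usr/bin/awk -f

{
    # Walk the whitespace-separated pieces, gluing them back together
    # until the counts of opening and closing braces match.
    for(i=1;i<=NF;i++)
    {
        # prepend any pending unbalanced text, separated by FS
        $i = e (e?FS:"") $i

        # split() returns the number of pieces, i.e. delimiter count + 1
        l = split($i,a,"{")
        r = split($i,a,"}")

        if(l == r)
        {
            # counts match: print the field (outer braces included)
            print "* fieldvalue" ++c,$i
            e=""
        }
        else
            e = $i

    }
}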
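
The same token-gluing idea can be sketched in Python, which also makes it easy to strip the outer braces and print the " = " that the question asks for (the awk script above prints each field with its outer braces still attached). This is only an illustration: `glue_tokens` is a hypothetical name, and the sketch assumes every top-level field starts with "{" and ends with "}". As in the awk version, runs of whitespace inside a field collapse to a single space.

```python
def glue_tokens(line):
    """Join whitespace-separated tokens until the brace counts match."""
    fields = []
    pending = ""
    for token in line.split():
        # Re-attach the token to any unfinished field, separated by one space.
        pending = f"{pending} {token}" if pending else token
        if pending.count("{") == pending.count("}"):
            # Counts match: treat this as a complete field and drop the
            # outer braces (assumes the field is wrapped in one pair).
            fields.append(pending[1:-1])
            pending = ""
    return fields


for nr, value in enumerate(glue_tokens("{abc} {xyz} {efg {123} {pqx}}"), 1):
    print(f"* fieldvalue{nr} = {value}")
```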
