
I'm trying to extract field values from a text file which is formatted as follows:

{fieldvalue1} {fieldvalue2} {fieldvalue3}

However, the field values themselves can contain subfields that are also delimited with curly braces, for example:

{abc} {xyz} {efg {123} {pqx}}

So in the above case the desired output is:

* fieldvalue1 = abc
* fieldvalue2 = xyz
* fieldvalue3 = efg {123} {pqx}

I tried the following filter:

sed 's/^{//g;s/}$//g' | awk -F"} {"

However, this fails to parse fieldvalue3 correctly, because the "} {" inside the nested subfields is also treated as a field separator.
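To make the failure concrete, here is a small Python sketch that mimics the sed/awk pipeline above. The function name `naive_split` is made up for illustration, and it assumes the `-F"} {"` separator behaves like a literal string.

```python
def naive_split(line):
    # Mimic the sed step: drop one leading "{" and one trailing "}".
    if line.startswith("{"):
        line = line[1:]
    if line.endswith("}"):
        line = line[:-1]
    # Mimic awk -F"} {": split on the separator, ignoring nesting depth.
    return line.split("} {")


print(naive_split("{abc} {xyz} {efg {123} {pqx}}"))
# ['abc', 'xyz', 'efg {123', 'pqx}']
```

The third field comes back in two pieces because the split has no notion of how deeply nested the current brace is.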

Sundeep
user3855422
  • Is this JSON? The answer is: don't – dawg Mar 27 '17 at 23:01
  • Why are people down-voting this very clear question? I realize there is ambiguity in it, but we don't need to just be dismissive of the effort to be clear, do we? And no, it isn't JSON. – Steve Harris Mar 27 '17 at 23:03
  • It's not JSON. It's the output from a proprietary shell that needs to be parsed. The field names themselves can contain curly braces and they are delimited by curly braces. – user3855422 Mar 27 '17 at 23:03
  • Duplicate: http://stackoverflow.com/questions/546433/regular-expression-to-match-outer-brackets – Steve Harris Mar 27 '17 at 23:08
  • you need a parser that understands the depth (nested structure) and support for recursion; implementing this in `awk` will be painful. – karakfa Mar 27 '17 at 23:42
  • If the nesting is at most one level deep, as in the sample, try `sed -E 's/ *\{(([^{}]*\{[^}]+\})+)\} *| *\{([^{}]+)\} */&\n/g'` or `perl -pe 's/ *\{(([^{}]|(?R))+)\} */$1\n/g'` – Sundeep Mar 28 '17 at 05:20

3 Answers


You could just brute-force it by counting characters:

$ cat tst.awk
{
    numFlds = 0
    delete flds
    for (i=1; i<=length($0); i++) {
        char = substr($0,i,1)
        if ( (char == "{") && (++cnt == 1) ) {
            numFlds++
        }
        else if ( (char == "}") && (--cnt == 0) ) {
            # skip it
        }
        else if ( cnt != 0 ) {
            flds[numFlds] = flds[numFlds] char
        }
    }
    for (fldNr=1; fldNr<=numFlds; fldNr++) {
        print fldNr, flds[fldNr]
    }
}

$ awk -f tst.awk file
1 abc
2 xyz
3 efg {123} {pqx}
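
For comparison, the same depth-counting idea can be sketched in Python. This is only an illustration, not part of the awk script above: the function name `split_top_level` is hypothetical, and unlike the awk version it raises an error on unbalanced braces, which is an added assumption about how bad input should be handled.

```python
def split_top_level(line):
    """Return the text inside each top-level {...} group of line."""
    fields = []
    current = []
    depth = 0
    for ch in line:
        if ch == "{":
            depth += 1
            if depth == 1:
                current = []  # opening brace of a new top-level field
                continue
        elif ch == "}":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced '}'")
            if depth == 0:
                fields.append("".join(current))  # closing brace of the field
                continue
        if depth > 0:
            current.append(ch)  # keep nested braces and text verbatim
    if depth != 0:
        raise ValueError("unbalanced '{'")
    return fields


for n, value in enumerate(split_top_level("{abc} {xyz} {efg {123} {pqx}}"), start=1):
    print(f"* fieldvalue{n} = {value}")
```

Characters at depth zero (the spaces between fields) are dropped, and the outermost braces of each field are skipped, so nested subfields stay intact.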
Ed Morton

The input looks like a Tcl list of lists :) Tcl handles this pretty well.

Here is an example that reads the file in.txt line by line and prints each field in the desired format.

#!/bin/sh
# the next line restarts using tclsh \
    exec tclsh "$0" "$@"

# open file in.txt
set fd [open in.txt]

# loop till end of file
while {![eof $fd]} {
    # read line
    set line [gets $fd]

    set i 0
    # iterate over all elements
    foreach elm $line {
        incr i
        puts "* fieldvalue$i = $elm"
    }
}
close $fd

Or a one-liner that handles a single line of data. It uses expect because expect lets you pass a Tcl command on the command line:

 echo '{abc} {xyz} {efg {123} {pqx}}' | expect -c 'puts [join [lmap _ [gets stdin] {incr i; set _ "* fieldvalue$i = $_"}] \n]'
komar

Another quick awk:

#!/usr/bin/awk -f

{
    for(i=1;i<=NF;i++)
    {
        $i = e (e?FS:"") $i

        l = split($i,a,"{")
        r = split($i,a,"}")

        if(l == r)
        {
            print "* fieldvalue" ++c,$i
            e=""
        }
        else
            e = $i

    }
}
grail