1

I'm trying to join similar groups of lines into a single line. My file is a basic log-type file, but each entry spans three lines followed by a blank line. Example:

Timestamp
key1 | val1 | key2 | val2
key3 | val3 | key4 | val4

Timestamp
key1 | val1 | key2 | val2
key3 | val3 | key4 | val4

What I would like is for each block of 3 lines to be on a single, comma-separated line:

Timestamp,key1,val1,key2,val2,key3,val3,key4,val4

I could do this with sed and awk if I only had to deal with the key/value lines, but my problem is getting the timestamp onto each line.

I've looked at xargs and paste, but neither seemed to do what I needed.

codeforester
Ken S.

6 Answers

4
$ awk -v RS= -F'\n| \\| ' -v OFS=',' '{$1=$1}1' file
Timestamp,key1,val1,key2,val2,key3,val3,key4,val4
Timestamp,key1,val1,key2,val2,key3,val3,key4,val4
Ed Morton
3

An alternative solution with sed and paste:

$ sed 's/ *| */,/g;/^$/d' file | paste -d, - - -

Timestamp,key1,val1,key2,val2,key3,val3,key4,val4
Timestamp,key1,val1,key2,val2,key3,val3,key4,val4

It reads like this: replace each delimiter with a comma, delete empty lines, then paste three lines at a time with a comma in between.

karakfa
  • This is the one I ended up using. I had to add a ";s/ //g" to it because there were some fields that had leading whitespace. I didn't know you could string sed commands together with a semicolon so thanks for that! – Ken S. May 23 '17 at 20:04
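A minimal sketch of the whitespace handling mentioned in the comment above, assuming the stray spaces sit at the start or end of a line. It trims only those edges instead of using s/ //g, because deleting every space would also remove any space inside a timestamp. The function name join_records and the sample timestamp are made up for illustration.

```shell
# Hypothetical helper: reads the log on stdin, writes one CSV line per entry.
join_records() {
    # 1. trim leading and trailing spaces on every line
    # 2. turn " | " separators (with optional spaces) into commas
    # 3. drop the blank separator lines
    # paste then joins every three remaining lines with commas
    sed 's/^ *//;s/ *$//;s/ *| */,/g;/^$/d' | paste -d, - - -
}

# Demo with an invented timestamp that contains a space
printf '%s\n' '2017-05-23 20:04:00' '  key1 | val1 | key2 | val2' 'key3 | val3 | key4 | val4' '' |
    join_records
# prints: 2017-05-23 20:04:00,key1,val1,key2,val2,key3,val3,key4,val4
```

The trimming runs before the blank-line test so that a line containing only spaces is also dropped and cannot shift the three-line grouping that paste relies on.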
2

This might work for you (GNU sed):

sed -n 'N;N;s/ *[|\n] */,/pg;n' file

Read three lines into the pattern space, replace each pipe or newline character (along with any surrounding spaces) with a comma, print the result if a substitution was made, and throw away the blank line that follows.
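The same steps can be spelled out one command per line. This is only a sketch of the idea, not the original one-liner: it splits the substitution in two so that it does not depend on \n being understood inside a bracket expression, and the function name join_three is invented for the example. It assumes every entry is exactly three lines followed by one blank line.

```shell
# Hypothetical helper: reads the log on stdin, writes one CSV line per entry.
join_three() {
    sed -n '
        # Append the next two input lines to the pattern space
        N
        N
        # Turn the pipe separators and their surrounding spaces into commas
        s/ *| */,/g
        # Turn the two embedded newlines into commas and print the result
        s/ *\n */,/gp
        # Load the following blank line; with -n it is never printed
        n
    '
}

# Demo on one entry
printf '%s\n' 'Timestamp' 'key1 | val1 | key2 | val2' 'key3 | val3 | key4 | val4' '' |
    join_three
# prints: Timestamp,key1,val1,key2,val2,key3,val3,key4,val4
```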

potong
1

This awk script uses the built-in RS variable to simplify moving between records. We detect whether we are on a timestamp line and, if so, store it in the ts variable. Otherwise, $1 through $NF are the key and value fields, so we iterate through them and append each one to an output string. The last field is appended outside the loop to avoid a trailing comma. Then we print the row and move on.

BEGIN{
    RS="\n\n";  # Everything between blank lines will be treated as one record
    FS="|";     # Our fields are separated with pipes.
}
{ 
    if( NF == 1 ){   # The number of fields on this line is 1... only our timestamp lines look like this.
        ts=$1;      
        next;       # Go to next record.
    };  



    # Build up an output buffer while avoiding dangling ","   
    out="";          

    for( i=1; i < NF; i++ ){
        out=out$i","
    } 

    out=out$NF; 

    print ts","out 
}
A Brothers
  • You should mention your script is gawk-only due to the multi-char RS (`RS="\n\n"`). Note, though, that using it as you are is just a special case of `RS=""`, which will work in any awk. – Ed Morton May 23 '17 at 16:51
  • Didn't know about the RS="". Also realized that this actually isn't the asked for output since it gives out a key1 line and key3 line per timestamp. – A Brothers May 23 '17 at 16:53
  • See https://www.gnu.org/software/gawk/manual/gawk.html#Multiple-Line - `an empty string as the value of RS indicates that records are separated by one or more blank lines...` and http://pubs.opengroup.org/onlinepubs/009695399/utilities/awk.html - `If RS is null, then records are separated by sequences consisting of a <newline> plus one or more blank lines...` – Ed Morton May 23 '17 at 16:55
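Following the comments above, here is a hedged sketch of how the loop-based approach could look with RS="" so that it is not tied to gawk. It is a reworking rather than the script as posted: it treats each line of a blank-line-separated block as one field and rewrites the pipes inside it. The function name join_paragraphs is invented for the example.

```shell
# Hypothetical helper: reads the log on stdin, writes one CSV line per entry.
join_paragraphs() {
    awk '
    BEGIN {
        RS = ""      # paragraph mode: records are separated by blank lines
        FS = "\n"    # each line of a record becomes one field
    }
    {
        out = ""
        for (i = 1; i <= NF; i++) {
            line = $i
            gsub(/ *\| */, ",", line)    # pipes plus surrounding spaces become commas
            out = (i == 1) ? line : out "," line
        }
        print out
    }'
}

# Demo on one entry
printf '%s\n' 'Timestamp' 'key1 | val1 | key2 | val2' 'key3 | val3 | key4 | val4' '' |
    join_paragraphs
# prints: Timestamp,key1,val1,key2,val2,key3,val3,key4,val4
```

Because the timestamp is simply the first field of each record, no separate ts variable is needed, and an entry with more or fewer key/value lines is still joined onto one line.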
0

try:

awk '/^Timestamp/ && VAL{print VAL;VAL=$0;next} NF{gsub(/ +\| +/,",");VAL=VAL?VAL OFS $0:$0} END{print VAL}' OFS=","   Input_file

When a line starts with Timestamp and VAL already holds a value, print VAL (the previous record), reset VAL to the current line, and use next to skip the remaining rules. For every other non-empty line (the NF condition skips the blank separator lines, which would otherwise leave a trailing comma), replace each space-delimited | with a comma and append the line to VAL, separated by OFS. The END block prints VAL once more, because the last record is still held there when the input ends.

RavinderSingh13
0

Some stupid sed-only tricks:

sed -n -e '/Timestamp/{h;n};s/ | /,/g;H;/^$/{g;s/\n/,/g;s/,$//;p}' file

  • uses sed -n to print only when the p command is used
  • /Timestamp/{h;n}; replace the hold space with the Timestamp line, and move on to the next line of input
  • s/ | /,/g;H; replace bars with commas and append to the hold space
  • /^$/{g;s/\n/,/g;s/,$//;p} on blank lines get the contents of the hold space into the pattern space, s/\n/,/g replace newlines with commas, and finally s/,$//;p remove the trailing comma and print the pattern space

Input file:

Timestampa
key1 | val1 | key2 | val2
key3 | val3 | key4 | val4

Timestampb
key1 | val1 | key2 | val2
key3 | val3 | key4 | val4

Output:

Timestampa,key1,val1,key2,val2,key3,val3,key4,val4
Timestampb,key1,val1,key2,val2,key3,val3,key4,val4

s/\n/,/g may be system / sed version dependent.

stevesliva