How to merge every n lines in a file

Question

I'm trying to join similar groups of lines into a single line. My file is a basic log-type file, but each entry spans three lines followed by a blank line. Example:

Timestamp
key1 | val1 | key2 | val2
key3 | val3 | key4 | val4

Timestamp
key1 | val1 | key2 | val2
key3 | val3 | key4 | val4

What I would like is for each block of 3 lines to be on a single, comma-separated line:

Timestamp,key1,val1,key2,val2,key3,val3,key4,val4

I could do this with sed and awk if I only had to deal with the key/value lines, but my problem is getting the timestamp onto each output line.

I've looked at xargs and paste, but neither seemed to do what I needed.

Possible duplicate of [Remove line break every nth line using sed](https://stackoverflow.com/questions/32085045/remove-line-break-every-nth-line-using-sed) — codeforester, May 23 '17 at 16:22
Should `Timestamp` be the same across the entries? Show the actual input — RomanPerekhrest, May 23 '17 at 16:22

score 4 · Answer 1 · answered May 23 '17 at 16:46

$ awk -v RS= -F'\n| \\| ' -v OFS=',' '{$1=$1}1' file
Timestamp,key1,val1,key2,val2,key3,val3,key4,val4
Timestamp,key1,val1,key2,val2,key3,val3,key4,val4
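For readers unpicking the one-liner, here is a rough annotated run. The sample data and the `sample` and `out` variable names are invented for illustration; the awk command itself is the one from the answer.

```shell
# Invented sample: two 3-line entries separated by a blank line.
sample='Timestamp1
key1 | val1 | key2 | val2
key3 | val3 | key4 | val4

Timestamp2
key1 | val1 | key2 | val2
key3 | val3 | key4 | val4'

# -v RS=           empty RS: blank-line-separated blocks become single records
# -F'\n| \\| '     fields are split on a newline OR on " | "
# -v OFS=','       fields are joined with a comma when the record is rebuilt
# {$1=$1}          assigning to a field forces awk to rebuild $0 using OFS
# 1                always-true pattern, so the rebuilt record is printed
out=$(printf '%s\n' "$sample" | awk -v RS= -F'\n| \\| ' -v OFS=',' '{$1=$1}1')
printf '%s\n' "$out"
```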

Ed Morton

score 3 · Accepted Answer · answered May 23 '17 at 16:50

alternative solution with sed and paste

$ sed 's/ *| */,/g;/^$/d' file | paste -d, - - -

Timestamp,key1,val1,key2,val2,key3,val3,key4,val4
Timestamp,key1,val1,key2,val2,key3,val3,key4,val4

reads like: replace delimiter with comma, delete empty lines, paste 3 lines at a time with comma separator in between.
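Since the title asks about every n lines, the paste step can be generalized. This is only a sketch: `merge_n` is a hypothetical helper name, not part of the answer, and it relies on each `-` argument making paste read one line from standard input.

```shell
# Hypothetical helper: join every N lines of stdin with commas.
merge_n() {
    n=$1
    dashes=""
    i=0
    # Build a string of N "-" arguments, one per line to be joined.
    while [ "$i" -lt "$n" ]; do
        dashes="$dashes -"
        i=$((i + 1))
    done
    # Left unquoted on purpose so the dashes split into separate arguments.
    paste -d, $dashes
}

out=$(printf '%s\n' a b c d e f | merge_n 3)
printf '%s\n' "$out"
```

With this in place, `sed 's/ *| */,/g;/^$/d' file | merge_n 3` should behave like the command in the answer.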

karakfa

This is the one I ended up using. I had to add a ";s/ //g" to it because there were some fields that had leading whitespace. I didn't know you could string sed commands together with a semicolon so thanks for that! – Ken S. May 23 '17 at 20:04
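One caveat on the whitespace tweak in the comment above: `s/ //g` removes every space, including any inside the timestamp. A possible variant, assuming only leading and trailing spaces need to go, trims the ends of each line instead. The sample lines and variable names here are invented.

```shell
# Invented sample with stray leading/trailing spaces and a space inside the timestamp.
input=$(printf '%s\n' \
    '2017-05-23 16:22' \
    '  key1 | val1 | key2 | val2' \
    'key3 | val3 | key4 | val4  ')

# Same pipeline as the answer, plus two substitutions that trim line ends only.
out=$(printf '%s\n' "$input" |
    sed 's/ *| */,/g;/^$/d;s/^ *//;s/ *$//' |
    paste -d, - - -)
printf '%s\n' "$out"
```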

score 2 · Answer 3 · answered May 23 '17 at 18:55

This might work for you (GNU sed):

sed -n 'N;N;s/ *[|\n] */,/pg;n' file

Read 3 lines into the pattern space, replace each pipe or newline character (possibly surrounded by spaces) with a comma, print the result if the substitution succeeded, and throw away the blank line that follows.
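The same script spread over several lines may be easier to follow. This is a sketch that assumes GNU sed (for `\n` inside a bracket expression); the sample data is invented.

```shell
# Invented sample: two 3-line entries separated by a blank line.
sample='Timestamp1
key1 | val1 | key2 | val2
key3 | val3 | key4 | val4

Timestamp2
key1 | val1 | key2 | val2
key3 | val3 | key4 | val4'

out=$(printf '%s\n' "$sample" | sed -n '
  # Append the next two input lines to the pattern space (three lines total).
  N
  N
  # Replace each pipe or embedded newline, plus surrounding spaces, with a comma,
  # and print the result if a substitution was made.
  s/ *[|\n] */,/gp
  # Load the blank separator line; with -n it is never printed.
  n
')
printf '%s\n' "$out"
```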

score 1 · Answer 4 · answered May 23 '17 at 16:49

This awk script uses the built-in RS variable to simplify moving between records. We detect whether we are on a timestamp line and, if so, set the ts variable. Then, since we set RS, $1 through $NF are our key and value fields, so we iterate through them and append each one to an output string. The last field is added outside the loop to avoid a dangling ",". Then we just print the row and move on.

BEGIN{
    RS="\n\n";  # Everything between blank lines will be treated as one record
    FS="|";     # Our fields are separated with pipes.
}
{ 
    if( NF == 1 ){   # The number of fields on this line is 1... only our timestamp lines look like this.
        ts=$1;      
        next;       # Go to next record.
    };  



    # Build up an output buffer while avoiding dangling ","   
    out="";          

    for( i=1; i < NF; i++ ){
        out=out$i","
    } 

    out=out$NF; 

    print ts","out 
}

A Brothers

You should mention that your script is gawk-only due to the multi-char RS (`RS="\n\n"`). Note, though, that using it as you are is just a special case of `RS=""`, which will work in any awk. – Ed Morton May 23 '17 at 16:51
Didn't know about the RS="". Also realized that this actually isn't the asked for output since it gives out a key1 line and key3 line per timestamp. – A Brothers May 23 '17 at 16:53
See https://www.gnu.org/software/gawk/manual/gawk.html#Multiple-Line - `an empty string as the value of RS indicates that records are separated by one or more blank lines...` and http://pubs.opengroup.org/onlinepubs/009695399/utilities/awk.html - `If RS is null, then records are separated by sequences consisting of a <newline> plus one or more blank lines...` – Ed Morton May 23 '17 at 16:55
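Following the comments above, a portable variant could rely on `RS=""` and treat each line of an entry as one field. This is a sketch under that assumption, not the answer author's code; the sample data and the `line` variable are invented.

```shell
# Invented sample: two 3-line entries separated by a blank line.
sample='Timestamp1
key1 | val1 | key2 | val2
key3 | val3 | key4 | val4

Timestamp2
key1 | val1 | key2 | val2
key3 | val3 | key4 | val4'

out=$(printf '%s\n' "$sample" | awk '
BEGIN {
    RS = ""     # paragraph mode: blank lines separate records
    FS = "\n"   # each line of an entry becomes one field
}
{
    line = $1                       # the timestamp line
    for (i = 2; i <= NF; i++) {
        gsub(/ *\| */, ",", $i)     # turn "key | val" into "key,val"
        line = line "," $i
    }
    print line
}')
printf '%s\n' "$out"
```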

score 0 · Answer 5 · answered May 23 '17 at 16:22

try:

awk '/^Timestamp/ && VAL{print VAL;VAL=$0;next} NF{gsub(/ +\| +/,",");VAL=VAL?VAL OFS $0:$0} END{print VAL}' OFS="," Input_file

The first block fires when a line starts with Timestamp and VAL already holds a previous entry: it prints VAL, resets VAL to the current line, and uses next to skip the remaining statements. For any other non-empty line (the NF guard skips the blank separator lines, which would otherwise leave a trailing comma), every space-pipe-space delimiter is replaced with a comma and the line is appended to VAL, joined by OFS. The END block prints VAL one last time, because the final entry is still held there when the input runs out.

stevesliva · Answer 6 · 2017-05-23T17:50:24.423

Some stupid sed-only tricks:

sed -n -e '/Timestamp/{h;n};s/ | /,/g;H;/^$/{g;s/\n/,/g;s/,$//;p}' file

sed -n prints only when the p command is used
/Timestamp/{h;n}; replaces the hold space with the Timestamp line and moves on to the next line of input
s/ | /,/g;H; replaces the bars with commas and appends the line to the hold space
/^$/{g;s/\n/,/g;s/,$//;p} on a blank line, g copies the hold space into the pattern space, s/\n/,/g replaces the newlines with commas, and s/,$//;p removes the trailing comma and prints the pattern space

Input file:

Timestampa
key1 | val1 | key2 | val2
key3 | val3 | key4 | val4

Timestampb
key1 | val1 | key2 | val2
key3 | val3 | key4 | val4

Output:

Timestampa,key1,val1,key2,val2,key3,val3,key4,val4
Timestampb,key1,val1,key2,val2,key3,val3,key4,val4

s/\n/,/g may be system / sed version dependent.
