105

Let's say I have the following string:

something1:    +12.0   (some unnecessary trailing data (this must go))
something2:    +15.5   (some more unnecessary trailing data)
something4:    +9.0   (some other unnecessary data)
something1:    +13.5  (blah blah blah)

How do I turn that into simply

+12.0,+15.5,+9.0,+13.5

in bash?

codeforester
Alex Coplan
    Let's step back for a moment and consider this thread a glaring indictment of bash as a programming language. Consider Scala's `listOfStuff mkString ", "`, or Haskell's `intercalate ", " listOfString` – F. P. Freely Aug 03 '18 at 15:49
    Related: [Convert text file into a comma delimited string](https://stackoverflow.com/q/53093449/6862601) – codeforester Nov 02 '18 at 22:32

18 Answers

183

Clean and simple:

awk '{print $2}' file.txt | paste -s -d, -
kvantour
Mattias Ahnberg
101

You can use awk and sed:

awk -vORS=, '{ print $2 }' file.txt | sed 's/,$/\n/'

Or if you want to use a pipe:

echo "data" | awk -vORS=, '{ print $2 }' | sed 's/,$/\n/'

To break it down:

  • awk is great at handling data broken down into fields
  • -vORS=, sets the "output record separator" to a comma, which is what you wanted
  • { print $2 } tells awk to print the second field for every record (line)
  • file.txt is your filename
  • sed just gets rid of the trailing , and turns it into a newline (if you want no newline, you can do s/,$// instead); a quick check against the sample data is shown below
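
For example, with the sample data saved as file.txt:

awk -vORS=, '{ print $2 }' file.txt | sed 's/,$/\n/'
+12.0,+15.5,+9.0,+13.5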
Dan Fego
22
cat data.txt | xargs | sed -e 's/ /, /g'
  • I like solutions like this too, but is the -e arg necessary here, since there's only a single command being used for sed? I believe `cat data.txt | xargs | sed 's/ /, /g'` would work all the same. For example, `echo -e "foo\nbar\nbazz" | xargs | sed 's/ /, /g'` outputs **foo, bar, bazz**. – John Pancoast Feb 04 '21 at 01:05
11
$ awk -v ORS=, '{print $2}' data.txt | sed 's/,$//'
+12.0,+15.5,+9.0,+13.5

$ cat data.txt | tr -s ' ' | cut -d ' ' -f 2 | tr '\n' ',' | sed 's/,$//'
+12.0,+15.5,+9.0,+13.5
kev
  • cheers, what about if the input to awk comes through standard input (just put `function | awk ...` in your example)? – Alex Coplan Jan 03 '12 at 15:21
11

This might work for you:

cut -d' ' -f5 file | paste -d',' -s
+12.0,+15.5,+9.0,+13.5

or

sed '/^.*\(+[^ ]*\).*/{s//\1/;H};${x;s/\n/,/g;s/.//p};d' file
+12.0,+15.5,+9.0,+13.5

or

sed 's/\S\+\s\+//;s/\s.*//;H;$!d;x;s/.//;s/\n/,/g' file

For each line in the file: chop off the first field and the spaces following it, chop off the remainder of the line following the second field, and append the result to the hold space. Delete all lines except the last, where we swap to the hold space and, after deleting the newline introduced at the start, convert all newlines to commas.

N.B. Could be written:

sed 's/\S\+\s\+//;s/\s.*//;1h;1!H;$!d;x;s/\n/,/g' file
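
For readability, the same GNU sed program can be written with one command per line and comments (a sketch, equivalent to the one-liner above):

sed '
  # remove the first field and the whitespace that follows it
  s/\S\+\s\+//
  # remove everything from the next whitespace onwards, keeping only the value
  s/\s.*//
  # first line: copy the value into the hold space
  1h
  # later lines: append the value to the hold space
  1!H
  # delete every line except the last
  $!d
  # on the last line, swap in the hold space and join the values with commas
  x
  s/\n/,/g
' file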
potong
10

awk one-liner

$ awk '{printf (NR>1?",":"") $2}' file

+12.0,+15.5,+9.0,+13.5
Rahul Verma
  • Format specifier `"%s",` should be added after `printf` to make it more robust, i.e. to make it work with all kinds of rows, such as "foo %s". – jarno Nov 30 '20 at 10:35
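
With the format specifier the comment above suggests, the one-liner becomes (a sketch; the output on this input is unchanged):

$ awk '{printf "%s", (NR>1?",":"") $2}' file

+12.0,+15.5,+9.0,+13.5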
8

This should work too

awk '{print $2}' file | sed ':a;{N;s/\n/,/};ba'
jaypal singh
5

Try this simple code:

awk '{printf("%s,",$2)}' File1
Vonton
4

You can use grep:

grep -o "+\S\+" in.txt | tr '\n' ','

which finds strings starting with +, followed by any non-whitespace characters (\S\+), then converts newline characters into commas. This should be pretty quick for large files.
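Note that this leaves a trailing comma; as in the other answers, a final sed can strip it (a sketch):

grep -o "+\S\+" in.txt | tr '\n' ',' | sed 's/,$//'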

kenorb
3

try this:

sedSelectNumbers='s".* \(+[0-9]*[.][0-9]*\) .*"\1,"'
sedClearLastComma='s"\(.*\),$"\1"'
cat file.txt |sed "$sedSelectNumbers" |tr -d "\n" |sed "$sedClearLastComma"

the nice part is how easily the newline ("\n") characters are deleted!

EDIT: another great way to join lines into a single line with sed is this: |sed ':a;N;$!ba;s/\n/ /g' got from here.

Aquarius Power
2

A solution written in pure Bash:

#!/bin/bash

sometext="something1:    +12.0   (some unnecessary trailing data (this must go))
something2:    +15.5   (some more unnecessary trailing data)
something4:    +9.0   (some other unnecessary data)
something1:    +13.5  (blah blah blah)"

a=()
while read -r a1 a2 a3; do
    # we can add some code here to check valid values or modify them
    a+=("${a2}")
done <<< "${sometext}"
# run in a subshell (parentheses) so the IFS change applies to this statement only
(IFS=',' ; printf '%s: %s\n' "Result" "${a[*]}")

Result: +12.0,+15.5,+9.0,+13.5

  • Alternatively you could use `read -r -a cols` and thereafter add `"${cols[1]}"` to the list `a`. – jarno Nov 30 '20 at 22:11
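
A sketch of the variant suggested in the comment above, reading each line into an array with read -r -a and taking its second element:

a=()
# assumes "sometext" is set as in the script above
while read -r -a cols; do
    a+=("${cols[1]}")
done <<< "${sometext}"
(IFS=',' ; printf '%s: %s\n' "Result" "${a[*]}")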
2

I haven't seen this simple awk solution here yet:

awk 'b{b=b","}{b=b$2}END{print b}' infile
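
Spelled out with comments, the same program reads (a sketch, equivalent to the one-liner):

awk '
  b { b = b "," }      # from the second line on, add a comma before the next value
  { b = b $2 }         # append the second field to the accumulator
  END { print b }      # print the joined list once all lines are read
' infile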
ctac_
0

Another Perl solution, similar to Dan Fego's awk:

perl -ane 'print "$F[1],"' file.txt | sed 's/,$/\n/'

-a tells perl to split the input line into the @F array, which is indexed starting at 0.
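
To see what -a gives you, printing just the second field of the sample file (a quick check):

perl -ane 'print "$F[1]\n"' file.txt
+12.0
+15.5
+9.0
+13.5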

Chris Koknat
0

Well, the hardest part is probably selecting the second "column", since I don't know of an easy way to treat multiple spaces as one. The rest is easy using bash substitutions.

# cat bla.txt
something1:    +12.0   (some unnecessary trailing data (this must go))
something2:    +15.5   (some more unnecessary trailing data)
something4:    +9.0   (some other unnecessary data)
something1:    +13.5  (blah blah blah)

# cat bla.sh
OLDIFS=$IFS
IFS=$'\n'
for i in $(cat bla.txt); do
  i=$(echo "$i" | awk '{print $2}')
  u="${u:+$u, }$i"
done
IFS=$OLDIFS
echo "$u"

# bash ./bla.sh
+12.0, +15.5, +9.0, +13.5
Marki
0

Yet another AWK solution

Run

awk '{printf "%s", $c; while(getline){printf "%s%s", sep, $c}}' c=2 sep=','

to use the 2nd column to form the comma-separated list. Give the input on standard input as usual, or as a file name argument.
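
For example, with the sample data saved as data.txt (the variable assignments must appear before the file name so they are set before the file is read):

awk '{printf "%s", $c; while(getline){printf "%s%s", sep, $c}}' c=2 sep=',' data.txt
+12.0,+15.5,+9.0,+13.5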

jarno
0

With perl:

fg@erwin ~ $ perl -ne 'push @l, (split(/\s+/))[1]; END { print join(",", @l) . "\n" }' <<EOF
something1:    +12.0   (some unnecessary trailing data (this must go))
something2:    +15.5   (some more unnecessary trailing data)
something4:    +9.0   (some other unnecessary data)
something1:    +13.5  (blah blah blah)
EOF

+12.0,+15.5,+9.0,+13.5
fge
0

You can also do it with two sed calls:

$ cat file.txt 
something1:    +12.0   (some unnecessary trailing data (this must go))
something2:    +15.5   (some more unnecessary trailing data)
something4:    +9.0   (some other unnecessary data)
something1:    +13.5  (blah blah blah)
$ sed 's/^[^:]*: *\([+0-9.]\+\) .*/\1/' file.txt | sed -e :a -e '$!N; s/\n/,/; ta'
+12.0,+15.5,+9.0,+13.5

The first sed call removes the uninteresting data, and the second joins all the lines.
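
To see the intermediate step, the first sed alone leaves just the numbers, one per line:

$ sed 's/^[^:]*: *\([+0-9.]\+\) .*/\1/' file.txt
+12.0
+15.5
+9.0
+13.5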

Elias Dorneles
0

You can also print it like this, with just awk, using printf:

bash-3.2$ cat sample.log
something1:    +12.0   (some unnecessary trailing data (this must go))
something2:    +15.5   (some more unnecessary trailing data)
something4:    +9.0   (some other unnecessary data)
something1:    +13.5  (blah blah blah)

bash-3.2$ awk ' { if($2 != "") { if(NR==1) { printf $2 } else { printf "," $2 } } }' sample.log
+12.0,+15.5,+9.0,+13.5
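
A slightly more defensive variant (a sketch) passes the field as an argument to a %s format, so a literal % in the data cannot be mistaken for a format specifier:

awk '{ if($2 != "") { if(NR==1) printf "%s", $2; else printf ",%s", $2 } }' sample.log
+12.0,+15.5,+9.0,+13.5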