3

I can't seem to find an SO question that matches this exact problem.

I have a text file with one text token per line, without any commas, tabs, or quotes. I want to create a comma-delimited string from the file's contents.

Input:

one
two
three

Output:

one,two,three

I am using this command:

csv_string=$(tr '\n' ',' < file | sed 's/,$//')

Is there a more efficient way to do this?

codeforester
  • 28,846
  • 11
  • 78
  • 104
  • 3
    Mind you, you should define the behavior for cases where your values already contain commas, double quotes, or even newline characters. Most of the answers below (whose code is oversimplified) will then produce invalid output. What if the next line in your sample is `four (not "for")` (16 characters)? – miroxlav Nov 01 '18 at 00:11
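
To make the concern in this comment concrete, here is a minimal sketch of one possible behavior: wrap any value that contains a comma or a double quote in double quotes, and double the embedded quotes. This quoting rule is an assumption (a common CSV convention), not something the question asked for, and values with embedded newlines are not handled because the input is line-based.

```shell
# Sample input that includes the awkward values from the comment.
tmp=$(mktemp)
printf '%s\n' 'one' 'two, too' 'four (not "for")' > "$tmp"

csv_string=$(awk '
  {
    field = $0
    if (field ~ /[",]/) {            # value needs quoting
      gsub(/"/, "\"\"", field)       # double any embedded double quotes
      field = "\"" field "\""        # wrap the whole value in quotes
    }
    printf "%s%s", (NR > 1 ? "," : ""), field
  }
  END { print "" }
' "$tmp")
rm -f "$tmp"

printf '%s\n' "$csv_string"   # one,"two, too","four (not ""for"")"
```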

5 Answers

10

The usual command for this is paste; -s joins all input lines into a single line, and -d, sets the delimiter to a comma:

csv_string=$(paste -sd, file.txt)
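
As a small self-contained sketch of the command above (the temporary file stands in for file.txt and is only there so the example runs anywhere):

```shell
# Build the three-line sample from the question.
tmp=$(mktemp)
printf '%s\n' one two three > "$tmp"

# -s joins the lines of the file serially; -d, puts a comma between them.
csv_string=$(paste -sd, "$tmp")
printf '%s\n' "$csv_string"    # one,two,three

# paste reads standard input when the file operand is "-".
from_stdin=$(printf '%s\n' one two three | paste -sd, -)
rm -f "$tmp"
```

Because paste only inserts the delimiter between lines, there is no trailing comma to strip afterwards.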
codeforester
  • 28,846
  • 11
  • 78
  • 104
rici
  • 201,785
  • 23
  • 193
  • 283
3

You can do it entirely with Bash parameter expansion instead of using tr and sed.

csv_string=$(<file)               # read file into variable
csv_string=${csv_string//$'\n'/,} # replace \n with ,
csv_string=${csv_string%,}        # remove trailing comma
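
A different shell-only technique, shown here only as a sketch, is a plain read loop that appends each line with a separator. It is not the parameter-expansion method above and has not been timed; the variable names sep and line are illustrative.

```shell
tmp=$(mktemp)
printf '%s\n' one two three > "$tmp"

csv_string=
sep=
# The "|| [ -n "$line" ]" part keeps a final line that lacks a trailing newline.
while IFS= read -r line || [ -n "$line" ]; do
    csv_string="$csv_string$sep$line"
    sep=,                      # empty before the first item, a comma afterwards
done < "$tmp"
rm -f "$tmp"

printf '%s\n' "$csv_string"    # one,two,three
```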
codeforester
  • 28,846
  • 11
  • 78
  • 104
Barmar
  • 596,455
  • 48
  • 393
  • 495
3

One way with Awk is to set RS to the empty string so that records are separated by blank lines rather than newlines. This handles words containing spaces and joins the lines with commas as expected.

awk '{$1=$1}1' FS='\n' OFS=',' RS= file

The {$1=$1} forces Awk to rebuild each record ($0) from its fields using the output field separator (OFS), which is how a change to FS/OFS takes effect. The trailing 1 is an always-true pattern that prints each record after the modifications made inside {..}.
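
A short sketch of how this behaves, including one caveat that follows from using blank lines as the record separator: a blank line in the input starts a new record and therefore a new output line. The sample data is made up for illustration.

```shell
tmp=$(mktemp)

# Spaces inside a line survive, because only newlines separate fields.
printf '%s\n' 'one' 'two words' 'three' > "$tmp"
joined=$(awk '{$1=$1}1' FS='\n' OFS=',' RS= "$tmp")
printf '%s\n' "$joined"        # one,two words,three

# A blank line ends the record, so the result has two lines.
printf '%s\n' 'one' 'two' '' 'three' > "$tmp"
split=$(awk '{$1=$1}1' FS='\n' OFS=',' RS= "$tmp")
printf '%s\n' "$split"         # "one,two" then "three" on its own line
rm -f "$tmp"
```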

Inian
  • 62,560
  • 7
  • 92
  • 110
1

With a Perl one-liner:

$ cat csv_2_text
one
two
three
$ perl -ne '{ chomp; push(@lines,$_) } END { $x=join(",",@lines);  print "$x" }' csv_2_text
one,two,three

$ perl -ne ' { chomp; $_="$_," if not eof ;printf("%s",$_) } ' csv_2_text
one,two,three
$

From @codeforester

$ perl -ne 'BEGIN { my $delim = "" } { chomp; printf("%s%s", $delim, $_); $delim="," } END { printf("\n") }' csv_2_text
one,two,three
$
stack0114106
  • 7,676
  • 2
  • 10
  • 29
  • @codeforester.. pls consider up-voting the answer. Perl solutions are always portable and you can rely on them. – stack0114106 Nov 01 '18 at 17:42
  • Nitpick: can't we print the value inside the read loop itself, instead of having to store the values in an array? – codeforester Nov 01 '18 at 17:51
  • yes.. it can be done.. in that case, I'll have extra comma and I have to pipe it with sed for removing it.. just updated the answer – stack0114106 Nov 01 '18 at 17:54
  • 1
    I just found out the way.. it can be done with eof..updated the answer – stack0114106 Nov 01 '18 at 18:00
  • The moment you have to pipe it to another process like `sed`, it is no longer a Perl one-liner. How about this? `perl -ne 'BEGIN { my $delim = "" } { chomp; printf("%s%s", $delim, $_); $delim="," } END { printf("\n") }'`? – codeforester Nov 01 '18 at 18:03
1

I tested the four approaches from the answers (pure Bash, paste, Awk, Perl) on a Linux box, along with the tr | sed approach shown in the question:

#!/bin/bash

# generate test data
seq 1 10000 > test.file

times=${1:-50}

printf '%s\n' "Testing paste solution"
time {
    for ((i=0; i < times; i++)); do
      csv_string=$(paste -sd, test.file)
    done
}

printf -- '----\n%s\n' "Testing pure Bash solution"
time {
    for ((i=0; i < times; i++)); do
      csv_string=$(<test.file)          # read file into variable
      csv_string=${csv_string//$'\n'/,} # replace \n with ,
      csv_string=${csv_string%,}        # remove trailing comma
    done
}

printf -- '----\n%s\n' "Testing Awk solution"
time {
    for ((i=0; i < times; i++)); do
      csv_string=$(awk '{$1=$1}1' FS='\n' OFS=',' RS= test.file)
    done
}

printf -- '----\n%s\n' "Testing Perl solution"
time {
    for ((i=0; i < times; i++)); do
      csv_string=$(perl -ne '{ chomp; $_="$_," if not eof; printf("%s",$_) }' test.file)
    done
}

printf -- '----\n%s\n' "Testing tr | sed solution"
time {
    for ((i=0; i < times; i++)); do
      csv_string=$(tr '\n' ',' < test.file | sed 's/,$//')
    done
}
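
Timings are only comparable if every variant produces the same string, so a quick consistency check may be worth running first. The following is a sketch under a few assumptions: it uses a smaller input (100 lines) to stay fast, treats the paste output as the reference, and the helper name check is hypothetical.

```shell
tmp=$(mktemp)
seq 1 100 > "$tmp"

expected=$(paste -sd, "$tmp")   # reference result
mismatches=0

check() {  # usage: check NAME ACTUAL
    if [ "$2" = "$expected" ]; then
        printf 'ok    %s\n' "$1"
    else
        printf 'DIFF  %s\n' "$1"
        mismatches=$((mismatches + 1))
    fi
}

check "tr | sed" "$(tr '\n' ',' < "$tmp" | sed 's/,$//')"
check "Awk"      "$(awk '{$1=$1}1' FS='\n' OFS=',' RS= "$tmp")"

if command -v perl >/dev/null 2>&1; then
    check "Perl" "$(perl -ne '{ chomp; $_="$_," if not eof; printf("%s",$_) }' "$tmp")"
fi

# The parameter expansions are Bash-specific, so run them under bash explicitly.
if command -v bash >/dev/null 2>&1; then
    bash_out=$(bash -s "$tmp" <<'EOF'
s=$(<"$1")
s=${s//$'\n'/,}
printf '%s\n' "${s%,}"
EOF
)
    check "pure Bash" "$bash_out"
fi

rm -f "$tmp"
printf 'mismatches: %s\n' "$mismatches"
```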

Surprisingly, the pure Bash solution does very poorly: it takes nearly two minutes, while every other approach finishes in under half a second. paste comes out on top, followed by tr | sed, Awk, and Perl:

Testing paste solution

real    0m0.109s
user    0m0.052s
sys     0m0.075s
----
Testing pure Bash solution

real    1m57.777s
user    1m57.113s
sys     0m0.341s
----
Testing Awk solution

real    0m0.221s
user    0m0.152s
sys     0m0.077s
----
Testing Perl solution

real    0m0.424s
user    0m0.388s
sys     0m0.080s
----
Testing tr | sed solution

real    0m0.162s
user    0m0.092s
sys     0m0.141s

For some reason, csv_string=${csv_string//$'\n'/,} hangs on macOS Mojave running Bash 4.4.23.


codeforester
  • 28,846
  • 11
  • 78
  • 104