
Let's say I have the following line:

Apples, "This, is, a test",409, James,46,90

I want to replace the commas inside the quotation marks with ; or, alternatively, replace the ones outside the quotation marks with that same character. So far I have thought of something like

perl -pe 's/(".*)\K,(?=.*")/;/g' <mystring>

However, this matches only the last comma inside the quotation marks, which I think is because I am restarting the regex engine with \K. I also tried some regexes to change the commas outside the quotation marks, but I couldn't get them to work.

Note that the spaces after the commas outside the quotation marks are there on purpose, so

perl -pe 's/,\s/;/g' <mystring>

is not a valid answer.

The desired output would be

Apples, "This; is; a test",409, James,46,90

Or alternatively

Apples; "This, is, a test";409; James;46;90

Any thoughts on how to approach this problem?

CMB
  • Use `Text::CSV` for CSV data (or any good version of it, like in Shawn's answer). – zdim Oct 29 '20 at 23:34
  • Here's a regex that matches commas only when outside quotes: https://stackoverflow.com/a/1757107/256196 – Bohemian Oct 30 '20 at 00:26

1 Answer


I'd use an actual CSV parser instead of trying to hack something up with regular expressions. The very useful Text::AutoCSV module makes it easy to convert the comma field separators to semicolons in a one-liner:

$ echo 'Apples, "This, is, a test",409, James,46,90' |
    perl -MText::AutoCSV -e 'Text::AutoCSV->new(out_sep_char => ";")->write()'
Apples;"This, is, a test";409;James;46;90

For a non-Perl solution, csvformat from csvkit is another handy tool, though it's harder to get the quoting the same:

$ echo 'Apples, "This, is, a test",409, James,46,90' |
    csvformat -S -U2 -D';'
"Apples";"This, is, a test";"409";"James";"46";"90"

Or (self-promotion alert!) my tawk utility, which also won't get the quotes the same:

$ echo 'Apples, "This, is, a test",409, James,46,90' |
    tawk -csv -quoteall 'line { set F(1) $F(1); print }' OFS=";"
"Apples";" This, is, a test";"409";" James";"46";"90"
Shawn
  • (And by harder I mean it can't come close due to [a bug in csvkit](https://github.com/wireservice/csvkit/issues/938)) – Shawn Oct 30 '20 at 00:09