Removing characters from a CSV file

Question

I have a CSV file, that contains data exported from a mySQL table. In one of the fields, there is a newline character, which "splits" the field into 2 lines. I'm trying to remove this newline character, but can't seem to do it.

Also, the same field may contain double quotes and commas, which gives me trouble when I enclose the fields with " terminate them with , when I exported the table. So I used | to terminate the fields instead, and don't enclose the fields with anything.

When I cat the file on a linux machine, the field looks like this

13"\
58,20,"3

What the field is supposed to look like is

13"58,20,"3

When I used the vi "hex editor" (:%!xxd) to check the hex values of the line, I get

31 33 22 5c 0a 35 38 2c 32 30 2c 22 33

I tried using sed

sed -e 's/\\\n//'

and

sed -e 's/\x5c\x0a//'

to remove the newline, but they didn't work. How can I format the field to what it's supposed to look like?

These seems like it might be relevant: http://stackoverflow.com/questions/356578/how-to-output-mysql-query-results-in-csv-format — Tim Biegeleisen, Jul 04 '16 at 03:20
In mysql, the field looks like this `13"58,20,"3`. The just means that the `58,20,"3` string appears on the next line. — Rayne, Jul 04 '16 at 03:25

John1024 · Answer 1 · 2016-07-04T03:28:50.630

4

Try:

$ sed '/\\$/{N; s/\\\n//}' file
13"58,20,"3

/\\$/ selects lines that end with \. For those lines, we read in the next line (command N) and then we do a substitution to remove the unwanted \ and newline: s/\\\n//.

Lines that do not end with \ are passed through unchanged.

This approach assumes that continued lines are continued just one time. If there were to be lines with two or more continuations, we would need a loop.

edited Jul 04 '16 at 03:28

answered Jul 04 '16 at 03:23

John1024

97,609
11
105
133

This works for the most part. But for some reason, there are a handful of lines where the line is still split after the first double quotes, with the same `\\\n` – Rayne Jul 04 '16 at 06:01
@Rayne OK. What's different about those lines? Is there whitespace after the `\` on those lines? Something else? – John1024 Jul 04 '16 at 06:04
No, they're the same, containing `\x5c\x0a`. – Rayne Jul 04 '16 at 06:21
BTW, the field with this newline problem isn't the last column, it is somewhere in the middle, so I used `sed '/\\/{N; s/\\\n//}'` instead. – Rayne Jul 04 '16 at 06:34
Can you show me an example of a series of lines where this code doesn't work? – John1024 Jul 04 '16 at 06:58

score 2 · Answer 2 · answered Jul 04 '16 at 03:29

2

One option to handle this on the MySQL side would to use REPLACE() and remove the newline characters from the column (or columns) which contain them:

SELECT REPLACE(col, '\n', '')
FROM yourTable
INTO OUTFILE '/output.csv'
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n';

answered Jul 04 '16 at 03:29

Tim Biegeleisen

387,723
20
200
263

This doesn't seem to be working for me. I included a `*` in `SELECT *, REPLACE(col, '\n', '')` as otherwise, it only returns the field. However, the field value is still split. – Rayne Jul 04 '16 at 06:04
Which fields do you want to export? List them all, along with the replaced one. – Tim Biegeleisen Jul 04 '16 at 06:08
In the same REPLACE statement? I tried `SELECT REPLACE(col1, col2, col3, col4, col5, '\n', '')` and it returned an error about syntax error at `col4, col5'. I don't think REPLACE expects more than 3 arguments. – Rayne Jul 04 '16 at 06:29
@Rayne No, `REPLACE` does one column at a time. – Tim Biegeleisen Jul 04 '16 at 06:30
I have 16 columns, I don't think having 16 REPLACE is the elegant way? – Rayne Jul 04 '16 at 06:31
You never mentioned this in your OP. Even so, you could use dynamic SQL and it would not be so painful. – Tim Biegeleisen Jul 04 '16 at 06:34

Removing characters from a CSV file

2 Answers2