0

I have a CSV file, that contains data exported from a mySQL table. In one of the fields, there is a newline character, which "splits" the field into 2 lines. I'm trying to remove this newline character, but can't seem to do it.

Also, the same field may contain double quotes and commas, which gives me trouble when I enclose the fields with " terminate them with , when I exported the table. So I used | to terminate the fields instead, and don't enclose the fields with anything.

When I cat the file on a linux machine, the field looks like this

13"\
58,20,"3

What the field is supposed to look like is

13"58,20,"3

When I used the vi "hex editor" (:%!xxd) to check the hex values of the line, I get

31 33 22 5c 0a 35 38 2c 32 30 2c 22 33

I tried using sed

sed -e 's/\\\n//'

and

sed -e 's/\x5c\x0a//'

to remove the newline, but they didn't work. How can I format the field to what it's supposed to look like?

Rayne
  • 12,369
  • 15
  • 35
  • 47

2 Answers2

4

Try:

$ sed '/\\$/{N; s/\\\n//}' file
13"58,20,"3

/\\$/ selects lines that end with \. For those lines, we read in the next line (command N) and then we do a substitution to remove the unwanted \ and newline: s/\\\n//.

Lines that do not end with \ are passed through unchanged.

This approach assumes that continued lines are continued just one time. If there were to be lines with two or more continuations, we would need a loop.

John1024
  • 97,609
  • 11
  • 105
  • 133
  • This works for the most part. But for some reason, there are a handful of lines where the line is still split after the first double quotes, with the same `\\\n` – Rayne Jul 04 '16 at 06:01
  • @Rayne OK. What's different about those lines? Is there whitespace after the `\` on those lines? Something else? – John1024 Jul 04 '16 at 06:04
  • No, they're the same, containing `\x5c\x0a`. – Rayne Jul 04 '16 at 06:21
  • BTW, the field with this newline problem isn't the last column, it is somewhere in the middle, so I used `sed '/\\/{N; s/\\\n//}'` instead. – Rayne Jul 04 '16 at 06:34
  • Can you show me an example of a series of lines where this code doesn't work? – John1024 Jul 04 '16 at 06:58
2

One option to handle this on the MySQL side would to use REPLACE() and remove the newline characters from the column (or columns) which contain them:

SELECT REPLACE(col, '\n', '')
FROM yourTable
INTO OUTFILE '/output.csv'
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n';
Tim Biegeleisen
  • 387,723
  • 20
  • 200
  • 263
  • This doesn't seem to be working for me. I included a `*` in `SELECT *, REPLACE(col, '\n', '')` as otherwise, it only returns the field. However, the field value is still split. – Rayne Jul 04 '16 at 06:04
  • Which fields do you want to export? List them all, along with the replaced one. – Tim Biegeleisen Jul 04 '16 at 06:08
  • In the same REPLACE statement? I tried `SELECT REPLACE(col1, col2, col3, col4, col5, '\n', '')` and it returned an error about syntax error at `col4, col5'. I don't think REPLACE expects more than 3 arguments. – Rayne Jul 04 '16 at 06:29
  • @Rayne No, `REPLACE` does one column at a time. – Tim Biegeleisen Jul 04 '16 at 06:30
  • I have 16 columns, I don't think having 16 REPLACE is the elegant way? – Rayne Jul 04 '16 at 06:31
  • You never mentioned this in your OP. Even so, you could use dynamic SQL and it would not be so painful. – Tim Biegeleisen Jul 04 '16 at 06:34