I am using joker83's answer to the question "Regular expression for parsing CSV in PHP", but I find that it cannot correctly parse a CSV string in which a field value contains a comma. Is it possible to refine this regexp to solve the problem?

My understanding of joker83's pattern /,(?=(?:[^\"])*(?![^\"]))/ is as follows:
1. ,(?=x) means a comma that is followed by pattern x (a lookahead, so x itself is not consumed).
2. [^\"] means any single character other than a double quote.
3. (?:[^\"]) means match the parenthesized subpattern but do not capture it into the resulting array of matches.
4. * means zero or more repetitions of the preceding element.
5. (x)* therefore means zero or more repetitions of pattern x.
6. y(?![^\"]) means a y that is NOT followed by a character other than a double quote (i.e. a y that is followed by a double quote or by the end of the string).
7. Taken together, the pattern matches a comma that is immediately followed by a double quote (where * matches zero characters), or a comma that is followed by one or more characters other than a double quote which are in turn followed by a double quote (or by the end of the string).

As you can see, if the CSV string is 120,"I love ""Lexi Belle"", ""Proxy Paige""","good stuff", then applying this regexp with preg_split() yields 4 fields rather than the correct 3, because the comma inside the second field is also treated as a separator:

120
"I love ""Lexi Belle""
 ""Proxy Paige"""
"good stuff"
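
The behaviour described above can be reproduced with a short script. This is only a sketch and assumes the pattern is exactly as quoted in this question; the variable names are illustrative.

```php
<?php
// Pattern exactly as quoted in the question (assumption: no characters were lost).
$pattern = '/,(?=(?:[^\"])*(?![^\"]))/';

$line = '120,"I love ""Lexi Belle"", ""Proxy Paige""","good stuff"';

// After any comma, the lookahead can consume non-quote characters until it
// reaches a double quote or the end of the string, so it succeeds at every
// comma, including the one inside the quoted second field.
$fields = preg_split($pattern, $line);

print_r($fields);
echo count($fields), " fields\n";
```

With this input the split happens at all three commas, which is why four pieces come back instead of three.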

Note: I'm using PHP 5.2.6. I can't upgrade to a newer version because I spent a lot of time installing an oci8 build that can read Oracle 8i on Windows, and I have not been able to install it correctly on newer versions of PHP.
Note: I can't use fgetcsv() either, since the input CSV file contains LF characters inside field values, and fgetcsv() splits the record at the newline in the middle of such a field.

Scott Chu

2 Answers

Why don't you use str_getcsv?

$string = '120,"I love Lexi Bell, Proxy Paige","good stuff"';
$parsedCsv = str_getcsv($string);
print_r($parsedCsv);
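
One caveat: str_getcsv() was introduced in PHP 5.3, so it is not available on the PHP 5.2.6 mentioned in the question. A minimal sketch of a fallback follows. It assumes that fgetcsv() reading from a php://temp stream handles the input acceptably, which has not been checked against the embedded line feeds the question describes. The helper name csv_line_to_array is hypothetical.

```php
<?php
// Hypothetical helper: uses str_getcsv() where it exists (PHP >= 5.3) and
// otherwise routes the string through fgetcsv() on a temporary stream.
function csv_line_to_array($line)
{
    if (function_exists('str_getcsv')) {
        // Delimiter, enclosure and escape are passed explicitly.
        return str_getcsv($line, ',', '"', '\\');
    }

    // Fallback for PHP 5.2: php://temp is available since PHP 5.1.
    $handle = fopen('php://temp', 'r+');
    fwrite($handle, $line);
    rewind($handle);
    // fgetcsv() has no escape parameter before PHP 5.3, so only four
    // arguments are passed here.
    $fields = fgetcsv($handle, 0, ',', '"');
    fclose($handle);

    return $fields;
}

print_r(csv_line_to_array('120,"I love Lexi Bell, Proxy Paige","good stuff"'));
```

The function_exists() check keeps the same call site working if the PHP version is upgraded later.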
Mihai Matei

You can use this regex:

/,(?=([^\"]*\"[^\"]*\")*[^\"]*$)/

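It comes from the Stack Overflow question "Java: splitting a comma-separated string but ignoring commas in quotes" (written for Java). It matches a comma only when an even number of double quotes follows it up to the end of the string, i.e. a comma that lies outside any quoted field.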

On your string it gives:

array(3) {
  [0]=>
  string(3) "120"
  [1]=>
  string(31) ""I love Lexi Bell, Proxy Paige""
  [2]=>
  string(12) ""good stuff""
}

Note that the fields still include their enclosing double quotes.
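
If the enclosing quotes are not wanted, they can be removed in a second pass. The following is a sketch rather than a complete CSV parser: it assumes fields are either unquoted or fully wrapped in double quotes, with embedded quotes doubled as in the question's sample. The helper name unquote_csv_field is hypothetical, and the code avoids closures so that it should also run on PHP 5.2.

```php
<?php
// Regex from this answer: split on a comma that is followed by an even
// number of double quotes up to the end of the string.
$pattern = '/,(?=([^\"]*\"[^\"]*\")*[^\"]*$)/';

// Hypothetical helper: strip the enclosing quotes from a field and turn
// doubled quotes ("") back into single ones.
function unquote_csv_field($field)
{
    $len = strlen($field);
    if ($len >= 2 && $field[0] === '"' && $field[$len - 1] === '"') {
        // The cast guards against substr() returning false on old PHP
        // versions when the field is just "".
        $inner = (string) substr($field, 1, -1);
        return str_replace('""', '"', $inner);
    }
    return $field;
}

// Sample string from the question, with doubled quotes inside a field.
$line = '120,"I love ""Lexi Belle"", ""Proxy Paige""","good stuff"';

$fields = array_map('unquote_csv_field', preg_split($pattern, $line));
print_r($fields);
```

On the question's sample this gives three fields, and the second one reads I love "Lexi Belle", "Proxy Paige".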

hexasoft