I am using joker83's answer to this question: Regular expression for parsing CSV in PHP, but I find it can't correctly parse a CSV string in which a field value contains a comma. Is it possible to refine this regexp to solve the problem?
Explanation of the pattern from joker83: /,(?=(?:[^\"])*(?![^\"]))/
1. ,(?=x) means a comma that is followed by the pattern x (a lookahead; x itself is not part of the match).
2. [^\"] means any single character other than a double quote.
3. (?:[^\"]) means match the parenthesized subpattern but don't capture it into the resulting match array.
4. * means 0 or more of the preceding pattern.
5. (x)* means 0 or more of the pattern x.
6. y(?![^\"]) means a y that is NOT followed by a character other than a double quote (i.e. a y that is followed by a double quote or by the end of the string).
7. Taken as a whole, the pattern matches a comma that is immediately followed by a double quote (where * matches zero characters), or a comma that is followed by 1 or more characters other than a double quote, which are in turn followed by a double quote (in either case, the end of the string is accepted in place of that double quote).
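To double-check my reading of items 1 and 6, here is a minimal sketch using preg_match_all with PREG_OFFSET_CAPTURE. The sample strings and variable names are made up for illustration and are not part of joker83's answer.

```php
<?php
// Item 1: ,(?=b) matches a comma only when the next character is "b".
// The lookahead is zero-width, so only the comma itself is matched.
preg_match_all('/,(?=b)/', 'a,b,c', $followed, PREG_OFFSET_CAPTURE);
// $followed[0] holds one match: the comma at offset 1.

// Item 6: ,(?![^"]) matches a comma that is NOT followed by a non-quote
// character, i.e. the next thing is a double quote or the end of the string.
preg_match_all('/,(?![^"])/', 'a,"b",c,', $notFollowed, PREG_OFFSET_CAPTURE);
// $notFollowed[0] holds two matches: offsets 1 (before ") and 7 (at the end).

foreach ($notFollowed[0] as $match) {
    echo 'comma at offset ' . $match[1] . "\n";
}
```

The comma before c (offset 5) is skipped in the second call because it is followed by a character other than a double quote.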
As you can see, if the CSV string is 120,"I love ""Lexi Belle"", ""Proxy Paige""","good stuff" then applying this regexp in preg_split gives 4 fields rather than the correct 3:
120
"I love ""Lexi Belle""
""Proxy Paige"""
"good stuff"
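The behaviour can be reproduced with a short script. This is only a sketch: it uses the pattern exactly as quoted in this question, and the variable names are illustrative.

```php
<?php
// Pattern exactly as quoted above (single-quoted, so \" reaches PCRE as-is
// and matches a literal double quote).
$pattern = '/,(?=(?:[^\"])*(?![^\"]))/';

// Sample record: 3 logical fields, the second one contains a comma.
$line = '120,"I love ""Lexi Belle"", ""Proxy Paige""","good stuff"';

$fields = preg_split($pattern, $line);

echo count($fields) . " fields\n";
foreach ($fields as $i => $field) {
    echo $i . ': ' . $field . "\n";
}
```

With this pattern the lookahead seems to succeed at every comma, because a run of non-quote characters always ends either at a double quote or at the end of the string, so the comma inside the quoted field is treated as a separator as well.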
Note: I'm using PHP 5.2.6. I can't upgrade to a newer version because I spent a lot of time installing an oci8 build that can read Oracle 8i on Windows, and I can't get it installed correctly on a newer version of PHP.
Note: I can't use fgetcsv() either, since the input CSV file contains LF characters inside field values, and fgetcsv() splits the record at the newline in the middle of such a field.