
I'm thinking we can look for an even number of quotes to the left and to the right of the comma, but I'm not quite sure how to write it. Anyone know?

Actually, you'd only have to check one side (left or right): if the quotes in the whole string are balanced, an even count on one side implies an even count on the other.

I want to split on this, so the pattern has to match only the comma.

Example:

one, "two, three"

Should be split into two strings:

['one', ' "two, three"']
mpen
  • @tommieb75: No... I've been out of school for a year. Why is it so hard to believe this is a practical problem? – mpen Sep 27 '10 at 00:25
  • @tommie who cares? if it is then it's likely the solution to this problem isn't the solution to the entire homework problem, homework or not this is a valid - albeit begging for "what's the real problem?" - question – Mark Elliot Sep 27 '10 at 00:26
  • Mark: what's the real problem you're trying to solve? i.e. what are you trying to parse? – Mark Elliot Sep 27 '10 at 00:26
  • Duplicate 'Mark' detected. Aborting... – Ilia G Sep 27 '10 at 00:28
  • possible duplicate of [Java: splitting a comma-separated string but ignoring commas in quotes](http://stackoverflow.com/questions/1757065/java-splitting-a-comma-separated-string-but-ignoring-commas-in-quotes) – mpen Sep 27 '10 at 00:41
  • Voting to close my own question... it's pretty similar to http://stackoverflow.com/questions/1757065/java-splitting-a-comma-separated-string-but-ignoring-commas-in-quotes anyway (just found that) – mpen Sep 27 '10 at 00:42

4 Answers

Are you parsing CSV? Regex is a pretty bad way to do it. Once you've read the CSV definition (easy to find online), you can write a small automaton (state machine) to do it. Or you can just use one of the many ready-made solutions already out on the web.
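
As a rough illustration of the automaton idea, here is a minimal sketch in C#. It assumes that a double quote simply toggles a "quoted" state and that quotes are never escaped or nested; the class and method names (`QuoteAwareSplitter`, `Split`) are made up for this example and are not from any library.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper for illustration only.
public static class QuoteAwareSplitter
{
    // Splits on commas that sit outside double quotes.
    // Assumption: no escaped or nested quotes; each '"' toggles the state.
    public static List<string> Split(string input)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        foreach (char c in input)
        {
            if (c == '"')
            {
                inQuotes = !inQuotes;   // entering or leaving a quoted section
                current.Append(c);      // keep the quote in the output field
            }
            else if (c == ',' && !inQuotes)
            {
                fields.Add(current.ToString());  // an unquoted comma ends the field
                current.Clear();
            }
            else
            {
                current.Append(c);      // ordinary character, or a comma inside quotes
            }
        }

        fields.Add(current.ToString());  // the last field has no trailing comma
        return fields;
    }
}
```

With the question's sample input, `QuoteAwareSplitter.Split("one, \"two, three\"")` yields two fields, `one` and ` "two, three"`, with the quotes and the leading space kept as in the expected output.
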

Ilia G

Regex alone is not very good at handling nested conditions. Brace matching, quote matching and the like: it just isn't really up to the task. You can use a regex in combination with a loop to parse things, but it may be simpler to parse the string yourself.

Maybe you could provide a few example strings to clarify what you need to match so I can answer better.

Edit: Looking at your proposed solution, does it work with \\" where the \ is escaped, but the " is not?

I suspect you'll find deficiencies in your regex if you're working with real-world strings or complicated escape sequences. This will probably not be the common case, but it is important to understand that a regex is probably not what you actually want here. Regex has no concept of nested state, and even for simple quotations, escape sequences are hard to deal with correctly.
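
To make the escape-sequence concern concrete, here is a hedged sketch of a hand-written splitter that tracks two pieces of state: whether we are inside quotes, and whether the previous character was an unescaped backslash. The backslash convention is an assumption taken from the `\\"` example above (CSV proper doubles the quote instead), and `EscapeAwareSplitter` is a made-up name.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper for illustration only.
public static class EscapeAwareSplitter
{
    // Splits on unquoted commas, treating backslash as an escape character.
    // Assumption: "\x" means "take x literally", so \" does not toggle quoting
    // and \\ is a literal backslash that does not escape what follows.
    public static List<string> Split(string input)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;
        bool escaped = false;

        foreach (char c in input)
        {
            if (escaped)
            {
                current.Append(c);   // escaped character: copy it, no special meaning
                escaped = false;
            }
            else if (c == '\\')
            {
                current.Append(c);   // keep the backslash; the next character is literal
                escaped = true;
            }
            else if (c == '"')
            {
                inQuotes = !inQuotes;
                current.Append(c);
            }
            else if (c == ',' && !inQuotes)
            {
                fields.Add(current.ToString());
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }

        fields.Add(current.ToString());
        return fields;
    }
}
```

Under these assumptions, the input `"x \\", y` splits into `"x \\"` and ` y`, because the second backslash is itself escaped and the quote that follows closes the quoted section. An escaped quote, as in `"b \" c, d"`, stays inside its field.
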

M2tM
  • Quotes can't be nested, so I don't think this is asking too much from a regex. If you can suggest a better way to split a string by unquoted commas, I'm all ears. – mpen Sep 27 '10 at 00:28
  • "I'm thinking we can look for an even number of quotes to the left, and to the right of the comma... but I'm not quite sure how to write it." – M2tM Sep 27 '10 at 00:32
  • You don't need to match the quotes, nor count the quotes. You just need to determine if there's an even or odd number, which is perfectly doable. See my answer (I figured it out). – mpen Sep 27 '10 at 00:34
  • The way your question was originally worded (minus the example, which was added after my comment), you made it sound like you needed to detect an even number of quotes on either side of a comma. That is a balanced-matching scheme, and all the regex implementations I am aware of don't really have any support for that kind of logic. – M2tM Sep 27 '10 at 00:34
  • Finally, alternating ', `, and " nesting is a commonly done thing in HTML/Javascript. So yes, those symbols can be nested and in your example you are nesting them as well. – M2tM Sep 27 '10 at 00:36
  • ' "two, three"' == nested comment symbols. – M2tM Sep 27 '10 at 00:37
  • @M2tM: I'm sorry if I gave that impression; I didn't mean that I needed to detect an even number of quotes, I meant that that could be a possible approach. And the different kinds of quotes were meant to demonstrate the output, not the input, so no, there wasn't any nesting in my example. However, you do raise a point -- I don't see why my users couldn't use different kinds of quotes. – mpen Sep 27 '10 at 00:48
  • I have re-read your post and I see I was mistaken in reading the second bit, I thought they were two examples of different representations that you needed to handle (when of course, the second is the stored representation.) – M2tM Sep 27 '10 at 00:51
  • Please read my edit on this post, dealing with escape sequences (if this is important) could be a potential difficulty of the approach you've chosen. I strongly recommend writing a simple parser if you're planning on allowing people other than yourself to interact with this system. I've dealt with a lot of import scripts and user data in my time working on websites and cms stuff, I guarantee you the first thing someone's going to do is try and store a " right in the middle of some place you don't account for it. Also awesome is using an existing CSV parser. – M2tM Sep 27 '10 at 00:53
  • Fair enough. You earned your +1. I'm not one that believes regexes are a one-size-fits-all solution anyway, but I thought this was simple enough that a regex could handle it. Isn't too much work to loop over the string and do some counting anyway. – mpen Sep 27 '10 at 01:30

Why not use the Split method of the string class?

string[] s = someString.Split(',');
// s[0] contains the portion to the left of the first comma
// s[1] contains the portion to the right of it, up to the next comma (if any)
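
For reference, here is a small sketch of what a plain split does with the question's sample input. `PlainSplitDemo` is a made-up name used only for this illustration.

```csharp
using System;

// Hypothetical demo class for illustration only.
public static class PlainSplitDemo
{
    public static string[] Run()
    {
        string someString = "one, \"two, three\"";

        // String.Split has no notion of quoting, so every comma is a separator,
        // including the one inside the quoted part.
        return someString.Split(',');
    }
}
```

`PlainSplitDemo.Run()` returns three pieces (`one`, ` "two` and ` three"`) rather than the two the question asks for, since the quoted comma is treated like any other.
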
t0mm13b

Never mind... I think I got it:

// Matches a comma only when an even number of quotes precedes it
var unquotedComma = new Regex("(?<=^[^\"]*(?:\"[^\"]*\"[^\"]*)*),");
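
Here is a hedged sketch of the "even number of quotes to the left" idea in use with `Regex.Split`. The lookbehind is anchored at the start of the string and allows any number of complete quote pairs before the comma. It relies on .NET's support for variable-length lookbehind, which many other regex engines lack, and it assumes quotes are never escaped. `RegexSplitDemo` is a made-up name.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class for illustration only.
public static class RegexSplitDemo
{
    // Lookbehind: between the start of the string and the comma there may be
    // only complete pairs of quotes, i.e. an even number of '"' characters.
    // Assumption: no escaped quotes in the input.
    private static readonly Regex UnquotedComma =
        new Regex("(?<=^[^\"]*(?:\"[^\"]*\"[^\"]*)*),");

    public static string[] Split(string input)
    {
        // The pattern has no capturing groups, so Split returns only the fields.
        return UnquotedComma.Split(input);
    }
}
```

For the sample input, `RegexSplitDemo.Split("one, \"two, three\"")` gives `one` and ` "two, three"`. The second comma has a single quote to its left, so the lookbehind fails and it is not treated as a separator.
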
mpen