1

Let's say I have the following string and I want the output below, without requiring the csv library.

this, "what I need", to, do, "i, want, this", to, work

this
what I need
to
do
i, want, this
to
work
dnwilson

1 Answer

4

This problem is a classic case of the technique explained in the question about how to "regex-match a pattern, excluding..."

We can solve it with a beautifully-simple regex:

"([^"]+)"|[^, ]+

The left side of the alternation | matches complete "quoted strings" and captures their contents to Group 1. The right side matches runs of characters that are neither commas nor spaces. We know these are the tokens we want because anything inside quotes has already been consumed by the expression on the left.
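To make the role of Group 1 concrete, here is a small sketch (the variable name `pairs` and the shortened input are only for illustration) that records, for each match, the whole match and whatever Group 1 captured. Group 1 should be nil whenever the right side of the alternation matched.

```ruby
subject = 'this, "what I need", to'
regex = /"([^"]+)"|[^, ]+/

# For each match, store the whole match ($&) next to the Group 1 capture ($1).
# $1 is nil when the unquoted (right-hand) branch matched.
pairs = []
subject.scan(regex) { pairs << [$&, $1] }

pairs.each { |whole, group1| puts "#{whole.inspect} -> #{group1.inspect}" }
```

For the quoted token, the whole match still includes the surrounding quotes, which is why the final program picks Group 1 over the whole match whenever Group 1 is set.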

Option 2: Allowing Multiple Words

In your input, all unquoted tokens are single words, but if you also want the regex to work for input such as 'my cat scratches, "what I need", your dog barks', use this:

"([^"]+)"|[^, ]+(?:[ ]*[^, ]+)*

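The only difference is the addition of (?:[ ]*[^, ]+)* which matches optional spaces followed by more non-comma, non-space characters, zero or more times. This lets an unquoted token span several words while still stopping at the next comma.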
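As a quick check of Option 2, this sketch runs the multi-word regex against the example input from above (the name `tokens` is just illustrative). It uses `$1 || $&` to take Group 1 when the quoted branch matched and the whole match otherwise.

```ruby
subject = 'my cat scratches, "what I need", your dog barks'
regex = /"([^"]+)"|[^, ]+(?:[ ]*[^, ]+)*/

# Keep Group 1 for quoted tokens, otherwise the whole (possibly multi-word) match.
tokens = []
subject.scan(regex) { tokens << ($1 || $&) }

tokens.each { |t| puts t }
```

This should print the three tokens "my cat scratches", "what I need" and "your dog barks", one per line.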

This Ruby program shows how to use the regex:

subject = 'this, "what I need", to, do, "i, want, this", to, work'
regex = /"([^"]+)"|[^, ]+/
# collect Group 1 for quoted tokens, or the whole match ($&) otherwise
mymatches = []
subject.scan(regex) do
  $1.nil? ? mymatches << $& : mymatches << $1
end
mymatches.each { |x| puts x }

Output

this
what I need
to
do
i, want, this
to
work

zx81
  • Really awesome trick. I noticed that the current regex doesn't handle two or more words between commas, for example string = 'hey you, "this", is, some, great stuff', so I modified the regex to /[a-zA-Z\s*]+|"[^"]+"|[^, ]+/ – dnwilson Jun 28 '14 at 02:59
  • That was a deliberate feature based on your input, not a bug, but added a regex for you (option 2) :) – zx81 Jun 28 '14 at 03:07
  • Also FYI, tiny code improvement `$1.nil? ? mymatches << $& : mymatches << $1` – zx81 Jun 28 '14 at 04:22