Possible Duplicate:
Split a string by spaces — preserving quoted substrings — in Python

Given the following string:

term1 term2 "the second term has spaces" term3 bad term 4

What regex will give me this list:

["term1", "term2", "the second term has spaces", "term3", "bad", "term", "4"]
Trindaz
  • So, instead of quotation marks, this guy wanted to know about brackets. However, regexes aren't really the tool here, due to possible recursion. See [this SO question](http://stackoverflow.com/questions/546433/regular-expression-to-match-outer-brackets), and [this SO question](http://stackoverflow.com/questions/524548/regular-expression-to-detect-semi-colon-terminated-c-for-while-loops/524624#524624) for some more information. However, if you're not concerned with the intricacies, and just want the scenario described, these might be a bit of overkill. – Patrick Perini Jul 20 '11 at 00:38
  • If you don't need a regex for this, I'd suggest just breaking it into pieces based on the quotation marks, then breaking the remaining part of the string up by space delimiters, then just putting quotes around everything. Seems like that would be an easier way than using a regex anyway. – Nightfirecat Jul 20 '11 at 00:38
  • Seems this one has already been answered elsewhere: http://stackoverflow.com/questions/79968/split-a-string-by-spaces-preserving-quoted-substrings-in-python – Trindaz Jul 20 '11 at 00:40
  • @Nightfirecat That's not that simple of a method: `'"test test" test test "test test"'.split('"')` and `'test test "test test" test test'.split('"')` will both leave 3 unquoted "test test"s with some extra spaces and (in the first case) empty strings, so breaking up only the ones that were unquoted to begin with is not that simple. – agf Jul 20 '11 at 03:08

1 Answer

For your simple example, this works fine:

import re
quotestring = 'term1 term2 "the second term has spaces" term3 bad term 4'
# uses a lookahead and lookbehind to check for quoted strings
stringlist = re.findall(r'((?<=\").+(?=\")|\w+)', quotestring)
print(stringlist) # works on Python 2 or 3
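
Note that the `.+` in that pattern is greedy, so with more than one quoted substring it would match everything from the first quote to the last. A minimal sketch of an alternative, assuming the quotes are double quotes, balanced, and never escaped (`split_quoted` and `TOKEN_RE` are names made up here for illustration):

```python
import re

# Either a double-quoted run (captured without its quotes)
# or a run of non-whitespace characters.
TOKEN_RE = re.compile(r'"([^"]*)"|(\S+)')

def split_quoted(text):
    """Hypothetical helper: split on whitespace, keeping "quoted substrings" whole."""
    # Each match fills exactly one of the two groups; take whichever matched.
    return [m.group(1) if m.group(1) is not None else m.group(2)
            for m in TOKEN_RE.finditer(text)]

quotestring = 'term1 term2 "the second term has spaces" term3 bad term 4'
print(split_quoted(quotestring))
```

Because the quoted alternative is tried first and `[^"]*` cannot cross a quote, each quoted substring is matched on its own rather than merged with the next one.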

Or, from the linked post:

import shlex
quotestring = 'term1 term2 "the second term has spaces" term3 bad term 4'
stringlist = shlex.split(quotestring)
print(stringlist)
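
One thing to keep in mind with `shlex.split` is that it follows shell-like rules: single quotes and backslashes are also treated as syntax, and unbalanced quotes raise `ValueError`. A small sketch of one way to cope with that, assuming a plain whitespace split is an acceptable fallback (`safe_split` is a hypothetical name, not part of `shlex`):

```python
import shlex

def safe_split(text):
    """Hypothetical helper: shlex.split with a plain whitespace fallback."""
    try:
        return shlex.split(text)
    except ValueError:
        # Raised for unbalanced quotes, e.g. a missing closing quote.
        return text.split()

print(safe_split('term1 "the second term has spaces" term3'))
print(safe_split('term1 "unterminated term2'))
```

Whether a silent fallback is appropriate depends on the input; for user-typed search terms it may be fine, while for anything stricter it is probably better to let the exception propagate.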
agf