63

How do I split a String based on space but take quoted substrings as one word?

Example:

Location "Welcome  to india" Bangalore Channai "IT city"  Mysore

It should be stored in an ArrayList as:

Location
Welcome to india
Bangalore
Channai
IT city
Mysore
Kelly
user1000535
  • IMO even though the dupe-linked question is older, this answer by aioobe is the better of the two as it has the superior regex recipe for extracting matches. – kayleeFrye_onDeck Sep 14 '18 at 03:26

1 Answer

139

Here's how:

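import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String str = "Location \"Welcome  to india\" Bangalore " +
             "Channai \"IT city\"  Mysore";

List<String> list = new ArrayList<String>();
// Group 1 captures either an unquoted word or a whole "quoted phrase".
Matcher m = Pattern.compile("([^\"]\\S*|\".+?\")\\s*").matcher(str);
while (m.find())
    list.add(m.group(1)); // Add .replace("\"", "") to remove surrounding quotes.

System.out.println(list);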

Output:

[Location, "Welcome  to india", Bangalore, Channai, "IT city", Mysore]

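The regular expression simply says:

  • [^"]     - a token starting with something other than "
  • \S*       - followed by zero or more non-whitespace characters
  • ...or...
  • ".+?"   - a " followed by one or more characters (matched lazily), up to the next "
  • \s*       - then any trailing whitespace, which is left out of the captured group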
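
To make the pieces above concrete, here is a minimal sketch that wraps the same pattern in a helper and strips the quotes from each token. The class name `QuotedSplit` and the method name `splitQuoted` are illustrative assumptions, not part of the original answer; the `trim()` call is likewise an addition, meant to keep leading whitespace out of the first token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedSplit {
    // Same pattern as in the answer: an unquoted word or a "quoted phrase",
    // followed by optional whitespace that stays outside group 1.
    private static final Pattern TOKEN =
            Pattern.compile("([^\"]\\S*|\".+?\")\\s*");

    // Hypothetical helper: returns the tokens with all double quotes removed.
    static List<String> splitQuoted(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input.trim());
        while (m.find()) {
            tokens.add(m.group(1).replace("\"", ""));
        }
        return tokens;
    }

    public static void main(String[] args) {
        String str = "Location \"Welcome  to india\" Bangalore "
                   + "Channai \"IT city\"  Mysore";
        System.out.println(splitQuoted(str));
    }
}
```

With the example input this should yield six tokens with the quotes removed; whitespace inside a quoted phrase is kept exactly as typed. Note that `replace` drops every `"` in a token, not only the surrounding pair.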
Mohamed Taher Alrefaie
aioobe
  • Location, "Welcome to india", Bangalore, Channai, "IT city", Mysore is a single String that I'm entering in a JSP form; after submitting, it should be split as I described in the question. – user1000535 Oct 18 '11 at 11:21
  • 1
    The double quotes should not be there. – user1000535 Oct 18 '11 at 11:44
  • 4
    Ah, change `m.group(1)` to `m.group(1).replace("\"", "")`. – aioobe Oct 18 '11 at 11:48
  • `[^\"]` should be replaced with `[^\"\\s]` if you don't want spaces at the beginning of the string to become part of the first element. – Tema Nov 19 '14 at 21:02
  • 2
    @Tema, that concern is orthogonal to the `"..."` grouping (and arguably application-specific). I would strongly advise using `String.trim` when processing the string instead of making an already complex regexp even more complicated. – aioobe Nov 20 '14 at 10:43
  • If you reverse the alternation, you don't need to make the check for "not starting with a quote" because the regex engine will prefer the thing on the left of the alternation first. [Here's an example](https://regex101.com/r/dA9hJ0/1) – 4castle Apr 23 '16 at 22:09
  • Any reason for ".+?" and not just ".*" ? – Andreas Lundgren Nov 22 '16 at 10:21
  • 1
    Sadly, it fails on inner quotes: 'exec -s something -e "lala \"lo lo\" lulu" -f $file'. It also won't work with single quotes. How do command-line interpreters like bash parse such strings? – KIC Jan 22 '18 at 19:25
  • It fails for me when the string contains a colon (:), for example: "age:30 name:\"john doe\"". Output: [age:30] [name:"john] [doe"] – windupurnomo May 04 '18 at 06:26
  • @windupurnomo, that has nothing to do with the `:` though. – aioobe May 04 '18 at 07:37
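
Two of the comments above describe inputs the regex does not handle: backslash-escaped quotes inside a quoted phrase, and a quote that opens in the middle of a token (as in `name:"john doe"`). A different technique, a small character-by-character scanner instead of a regex, can cover both. The sketch below is only an illustration under stated assumptions: the class `QuoteAwareTokenizer` and method `tokenize` are hypothetical names, only double quotes are recognised, a backslash makes the next character literal, and an unterminated quote simply runs to the end of the input. It is not a reproduction of how bash or any other shell parses its input.

```java
import java.util.ArrayList;
import java.util.List;

class QuoteAwareTokenizer {
    // Hypothetical helper: splits on whitespace that is outside double
    // quotes; quotes are dropped and \x yields the literal character x.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        boolean tokenStarted = false; // lets "" produce an empty token

        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '\\' && i + 1 < input.length()) {
                // Escape: take the next character literally.
                current.append(input.charAt(++i));
                tokenStarted = true;
            } else if (c == '"') {
                // Toggle quoted mode; the quote itself is not kept.
                inQuotes = !inQuotes;
                tokenStarted = true;
            } else if (Character.isWhitespace(c) && !inQuotes) {
                // Unquoted whitespace ends the current token, if any.
                if (tokenStarted) {
                    tokens.add(current.toString());
                    current.setLength(0);
                    tokenStarted = false;
                }
            } else {
                current.append(c);
                tokenStarted = true;
            }
        }
        if (tokenStarted) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("age:30 name:\"john doe\""));
        System.out.println(tokenize(
                "exec -s something -e \"lala \\\"lo lo\\\" lulu\" -f $file"));
    }
}
```

For the first input this gives the two tokens `age:30` and `name:john doe`, which is consistent with aioobe's remark that the colon is not the cause: the regex only treats a quote as special when it starts a token. Single-quote support would need an additional state and is left out here.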