
Someone just asked a question about String.split() (String split comma and parenthisis-JAVA), and the solution was to use StringTokenizer. Why doesn't String.split() split on parentheses?

import java.util.StringTokenizer;

public class SplitExample {
    public static void main(String[] args) {
        String a = "(id,created,employee(id,firstname," +
                "employeeType(id), lastname),location)";

        // Tokenize with StringTokenizer using the delimiters "(), "
        StringTokenizer tok = new StringTokenizer(a, "(), ");
        System.out.println("StringTokenizer example");
        while (tok.hasMoreTokens()) {
            String b = tok.nextToken();
            System.out.println(b);
        }

        // Try the same thing with String.split
        System.out.println("Split example");
        String[] array = a.split("(),");
        for (String ii : array) {
            System.out.println(ii);
        }
    }
}

Outputs:

StringTokenizer example
id
created
employee
id
firstname
employeeType
id
lastname
location
Split example
(id
created
employee(id
firstname
employeeType(id)
 lastname)
location)

There was a discussion of String.split() vs. StringTokenizer at Scanner vs. StringTokenizer vs. String.Split, but it doesn't explain the parentheses. Is this by design? What's going on here?

edited by Community, asked by older coder
  • `String.split` takes a *regular expression*. – Elliott Frisch Mar 30 '17 at 16:41
  • "()," IS a regular expression. Or do the parens need to be escaped? – older coder Mar 30 '17 at 16:42
  • Yes, but it doesn't match parentheses. It's a grouping operator with nothing inside it. Even if you escape the parens, it will then only match the exact character sequence `"(),"`. Try the regex `[(), ]` instead. That will match the characters `(`, `)`, `,`, and space. – Ted Hopp Mar 30 '17 at 16:43
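To make the point about escaping concrete, here is a minimal sketch (the class name EscapedParens and the sample input "a(),b(),c" are made up for illustration). Escaping the parentheses gives the regex \(\), which matches only the literal three-character sequence "(),". The question's string never contains that sequence, so it comes back as a single, unsplit element.

```java
import java.util.Arrays;

public class EscapedParens {
    // The input string from the question
    static final String INPUT = "(id,created,employee(id,firstname,"
            + "employeeType(id), lastname),location)";

    // The Java literal "\\(\\)," is the regex \(\), which matches only the text "(),"
    static String[] splitEscaped(String s) {
        return s.split("\\(\\),");
    }

    public static void main(String[] args) {
        // INPUT has no literal "()," anywhere, so the result has one element
        System.out.println(Arrays.toString(splitEscaped(INPUT)));
        // A hypothetical input that does contain the literal sequence
        System.out.println(Arrays.toString(splitEscaped("a(),b(),c")));
    }
}
```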

2 Answers


If you want split to split on the characters '(', ')', ',', and ' ', you need to pass a regex that matches any of those. The easiest is to use a character class:

String[] array = a.split("[(), ]");

Normally, parentheses in a regex are a grouping operator and would have to be escaped if you intended them to be used as literals. However, inside the character class delimiters, the parenthesis characters do not have to be escaped.
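One caveat, sketched below under the assumption that you want the same tokens StringTokenizer produced: splitting on [(), ] yields an empty string wherever two delimiters are adjacent (for example the "), " before "lastname"), and a leading empty string because the input starts with "(". Adding the + quantifier collapses runs of delimiters, but the leading empty element remains, so this sketch filters it out. The class and method names are hypothetical.

```java
import java.util.Arrays;

public class CharClassSplit {
    // The input string from the question
    static final String INPUT = "(id,created,employee(id,firstname,"
            + "employeeType(id), lastname),location)";

    // Each delimiter character is its own match, so adjacent delimiters produce "" entries
    static String[] splitEach(String s) {
        return s.split("[(), ]");
    }

    // "+" treats a run of delimiters as one separator; the leading "("
    // still produces an empty first element, which the filter removes
    static String[] splitTokens(String s) {
        return Arrays.stream(s.split("[(), ]+"))
                .filter(t -> !t.isEmpty())
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitEach(INPUT)));
        System.out.println(Arrays.toString(splitTokens(INPUT)));
    }
}
```

Trailing empty strings are already dropped by split(String), which is why only the leading one needs filtering.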

answered by Ted Hopp

StringTokenizer does not support regular expressions. The delimiter string "(), " passed to StringTokenizer is treated as a set of individual characters, so the tokenizer splits the input whenever it encounters any one of (, ), a comma, or a space.
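A minimal sketch of that behavior (the class TokenizerDelims and its helper tokens are hypothetical names): reordering the delimiter characters gives the same tokens, and a string that would be a regex character class, "[b]", is simply read as the three separate delimiters '[', 'b', and ']'.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class TokenizerDelims {
    // Collects every token StringTokenizer returns for the given delimiter characters
    static List<String> tokens(String s, String delims) {
        List<String> out = new ArrayList<>();
        StringTokenizer tok = new StringTokenizer(s, delims);
        while (tok.hasMoreTokens()) {
            out.add(tok.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        String a = "(id,created,employee(id,firstname,"
                + "employeeType(id), lastname),location)";
        // The order of the delimiter characters does not matter
        System.out.println(tokens(a, "(), "));
        System.out.println(tokens(a, " ,)("));
        // Not a regex: '[', 'b' and ']' are each a delimiter, leaving "a" and "c"
        System.out.println(tokens("a[b]c", "[b]"));
    }
}
```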

String.split takes a regular expression, and in a regex, parentheses group subexpressions. Since there is nothing inside the parentheses, the group matches the empty string, so the pattern effectively matches only the comma.
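This can be checked directly. In this minimal sketch (class and method names are made up), splitting on "()," is compared with splitting on a plain ",", and Matcher.group(1) shows that the empty group does take part in each match but captures only the empty string.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmptyGroup {
    // The input string from the question
    static final String INPUT = "(id,created,employee(id,firstname,"
            + "employeeType(id), lastname),location)";

    // "()," is an empty capturing group followed by a comma,
    // so it should split at exactly the same places as ","
    static boolean splitsLikeComma(String s) {
        return Arrays.equals(s.split("(),"), s.split(","));
    }

    // Returns what group 1 captured at the first match, or null if there is no match
    static String firstGroupCapture(String s) {
        Matcher m = Pattern.compile("(),").matcher(s);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(splitsLikeComma(INPUT));
        System.out.println("[" + firstGroupCapture(INPUT) + "]");
    }
}
```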

answered by codemonkey