57

Since String.split() works with regular expressions, this snippet:

String s = "str?str?argh";
s.split("r?");

... yields: [, s, t, , ?, s, t, , ?, a, , g, h]

What's the most elegant way to split this String on the r? sequence so that it produces [st, st, argh]?

EDIT: I know that I can escape the problematic ?. The trouble is that I don't know the delimiter in advance, and I don't feel like working around this by writing an escapeGenericRegex() function.

Konrad Garus
  • This is mentioned in the accepted answer of [How to split a string in Java - Stack Overflow](https://stackoverflow.com/q/3481828). Related questions: How to split string by [(space)](https://stackoverflow.com/q/7899525)/[(backslash)](https://stackoverflow.com/q/23751618)/[(newline)](https://stackoverflow.com/q/454908)/[(pipe)](https://stackoverflow.com/q/10796160)? ; [How to escape text for regular expression in Java](https://stackoverflow.com/questions/60160/how-to-escape-text-for-regular-expression-in-java) – user202729 Feb 02 '21 at 02:35

8 Answers

88

A general solution using just Java SE APIs is:

String separator = ...
s.split(Pattern.quote(separator));

The quote method returns a regex that will match the argument string as a literal.
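As a minimal sketch of the approach above, applied to the string from the question: the class and method names (LiteralSplitDemo, splitLiteral) are made up for illustration, and only Pattern.quote and String.split come from the Java SE API.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class LiteralSplitDemo {
    // Hypothetical helper: splits on the separator taken literally,
    // whatever regex metacharacters it happens to contain.
    static String[] splitLiteral(String input, String separator) {
        return input.split(Pattern.quote(separator));
    }

    public static void main(String[] args) {
        String s = "str?str?argh";
        // "r?" is treated as the two literal characters 'r' and '?'.
        System.out.println(Arrays.toString(splitLiteral(s, "r?"))); // [st, st, argh]
    }
}
```

Because the separator is quoted at the call site, it can come from user input or configuration without any manual escaping.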

Stephen C
11

You can use

StringUtils.split(s, "r?")

from commons-lang.

Tomasz Nurkiewicz
BastiS
  • 3
    StringUtils.split() should be much faster than String.split() since StringUtils.split is using linear scanning for the separator, whereas String.split() is using regex, which is really slow – Michael P Feb 15 '17 at 19:11
  • 3
    Something to be aware of - according to the JavaDoc this treats adjacent separators as one separator. In my situation this was not desired – Tarmo Jun 29 '18 at 10:56
  • be aware that this accepts a list of _characters_ to split on, not a string. so this would split the string on instances of `?` or `r`, not instances of `r?` – Starwarswii May 13 '21 at 20:14
5

Escape the ?:

s.split("r\\?");
Etienne de Martel
  • 30,360
  • 7
  • 86
  • 102
5

This works perfectly as well:

public static List<String> splitNonRegex(String input, String delim)
{
    List<String> l = new ArrayList<String>();
    int offset = 0;

    while (true)
    {
        int index = input.indexOf(delim, offset);
        if (index == -1)
        {
            l.add(input.substring(offset));
            return l;
        } else
        {
            l.add(input.substring(offset, index));
            offset = (index + delim.length());
        }
    }
}
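A usage sketch for the method above may help show where it differs from String.split(). The class name SplitNonRegexDemo is invented here, the loop is a compact restatement of the same indexOf scan, and the empty-delimiter guard is an addition that is not in the original answer (without it, an empty delimiter never advances the offset and the loop does not terminate).

```java
import java.util.ArrayList;
import java.util.List;

public class SplitNonRegexDemo {
    // Same indexOf-based scan as the answer above.
    static List<String> splitNonRegex(String input, String delim) {
        // Added guard (assumption, not in the original): indexOf("", offset)
        // always matches at offset, so the loop would never make progress.
        if (delim.isEmpty()) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        List<String> parts = new ArrayList<>();
        int offset = 0;
        int index;
        while ((index = input.indexOf(delim, offset)) != -1) {
            parts.add(input.substring(offset, index));
            offset = index + delim.length();
        }
        parts.add(input.substring(offset)); // remainder after the last delimiter
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(splitNonRegex("str?str?argh", "r?")); // [st, st, argh]
        // Trailing empty strings are kept here...
        System.out.println(splitNonRegex("a,b,,", ",").size());  // 4
        // ...whereas String.split(regex) drops them.
        System.out.println("a,b,,".split(",").length);           // 2
    }
}
```

Note the behavioural difference: this method keeps trailing empty strings, while String.split(regex) with the default limit discards them.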
Martijn Courteaux
  • The performance of this solution is not ideal since it creates temporary substrings. – BladeCoder May 20 '14 at 09:30
  • 1
    @BladeCoder: You're right. I fixed it :) (When I wrote this, I must have been 16, I guess) – Martijn Courteaux May 20 '14 at 10:44
  • Much better indeed :) – BladeCoder May 20 '14 at 21:25
  • I have an app (and tests) where I split frequently, and I do not need a single split on a regular expression. And Android Studio keeps kvetching that my regular expressions (which I do not need) are not efficiently pre-compiled patterns. I will use this, and not use it in the production code inside a loop. Thanks! – Phlip Feb 13 '21 at 00:22
3

Use Guava Splitter:

Extracts non-overlapping substrings from an input string, typically by recognizing appearances of a separator sequence. This separator can be specified as a single character, fixed string, regular expression or CharMatcher instance. Or, instead of using a separator at all, a splitter can extract adjacent substrings of a given fixed length.

mindas
3
String[] strs = str.split(Pattern.quote("r?"));
贼小气
1

Using the Pattern class directly, it is possible to compile the expression with the LITERAL flag. In that case, the expression is matched as is (not as a regular expression).

Pattern.compile(<literalExpression>, Pattern.LITERAL).split(<stringToBeSplitted>);

Example:

String[] result = Pattern.compile("r?", Pattern.LITERAL).split("str?str?argh");

This results in:

[st, st, argh]
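If the same delimiter is used for many splits, the literal Pattern can be compiled once and reused. The following is only a sketch: the class name, the DELIM constant and both method names are hypothetical, and the second method is included just to show that the Pattern.quote route gives the same result for this input.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class LiteralPatternDemo {
    // Hypothetical fixed delimiter, compiled once with the LITERAL flag
    // so that repeated splits reuse the same Pattern instance.
    private static final Pattern DELIM = Pattern.compile("r?", Pattern.LITERAL);

    static String[] splitWithLiteralFlag(String input) {
        return DELIM.split(input);
    }

    // Equivalent result via Pattern.quote, shown only for comparison.
    static String[] splitWithQuote(String input) {
        return input.split(Pattern.quote("r?"));
    }

    public static void main(String[] args) {
        String s = "str?str?argh";
        System.out.println(Arrays.toString(splitWithLiteralFlag(s))); // [st, st, argh]
        System.out.println(Arrays.toString(splitWithQuote(s)));       // [st, st, argh]
    }
}
```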
Stephen C
Manuel Romeiro
  • 3
    Your answer would be best if you explained your code. It will also be more useful to new users who search something similar in the future. – Nic3500 Jul 30 '18 at 11:46
  • I think that `Pattern.quote(...)` is a better solution. Certainly it is fewer characters :-) – Stephen C Sep 13 '18 at 00:03
  • There should be no difference in performance. They will do the same thing under the hood. – Stephen C Sep 17 '18 at 00:00
  • I have to agree with you. In theory, LITERAL should be more performant than evaluating the regular expression, but I ran a small test with Java 8: for some inputs LITERAL was faster than QUOTE, and for others it was the reverse. Conclusion: for now there is no relevant difference in performance. – Manuel Romeiro Sep 18 '18 at 01:25
-4

Try:

String s = "str?str?argh";
s.split("r\\?");
Martijn Courteaux
Akash Agrawal