-2

I have a large string that should be split at a certain character, but only where that character is not preceded by another specific character.

What is the most efficient way to do this?

An example: Split this string at ':', but not at "?:":

part1:part2:https?:example.com:anotherstring

What I have tried so far:

  1. Regex `(?<!\?):`. Very slow.

  2. First collecting the indices at which to split the string, then splitting it. Only efficient if there are not many split characters in the string.

  3. Iterating over the string character by character. Efficient if there are not many protect characters (e.g. '?').
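For reference, the character-by-character idea from item 3 could look roughly like the sketch below. This is not the code that was actually measured; the class name `CharScanSplit` and the method signature are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, named for illustration only.
final class CharScanSplit {

    // Splits input at every 'separator' that is not directly preceded by 'protect'.
    static List<String> split(String input, char separator, char protect) {
        List<String> parts = new ArrayList<>();
        int start = 0;          // start index of the current segment
        char previous = '\0';   // character seen in the previous iteration
        for (int i = 0; i < input.length(); i++) {
            char current = input.charAt(i);
            // A separator at index 0 has no predecessor, so it always splits.
            if (current == separator && (i == 0 || previous != protect)) {
                parts.add(input.substring(start, i));
                start = i + 1;
            }
            previous = current;
        }
        parts.add(input.substring(start)); // remainder after the last split point
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("part1:part2:https?:example.com:anotherstring", ':', '?'));
        // prints [part1, part2, https?:example.com, anotherstring]
    }
}
```

Unlike `String.split`, this sketch keeps trailing empty segments, so the two approaches are only directly comparable on inputs that do not end with the split character.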

dankito
  • What about the `split()` method? Please post what you have tried so far. Also add examples of your strings with an explanation of how they should be split. – Boken May 19 '20 at 11:01
  • I deliberately didn't post what I have tried so far, so as not to influence potential answers. But if you want to know: 1. Regex - very slow. 2. First find the indices where to split the string, then split it. Efficient in some cases but not for strings with many split characters. 3. Iterating over the string character by character. Efficient with many split characters but not if they are preceded by the protect character. An example would be: split this at ':' but not at '?:': 1:2:https?:example.com:foo:bar – dankito May 19 '20 at 11:06
  • Does this answer your question? [Java split String performances](https://stackoverflow.com/questions/11001330/java-split-string-performances) – Boken May 19 '20 at 11:09
  • No as it doesn't take the protect character into account. – dankito May 19 '20 at 11:11
  • Please edit your question and add some of the pass and fail tests which you want to achieve along with your efforts. It will help us to help you –  May 19 '20 at 11:12
  • What does “large string” mean, and what does “very slow” mean? A quick example with a 30 million character string, split into 3 million substrings using the simple regex method, took half a second on my machine. Is that string “large”? Is the needed time “slow”? Even more important, since all three approaches produce an entirely different kind of result, what do you actually need, or what are you going to do with the result? There is no sense in doing something apparently faster when it produces a result that needs a much longer conversion afterwards. – Holger May 19 '20 at 15:16

2 Answers

0

I fear you will have to go through the string and check whether each ":" is preceded by a "?":

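int lastIndex = 0; // start of the current segment
// Continue the search after the current match; searching from lastIndex again
// would find the same protected "?:" forever.
for (int index = string.indexOf(':'); index >= 0; index = string.indexOf(':', index + 1)) {
    // split only if the ':' is not preceded by '?'
    if (index == 0 || string.charAt(index - 1) != '?') {
        String splitString = string.substring(lastIndex, index);
        // add splitString to list or array
        lastIndex = index + 1;
    }
}
// add string.substring(lastIndex) to list or array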
0

You will have to test this very carefully (since I didn't do that), but using a regular expression in the split() might produce the results you want:

public static void main(String[] args) {
    String s = "Start.Teststring.Teststring1?.Teststring2.?Teststring3.?.End";
    String[] result = s.split("(?<!\\?)\\.(?!\\.)");
    System.out.println(String.join("|", result));
}

Output:

Start|Teststring|Teststring1?.Teststring2|?Teststring3|?.End

Note:
This only covers splitting at a dot that is not preceded by a question mark. The pattern above additionally refuses to split at a dot that is directly followed by another dot.
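Adapted to the ':' example from the question, the same lookbehind idea might look like the sketch below. It compiles the pattern once so it can be reused across calls; the class name `RegexColonSplit` and the constant name are assumptions for illustration, and no claim is made here about how it performs relative to the other approaches.

```java
import java.util.regex.Pattern;

// Hypothetical helper, named for illustration only.
final class RegexColonSplit {

    // Compiled once and reused; matches ':' only when it is not preceded by '?'.
    private static final Pattern UNPROTECTED_COLON = Pattern.compile("(?<!\\?):");

    static String[] split(String input) {
        return UNPROTECTED_COLON.split(input);
    }

    public static void main(String[] args) {
        String[] result = split("part1:part2:https?:example.com:anotherstring");
        System.out.println(String.join("|", result));
        // prints part1|part2|https?:example.com|anotherstring
    }
}
```

Note that `Pattern.split(CharSequence)`, like `String.split(String)`, drops trailing empty strings from the result.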

I don't think you will get a much more performant solution than the regex...

deHaar
  • I fear not. As said in the comment and in my edit, regex was by magnitudes the slowest of all the approaches I tried. – dankito May 19 '20 at 11:23
  • @dankito Can you post your measurements? How have you measured the performance and what are the results? – deHaar May 19 '20 at 11:25