Splitting a string without using String.split() - and returning delimiters

Question

String s = "ab#cd#ef#gh#";
String regex = "#";

char [] sChar = s.toCharArray();
char [] regexChar = regex.toCharArray();

int count = 1;
for (int i = 0; i < regexChar.length; i++){
    for (int j = 0; j < sChar.length; j++){
        if (regexChar [i] == sChar[j]){
            count += 2;
        }
    }
}

String [] splitS = new String [count];
String temp;
for (int k = 0; k < count; k++){
    temp = "";
    for (int i = 0; i < regexChar.length; i++){
        for (int j = 0; j < sChar.length; j++){
            if (regexChar[i] == sChar[j]){
                temp = Character.toString(regexChar[i]);
            }
            else {
                temp = temp + Character.toString(sChar[j]);
            }
        }
    }
    splitS[k] = temp;
}

This is my code so far, but it gives me splitS[#,#,#,#,#,#,#,#]. It should be splitS[ab,#,cd,#,ef,#,gh,#]. Can anyone tell me why it's doing this?

@AleksG the StringTokenizer is obsolete; it is in the Java language only for backwards compatibility. And what if there is more than one delimiter? I've tried it and got it to work with the StringTokenizer, however if the input were a mix of # and ? and I want it split by both # or ?, I won't know which one it was split by — MafiaBlood, Oct 01 '14 at 20:34
The regex can be a list of strings, it can be the letter a for example — MafiaBlood, Oct 01 '14 at 20:35
It should just split it by whatever the delimiter is, however it should return the delimiter as well — MafiaBlood, Oct 01 '14 at 20:36
@MafiaBlood Can you state your source? There's nothing in the official [java 8 documentation](http://docs.oracle.com/javase/8/docs/api/java/util/StringTokenizer.html) indicating that it's obsolete. Also, have a look at [this question](http://stackoverflow.com/questions/5965767/performance-of-stringtokenizer-class-vs-split-method-in-java) — Aleks G, Oct 01 '14 at 20:36
@AleksG I was reading on here and that's what everyone else said — MafiaBlood, Oct 01 '14 at 20:38
@AleksG JavaDoc: "StringTokenizer is a legacy class that is retained for compatibility reasons although its use is discouraged in new code." Interestingly, it's not marked as deprecated. — qqilihq, Oct 01 '14 at 20:40
@MafiaBlood When using StringTokenizer, you can tell it to return the delimiters as well as the separated substrings - you don't need to know which delimiter was used in each place. — Aleks G, Oct 01 '14 at 20:42
I've tried doing this: if (regexChar[j] != sChar[i]){ temp = temp + Character.toString(sChar[i]); } rather than the else statement but it's still the same — MafiaBlood, Oct 01 '14 at 20:43
@MafiaBlood It's still unclear, how your separator would be treated. You call it `regex` in your code, however you apply no RegEx logic at all. Would that be a valid separator e.g., splitting at a, b OR c? `a|b|c`? — qqilihq, Oct 01 '14 at 20:43
Step through your code in a debugger, and watch the value of `temp`. Notice how as you loop through `sChar`, each time you find a `#` you reset `temp`? Notice how the last character in `sChar` is `#`, so `temp` is *always* `#` when the loop ends? — azurefrog, Oct 01 '14 at 20:44
@qqilihq I know I've called it regex even though it isn't a RegEx, because it was confusing me when I was writing the code. The regex is just what you want the String s to be split by — MafiaBlood, Oct 01 '14 at 20:45
@azurefrog So if I change the for-loop to: for (int i = 0; i < sChar.length; i++){ for (int j = 0; j < regexChar.length; j++){ — MafiaBlood, Oct 01 '14 at 20:47
@MafiaBlood for the sake of completeness, StringTokenizer code would be: `StringTokenizer stk = new StringTokenizer("ab#cd#ef?gh?", "#?", true); List<String> str = new LinkedList<>(); while(stk.hasMoreTokens()) str.add(stk.nextToken()); String[] splitS = str.toArray(new String[0]);` - looks much simpler than regex to me. — Aleks G, Oct 01 '14 at 20:48
Why don't you want to see String[] array = "ab#cd#ef#gh#".split("#") ? — SME_Dev, Oct 01 '14 at 20:52
@AleksG I feel stupid now, I had the same code typed up, but I didn't have the true, why is that? — MafiaBlood, Oct 01 '14 at 20:52
@SME_Dev because this method is called split and my teacher says I can't use it — MafiaBlood, Oct 01 '14 at 20:53
@AleksG I looked back at my previous code and I was adding only the stuff that wasn't split, but then adding the regex after which was stupid on my part — MafiaBlood, Oct 01 '14 at 21:01
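
For reference, the StringTokenizer approach from the comments above can be written out as a complete class. This is only a minimal sketch: the class name `TokenizerSplit` and the method `splitKeepingDelimiters` are made up for illustration. One thing to keep in mind is that StringTokenizer treats every character of its delimiter string as a separate one-character delimiter, so "#?" means "split at # or at ?" and a multi-character delimiter such as "##" cannot be expressed this way.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class TokenizerSplit {
    // Hypothetical helper (name chosen for this sketch only).
    // Each character in delimiterChars is treated as its own delimiter.
    static List<String> splitKeepingDelimiters(String s, String delimiterChars) {
        // The third argument (returnDelims = true) makes the tokenizer
        // hand back the delimiters as tokens too.
        StringTokenizer tokenizer = new StringTokenizer(s, delimiterChars, true);
        List<String> tokens = new ArrayList<>();
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(splitKeepingDelimiters("ab#cd#ef?gh?", "#?"));
        // prints [ab, #, cd, #, ef, ?, gh, ?]
    }
}
```

Because each delimiter comes back as its own token, the caller can see which character caused each split without tracking it separately.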

jedwards · Accepted Answer · 2014-10-01T20:58:06.657

I think the comments about using existing, standard classes should be seriously considered, but just for fun, what about:

import java.util.ArrayList;

class Splitter{
    static ArrayList<String> tokenize(String subject, String pattern)
    {
        ArrayList<String> tokens = new ArrayList<>();

        int tokenOff = 0;
        while(true)
        {
            int tokenPos = subject.indexOf(pattern, tokenOff);
            if(tokenPos == -1){ break; }
            String tok = subject.substring(tokenOff, tokenPos);

            addToken(tokens, tok);
            addToken(tokens, pattern);

            tokenOff = (tokenPos + pattern.length());
        }
        // Add any remaining characters
        addToken(tokens, subject.substring(tokenOff));

        return tokens;
    }

    static void addToken(ArrayList<String> list, String tok)
    {
        if(tok.length() > 0){ list.add(tok); }
    }

    public static void main(String args[])
    {
        String subject, pattern;
        ArrayList<String> tokens;

        subject = "ab#cd#ef#gh#"; 
        pattern = "#";
        tokens = tokenize(subject, pattern);
        System.out.println(tokens); // [ab, #, cd, #, ef, #, gh, #]

        subject = "ab##cd##ef##gh##"; 
        pattern = "##";
        tokens = tokenize(subject, pattern);
        System.out.println(tokens); // [ab, ##, cd, ##, ef, ##, gh, ##]

        subject = "ab##cd##ef##gh##ij"; 
        pattern = "##";
        tokens = tokenize(subject, pattern);
        System.out.println(tokens); // [ab, ##, cd, ##, ef, ##, gh, ##, ij]

        subject = "ab##cd#ef#gh##ij"; 
        pattern = "##";
        tokens = tokenize(subject, pattern);
        System.out.println(tokens); // [ab, ##, cd#ef#gh, ##, ij]
    }
}

In my opinion this is too big. He needs working logic. Giving pure OO code with additional token filtering makes it a little harder for a newbie to understand. — damienix, Oct 01 '14 at 21:14
@damienix, I don't know what you mean by "too big". At least half of the code is the main function illustrating example cases. If "short" code were what you were looking for, certainly using a standard library would be preferred. And there's no object orientedness in here -- just two static functions. — jedwards, Oct 01 '14 at 21:17
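
For comparison with the nested loops in the question, the same result can be had with a single pass over the characters. This is a sketch under one assumption: every delimiter is a single character, as in the question's "#" example. The names `CharLoopSplit` and `splitByChars` are invented for illustration and do not come from either answer. The key difference from the question's code is that the string is walked exactly once, and the buffer is stored before it is reset.

```java
import java.util.ArrayList;
import java.util.List;

class CharLoopSplit {
    // Hypothetical helper: splits s at any character found in delimiterChars
    // and keeps each delimiter as its own token.
    static List<String> splitByChars(String s, String delimiterChars) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (delimiterChars.indexOf(c) >= 0) {
                // Store the text collected so far, then the delimiter itself.
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
                tokens.add(String.valueOf(c));
            } else {
                current.append(c);
            }
        }
        // Text after the last delimiter, if any.
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(splitByChars("ab#cd#ef#gh#", "#"));
        // prints [ab, #, cd, #, ef, #, gh, #]
    }
}
```

Unlike the `indexOf`-based `tokenize` above, this variant cannot handle a multi-character delimiter such as "##"; it trades that for handling several one-character delimiters at once.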

damienix · Answer 2 · 2014-10-01T21:05:30.120

Does what you need. Got rid of all the unneeded and unsafe code.

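import java.util.ArrayList;
import java.util.List;

public static String[] split(String s, String regex) {
    // An empty splitter would match at every position and never advance
    if (regex.isEmpty()) {
        throw new IllegalArgumentException("splitter must not be empty");
    }
    List<String> result = new ArrayList<>();
    int beginning = 0;
    for (int i = 0; i < s.length(); i++) {
        if (s.startsWith(regex, i)) {
            // Text before the splitter (skipped when empty)
            if (i > beginning) {
                result.add(s.substring(beginning, i));
            }
            // if you need splitter in output array
            result.add(regex);

            // Move forward for splitter size; the loop's i++ supplies the last step
            beginning = i + regex.length();
            i = beginning - 1;
        }
    }
    // Whatever follows the last splitter
    if (beginning < s.length()) {
        result.add(s.substring(beginning));
    }

    // Optionally if you really need an array instead
    String[] splitS = new String[result.size()];
    result.toArray(splitS);
    return splitS;
}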

edited Oct 01 '14 at 21:05

answered Oct 01 '14 at 20:53

Wow, I didn't realize I was doing so much useless stuff. Thanks – MafiaBlood Oct 01 '14 at 20:58
In addition, your version was vulnerable to different splitter sizes – damienix Oct 01 '14 at 21:01
My teacher doesn't test for that really, which I wish he did so it would make me better and help me in the real world where people break programs. I always try to make my code least vulnerable and I really appreciate the help and tips – MafiaBlood Oct 01 '14 at 21:07
Absolutely right: less fragile code means more peace in the future ;) It is often more about code design and approach than about complication. We removed most of the code and hardened it - that is the ideal deal ;) *"less is more"* – damienix Oct 01 '14 at 21:11