I have a program that reads and processes data from a raw text String using StringTokenizer.

Originally the StringTokenizer contained about 1,500 tokens and the program worked fine. However, the raw content has grown to about 12,000 tokens, and CPU consumption has increased significantly.

I'm looking into the problem and trying to identify the root cause. The program uses a while loop to check whether any tokens are left and, depending on the token read, takes a different action. I'm reviewing those actions to see whether they can be improved.

Meanwhile, I would like to ask whether handling one long StringTokenizer costs more CPU than handling 10 short StringTokenizers.
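
To make the question concrete, this is roughly the comparison I have in mind (a rough sketch with made-up data; the ten-way split below cuts the String at arbitrary positions rather than on separator boundaries, which should be close enough for a timing comparison):

import java.util.StringTokenizer;

// Rough micro-benchmark sketch (made-up data): one long tokenizer
// versus ten tokenizers over slices of the same String.
public class TokenizerBenchmark {
    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 12000; i++) {
            sb.append("token").append(i).append(',');
        }
        String big = sb.toString();

        long t0 = System.nanoTime();
        consume(new StringTokenizer(big, ","));
        System.out.println("one long:  " + (System.nanoTime() - t0) / 1000000 + " ms");

        t0 = System.nanoTime();
        int chunk = big.length() / 10;
        for (int i = 0; i < 10; i++) {
            int end = (i == 9) ? big.length() : (i + 1) * chunk;
            // Slice boundaries are not aligned to separators; good enough
            // for a rough timing comparison.
            consume(new StringTokenizer(big.substring(i * chunk, end), ","));
        }
        System.out.println("ten short: " + (System.nanoTime() - t0) / 1000000 + " ms");
    }

    private static void consume(StringTokenizer st) {
        while (st.hasMoreTokens()) {
            st.nextToken();
        }
    }
}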

Jasper
  • Are you sure it's StringTokenizer and not what you're *doing* with it? Please show a short but complete program which demonstrates the problem. – Jon Skeet Sep 14 '11 at 09:57
  • I don't think so. Strings are random-access, that should not slow down for long Strings. – Thilo Sep 14 '11 at 09:58
  • There isn't anything in `StringTokenizer` that would blow up for long inputs. It has to be something in the surrounding code. – Barend Sep 14 '11 at 10:00
  • This question is worthless without an [SSCCE](http://sscce.org). – Charles Goodwin Sep 14 '11 at 10:18

3 Answers

StringTokenizer usage is discouraged according to the StringTokenizer Javadoc. It is not deprecated, though, so it is still possible to use it; it is just not recommended. Here is what is written:

"StringTokenizer is a legacy class that is retained for compatibility reasons although its use is discouraged in new code. It is recommended that anyone seeking this functionality use the split method of String or the java.util.regex package instead."

Please check the following post. It has a very nice example of various ways of doing the same thing you are trying to do.

performance-of-stringtokenizer-class-vs-split-method-in-java

You can try the samples provided there and see what works best for you.

First of all, thanks for your opinions. Over the last weekend I ran a stress test with real data using a revised program, and I'm happy to say my problem is solved (many thanks to A.J. ^_^). I would like to share my findings.

After studying the example mentioned by A.J., I wrote some test programs to read and process data using StringTokenizer and "indexOf" (regex was even worse than StringTokenizer in my situation). My test program counts how many milliseconds are needed to process 24 messages (~12,000 tokens each).

StringTokenizer needed ~2700 ms to complete, while "indexOf" took only ~210 ms!
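
The measurement itself was just a simple loop like the following (a sketch, not my exact test program; the message contents here are placeholders, and the field separator is assumed to be '|'):

// Timing harness sketch: process 24 messages of ~12,000 tokens each
// and report the elapsed wall-clock time in milliseconds.
public class TokenizerTiming {
    public static void main(String[] args) {
        String[] messages = buildMessages();
        MsgProcessor processor = new MsgProcessor();

        long start = System.currentTimeMillis();
        for (String msg : messages) {
            processor.processMessage(msg);
        }
        System.out.println("Elapsed: " + (System.currentTimeMillis() - start) + " ms");
    }

    private static String[] buildMessages() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 12000; i++) {
            sb.append("field").append(i).append('|');
        }
        String[] messages = new String[24];
        java.util.Arrays.fill(messages, sb.toString());
        return messages;
    }
}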

I then revised my program like this (with minimal changes) and tested it with real volume over the weekend:

Original program:

public class MsgProcessor {
    // Some other definitions and methods (FieldSeparator, my_data, ...) ...

    public void processMessage (String msg)
    {
        //...

        StringTokenizer token = new StringTokenizer(msg, FieldSeparator);
        while (token.hasMoreTokens()) {
            my_data = token.nextToken();
            // perform a different action based on the token read
        }
    }
}

And here is the updated program using "indexOf":

public class MsgProcessor {
    // Some other definitions and methods (FieldSeparator, my_data, isReadingData, ...) ...
    private int tokenStart=0;
    private int tokenEnd=0;

    public void processMessage (String msg)
    {
        //...
        tokenStart=0;
        tokenEnd=0;

        while (isReadingData) {
            my_data = getToken(msg);
            if (my_data == null)
                break;
            // perform a different action based on the token read ...
        }
    }

    // Returns the next token, or null when no further FieldSeparator is found.
    private String getToken (String msg)
    {
        String result = null;
        if ((tokenEnd = msg.indexOf(FieldSeparator, tokenStart)) >= 0) {
            result = msg.substring(tokenStart, tokenEnd);
            tokenStart = tokenEnd + 1;
        }
        return result;
    }
}
  • Please note that there are no "null" tokens in the original data. If no FieldSeparator is found, "getToken(msg)" returns null (as the signal for "no more tokens").
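
For completeness, a usage sketch (the FieldSeparator is assumed to be '|' here, and the message is assumed to end with a trailing separator, since getToken() returns null and stops the loop once no further separator is found):

public class MsgProcessorDemo {
    public static void main(String[] args) {
        MsgProcessor processor = new MsgProcessor();
        // Trailing separator assumed, so the last token is still found.
        processor.processMessage("fieldA|fieldB|fieldC|");
    }
}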

Why don't you try the newer Scanner class instead? Scanners can be constructed from strings, streams, and files. I'm not sure whether it is more efficient than the old StringTokenizer, though.
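
For example (a minimal sketch; the comma delimiter is just an assumption, and note that Scanner's delimiter is a regular expression):

import java.util.Scanner;

public class ScannerExample {
    public static void main(String[] args) {
        String msg = "one,two,three";
        // Scanner can also wrap a plain String; useDelimiter takes a regex.
        try (Scanner scanner = new Scanner(msg).useDelimiter(",")) {
            while (scanner.hasNext()) {
                System.out.println(scanner.next());
            }
        }
    }
}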

Mister Smith