Input:

Hi. I am John.
My name is John. Who are you ?

Output:

Hi
I am John
My name is John
Who are you
Pavel Chuchuva
John
  • I was tempted to start a bounty for this one. It almost sounds impossible to do only with a text or a howto. – stefan Mar 26 '11 at 18:47

1 Answer

    String line = "Hi. My name is John. Who are you ?";
    String[] sentences = line.split("(?<=[.!?])\\s+");
    for (String sentence : sentences) {
       System.out.println("[" + sentence + "]");
    }

This produces:

[Hi.]
[My name is John.]
[Who are you ?]
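
The terminators survive because `(?<=[.!?])` is a zero-width lookbehind: it only checks what precedes the whitespace, so the split consumes the whitespace alone. The expected output in the question has no punctuation, though, so here is a minimal sketch contrasting the two behaviours. The class and method names are made up for illustration, and the second pattern is the same one used as the Scanner delimiter later in this answer.

```java
import java.util.Arrays;

// Hypothetical demo class; not part of the original answer.
class SplitComparison {
    // Zero-width lookbehind: split on whitespace that follows a terminator,
    // so the terminator stays attached to its sentence.
    static String[] keepTerminators(String line) {
        return line.split("(?<=[.!?])\\s+");
    }

    // Consuming delimiter: the terminator and any surrounding whitespace
    // are part of the delimiter, so they are discarded.
    static String[] dropTerminators(String line) {
        return line.split("\\s*[.!?]\\s*");
    }

    public static void main(String[] args) {
        String line = "Hi. My name is John. Who are you ?";
        System.out.println(Arrays.toString(keepTerminators(line)));
        // prints [Hi., My name is John., Who are you ?]
        System.out.println(Arrays.toString(dropTerminators(line)));
        // prints [Hi, My name is John, Who are you]
    }
}
```

Note that `split` drops trailing empty strings by default, which is why the final ` ?` does not leave an empty last element.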


If you're not comfortable using split (even though it's the recommended replacement for the "legacy" java.util.StringTokenizer), you can use java.util.Scanner instead, which is more than adequate for the job.

Here's a solution that uses Scanner, which by the way implements Iterator<String>. For extra instructional value, I'm also showing an example of using java.lang.Iterable<T> so that you can use the for-each construct.

    final String text =
        "Hi. I am John.\n" +
        "My name is John. Who are you ?";

    Iterable<String> sentences = new Iterable<String>() {
        @Override public Iterator<String> iterator() {
            return new Scanner(text).useDelimiter("\\s*[.!?]\\s*");
        }
    };

    for (String sentence : sentences) {
        System.out.println("[" + sentence + "]");
    }

This prints:

[Hi]
[I am John]
[My name is John]
[Who are you]

If this regex is still not what you want, then I recommend investing the time to learn regular expressions so you can take matters into your own hands.


Note: the final modifier on the local variable text in the above snippet is required before Java 8, because an anonymous class can only capture final locals (from Java 8 on, an effectively final variable is enough). In an illustrative example it makes for concise code, but in your actual code you should refactor the anonymous class into its own named class and have it take text in the constructor.
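
One way that refactoring might look is sketched below. The class name `Sentences` is hypothetical (the note above only says "its own named class"), and the delimiter is the same one used in the anonymous-class snippet.

```java
import java.util.Iterator;
import java.util.Scanner;

// Hypothetical named replacement for the anonymous Iterable above.
final class Sentences implements Iterable<String> {
    private final String text;

    Sentences(String text) {
        this.text = text;
    }

    @Override
    public Iterator<String> iterator() {
        // Scanner implements Iterator<String>; every call creates a new
        // Scanner, so the text can be iterated more than once.
        return new Scanner(text).useDelimiter("\\s*[.!?]\\s*");
    }

    public static void main(String[] args) {
        String text = "Hi. I am John.\n" + "My name is John. Who are you ?";
        for (String sentence : new Sentences(text)) {
            System.out.println("[" + sentence + "]");
        }
    }
}
```

Because the text is now a constructor argument stored in a field, nothing has to be captured from an enclosing method.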

polygenelubricants
  • @John: Then `split("[.!?]\\s+")`. Maybe even `split("\\s*[.!?]\\s+")`. Maybe even `split("\\s*[.!?]+\\s+")`. Feel free to clarify your unclear question to explain in more detail what is it that you want. More input, expected output, etc. – polygenelubricants Apr 30 '10 at 17:37
  • Hi... I am trying to take a file which contains a few lines of text.... and then split them up and store them in an array.... The problem that i am facing is that when i use a tokenizer it stops reading at each line.... and when i try to get in a while loop there.... it goes into an infinite loop..... – John Apr 30 '10 at 17:54
  • @John: I need to go to bed, but if you look up the API, it says that "`StringTokenizer` is a legacy class that is retained for compatibility reasons although its use is discouraged in new code. It is recommended that anyone seeking this functionality use the `split` method of `String` or the `java.util.regex` package instead." – polygenelubricants Apr 30 '10 at 17:59
  • i've been reading bout split for the last couple of hours.... but all its done is confused me further – John Apr 30 '10 at 18:08
  • Thanks.... I'm still a bit confused on how line.split("(?<=[.!?])\\s+") works.... as in how does (?<=[.!?])\\s+") work ?? – John May 01 '10 at 03:27
  • It's a positive lookbehind (http://www.regular-expressions.info/lookaround.html). `(?<=[.!?])` looks behind the current position and checks whether there's a match for `[.!?]`. See for example, http://stackoverflow.com/questions/2559759/how-do-i-convert-camelcase-into-human-readable-names-in-java – polygenelubricants May 01 '10 at 03:36
  • thanks a ton.... the problem wasnt with the tokenizer but my loop that was reading from the file..... thanks for the java lesson on split though.... ended up using tht only... cheers !!! – John May 02 '10 at 04:47
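
For the situation described in the comments (text spread over several lines, sentences wanted in an array), a minimal sketch could look like the following. Everything here is an assumption rather than the asker's actual code: the class and method names are invented, a `StringReader` stands in for the file, and the delimiter is a variant of the ones suggested in the comments that also consumes a terminator at the very end of the input.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical helper; not the asker's code.
class FileSentences {
    static String[] readSentences(Reader source) {
        StringBuilder all = new StringBuilder();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            // readLine() returns null at end of input; checking for that
            // is what terminates the loop.
            while ((line = in.readLine()) != null) {
                all.append(line).append(' ');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String joined = all.toString().trim();
        // Split once over the whole text so line breaks do not matter.
        return joined.isEmpty() ? new String[0] : joined.split("\\s*[.!?]+\\s*");
    }

    public static void main(String[] args) {
        Reader source = new StringReader("Hi. I am John.\nMy name is John. Who are you ?");
        System.out.println(Arrays.toString(readSentences(source)));
        // prints [Hi, I am John, My name is John, Who are you]
    }
}
```

For a real file, a `java.io.FileReader` (or `java.nio.file.Files.newBufferedReader`) could be passed in place of the `StringReader`.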