How to reformat paragraph to have each sentence on a separate line?

Question

Input:

Hi. I am John.
My name is John. Who are you ?

Output:

Hi
I am John
My name is John
Who are you

I was tempted to start a bounty for this one. It almost sounds impossible to do only with a text or a howto. — stefan, Mar 26 '11 at 18:47

score 7 · Accepted Answer · edited May 23 '17 at 12:30


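    // Split on runs of whitespace that immediately follow ., ! or ?
    // (the lookbehind keeps the punctuation attached to each sentence).
    String line = "Hi. My name is John. Who are you ?";
    String[] sentences = line.split("(?<=[.!?])\\s+");
    for (String sentence : sentences) {
        System.out.println("[" + sentence + "]");
    }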

This produces:

[Hi.]
[My name is John.]
[Who are you ?]
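The following self-contained sketch puts the lookbehind split next to a plain consuming split, so the difference is easy to see. The class and method names are made up for illustration. The lookbehind `(?<=[.!?])` matches zero characters, so `split` removes only the whitespace and each sentence keeps its terminator. Putting `[.!?]` directly in the pattern makes it part of the delimiter, so it is removed instead.

```java
import java.util.Arrays;

class LookbehindSplitDemo {
    // Lookbehind: split only on whitespace whose preceding character is ., ! or ?.
    // The lookbehind is zero-width, so the punctuation stays with its sentence.
    static String[] splitKeepingPunctuation(String text) {
        return text.split("(?<=[.!?])\\s+");
    }

    // Consuming version: the punctuation is part of the delimiter, so it is dropped.
    static String[] splitDroppingPunctuation(String text) {
        return text.split("[.!?]\\s+");
    }

    public static void main(String[] args) {
        String line = "Hi. My name is John. Who are you ?";
        System.out.println(Arrays.toString(splitKeepingPunctuation(line)));
        System.out.println(Arrays.toString(splitDroppingPunctuation(line)));
    }
}
```

With the consuming pattern, the final `?` survives because no whitespace follows it. A delimiter that allows optional whitespace on both sides, such as `\\s*[.!?]\\s*`, also handles the end of the text.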

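To get exactly the output in the question, with terminators removed and line breaks treated as sentence boundaries, use the punctuation and the whitespace around it as the delimiter instead. A `Scanner` is itself an `Iterator<String>`, so it can be wrapped in an `Iterable` for a for-each loop:

    import java.util.Iterator;
    import java.util.Scanner;

    final String text =
        "Hi. I am John.\n" +
        "My name is John. Who are you ?";

    // Each token is the text between delimiters; a delimiter is one ., ! or ?
    // together with any whitespace (including newlines) around it.
    Iterable<String> sentences = new Iterable<String>() {
        @Override public Iterator<String> iterator() {
            return new Scanner(text).useDelimiter("\\s*[.!?]\\s*");
        }
    };

    for (String sentence : sentences) {
        System.out.println("[" + sentence + "]");
    }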

This prints:

[Hi]
[I am John]
[My name is John]
[Who are you]
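The same delimiter pattern also works with `String.split`, without the `Scanner` and `Iterable` wrapper. Here is a minimal sketch, in which the class name is hypothetical:

```java
class SplitWithScannerDelimiter {
    // Same pattern the Scanner uses as its delimiter: optional whitespace,
    // one terminator (., ! or ?), optional whitespace (including newlines).
    private static final String DELIMITER = "\\s*[.!?]\\s*";

    static String[] sentences(String text) {
        // String.split discards trailing empty strings, so the final " ?"
        // does not produce an empty last element.
        return text.split(DELIMITER);
    }

    public static void main(String[] args) {
        String text = "Hi. I am John.\n" + "My name is John. Who are you ?";
        for (String sentence : sentences(text)) {
            System.out.println(sentence);
        }
    }
}
```

This relies on `split` dropping trailing empty strings. Leading punctuation or whitespace would still produce an empty first element. A run such as `...` yields empty strings between the dots. Adding a `+` after the character class, as in `[.!?]+`, treats such a run as a single delimiter.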

If this regex still isn't what you want, I recommend investing the time to learn regular expressions yourself so you can take matters into your own hands.

@John: Then `split("[.!?]\\s+")`. Maybe even `split("\\s*[.!?]\\s+")`. Maybe even `split("\\s*[.!?]+\\s+")`. Feel free to edit your question to explain in more detail what it is that you want: more input, expected output, etc. – polygenelubricants Apr 30 '10 at 17:37
Hi. I am trying to take a file that contains a few lines of text, split it into sentences, and store them in an array. The problem I'm facing is that when I use a tokenizer, it stops reading at the end of each line, and when I try to put it in a while loop, it goes into an infinite loop. – John Apr 30 '10 at 17:54
@John: I need to go to bed, but if you look up the API, it says that "`StringTokenizer` is a legacy class that is retained for compatibility reasons although its use is discouraged in new code. It is recommended that anyone seeking this functionality use the `split` method of `String` or the `java.util.regex` package instead." – polygenelubricants Apr 30 '10 at 17:59
I've been reading about split for the last couple of hours, but all it's done is confuse me further. – John Apr 30 '10 at 18:08
Thanks. I'm still a bit confused about how `line.split("(?<=[.!?])\\s+")` works. How does `(?<=[.!?])\\s+` work? – John May 01 '10 at 03:27
It's a positive lookbehind (http://www.regular-expressions.info/lookaround.html). `(?<=[.!?])` looks behind the current position and checks whether the preceding character matches `[.!?]`. The lookbehind itself matches zero characters, so `split` removes only the `\\s+` whitespace and the punctuation stays with its sentence. See, for example, http://stackoverflow.com/questions/2559759/how-do-i-convert-camelcase-into-human-readable-names-in-java – polygenelubricants May 01 '10 at 03:36
Thanks a ton. The problem wasn't with the tokenizer but with my loop that was reading from the file. Thanks for the Java lesson on split, though; I ended up using only that. Cheers! – John May 02 '10 at 04:47
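John's underlying problem was reading a file line by line, which cuts off any sentence that spans a line break. One way around it is to read all lines, join them with spaces, and then apply the lookbehind split. This sketch assumes the file fits in memory and is UTF-8 encoded. The class and method names are illustrative, and the `main` method writes a temporary file only to demonstrate.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

class SentencesFromFile {
    // Joins the lines first, so a sentence that continues on the next line
    // stays whole, then splits after each ., ! or ? (punctuation kept).
    static String[] sentencesFromLines(List<String> lines) {
        String text = String.join(" ", lines).trim();
        if (text.isEmpty()) {
            return new String[0]; // "".split(...) would return [""]
        }
        return text.split("(?<=[.!?])\\s+");
    }

    // Reads the whole file into memory (assumes UTF-8) before splitting.
    static String[] readSentences(Path file) throws IOException {
        return sentencesFromLines(Files.readAllLines(file, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("sentences", ".txt");
        Files.write(file, Arrays.asList("Hi. I am", "John. Who are you?"), StandardCharsets.UTF_8);
        try {
            for (String sentence : readSentences(file)) {
                System.out.println("[" + sentence + "]");
            }
        } finally {
            Files.delete(file);
        }
    }
}
```

Reading everything up front avoids the while-loop bookkeeping entirely. For very large files, a streaming approach would be needed instead.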
