
According to the Java API, Scanner uses delimiters to break the input into tokens. I am trying to understand tokens and delimiters. While writing this program I ran into something confusing:

import java.util.Scanner;

public class Test {
    public static void main(String[] args) {
        Scanner s = null;
        try {
            s = new Scanner(System.in);
            s.useDelimiter("A");
            System.out.println("1 " + s.next().length());
            System.out.println("2 " + s.next().length());
            System.out.println("3 " + s.next().length());
            System.out.println("4 " + s.next().length());
        } finally {
            if (s != null) {
                s.close();
            }
        }
    }
}

When I use the input AAAAAasdf I get the following output.

1 0
2 0
3 0
4 0

I can understand this output: the tokens between consecutive delimiters have zero length, so every line prints 0. But when I use the default delimiter and give the input

_____aaa\n (replace each underscore with a space; \n is me pressing Enter in the Eclipse console)

I get the output

1 3

which I cannot understand. I entered 5 spaces, so there should be 4 tokens of length 0 between them. Why are there none? What am I missing here?

Aseem Bansal

3 Answers


useDelimiter takes a regular expression pattern. The default pattern is

private static Pattern WHITESPACE_PATTERN = Pattern.compile(
                                            "\\p{javaWhitespace}+");

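This matches one or more contiguous whitespace characters, so a whole run of spaces counts as a single delimiter and no empty tokens appear between them. If you want the delimiter to match a run of contiguous A characters in the same way, try something like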

s.useDelimiter("[A]+");
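To compare the two delimiters without typing into the console, here is a minimal sketch that scans a fixed string instead of System.in. The class name DelimiterRunDemo and the helper tokens are made up for this illustration; only Scanner, useDelimiter, hasNext and next come from the JDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class DelimiterRunDemo {
    // Hypothetical helper: collects every token the scanner yields
    // for the given input and delimiter regex.
    static List<String> tokens(String input, String delimiterRegex) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(input)) {
            scanner.useDelimiter(delimiterRegex);
            while (scanner.hasNext()) {
                result.add(scanner.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // "A" matches exactly one character, so adjacent A's enclose empty tokens.
        System.out.println(tokens("AAAAAasdf", "A"));
        // "[A]+" consumes the whole run of A's as one delimiter, leaving only "asdf".
        System.out.println(tokens("AAAAAasdf", "[A]+"));
    }
}
```

With the single-character delimiter the list should start with empty strings and end with asdf; with the + quantifier it should contain asdf alone.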

Read these: http://docs.oracle.com/javase/7/docs/api/java/util/Scanner.html#useDelimiter(java.lang.String) http://docs.oracle.com/javase/7/docs/api/java/util/Scanner.html#reset()

Taylor
  • Correct answer. I found it in the API reference; please add that link as well. http://docs.oracle.com/javase/7/docs/api/java/util/Scanner.html#reset() – Aseem Bansal Nov 06 '13 at 18:02
  • If anyone is interested, here is a link to the OpenJDK source. Search for `WHITESPACE_PATTERN` and you'll see the statement given in this answer. http://hg.openjdk.java.net/jdk7/jdk7/jdk/file/tip/src/share/classes/java/util/Scanner.java – Aseem Bansal Nov 06 '13 at 18:07

It's really interesting to see that when we specify " " (a single space) as the delimiter in the code

    try {
        s = new Scanner(System.in);
        s.useDelimiter(" ");
        System.out.println("1 " + s.next().length());
        System.out.println("2 " + s.next().length());
        System.out.println("3 " + s.next().length());
        System.out.println("4 " + s.next().length());
    } finally {
        if (s != null) {
            s.close();
        }
    }

and the input is

[5 spaces]asdf

we see the output

1 0
2 0
3 0
4 0

But when we don't specify a delimiter,

    try {
        s = new Scanner(System.in);
        //s.useDelimiter(" ");
        System.out.println("1 " + s.next().length());
        System.out.println("2 " + s.next().length());
        System.out.println("3 " + s.next().length());
        System.out.println("4 " + s.next().length());
    } finally {
        if (s != null) {
            s.close();
        }
    }

The same input

[5 spaces]asdf

generates a different output

1 4

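So I think a single-space delimiter is not the same as the default one: with " " every space is its own delimiter and the gaps between them become empty tokens, while the default delimiter makes the scanner skip the whole run of spaces, so no empty tokens appear.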
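The two runs above can be reproduced on a fixed string, and the scanner can be asked which delimiter it uses by default. This is only a sketch: DefaultDelimiterDemo, defaultDelimiter and tokenLengths are names invented here, while Scanner.delimiter() and Pattern.pattern() are existing JDK methods.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class DefaultDelimiterDemo {
    // Returns the regex a Scanner uses when useDelimiter is never called.
    static String defaultDelimiter() {
        try (Scanner scanner = new Scanner("")) {
            return scanner.delimiter().pattern();
        }
    }

    // Hypothetical helper: lengths of all tokens in the input.
    // A null delimiterRegex means "keep the scanner's default delimiter".
    static List<Integer> tokenLengths(String input, String delimiterRegex) {
        List<Integer> lengths = new ArrayList<>();
        try (Scanner scanner = new Scanner(input)) {
            if (delimiterRegex != null) {
                scanner.useDelimiter(delimiterRegex);
            }
            while (scanner.hasNext()) {
                lengths.add(scanner.next().length());
            }
        }
        return lengths;
    }

    public static void main(String[] args) {
        System.out.println(defaultDelimiter());
        // Single-space delimiter: zero-length tokens come before "asdf".
        System.out.println(tokenLengths("     asdf", " "));
        // Default delimiter: the five spaces are skipped as one run.
        System.out.println(tokenLengths("     asdf", null));
    }
}
```

The default pattern ends in +, which is why it behaves differently from a delimiter that matches exactly one space.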

Ankit Rustagi

Scanner.next() finds and returns the next complete token from this scanner. A complete token is preceded and followed by input that matches the delimiter pattern. The default pattern is \\p{javaWhitespace}+.

To understand it better, try setting the delimiter to "\\s*":

Scanner scanner = new Scanner(System.in);
scanner.useDelimiter("\\s*");
while(scanner.hasNext())
  System.out.println(scanner.next());

For the input 123, the loop prints:

1  // first println
2  // second println
3  // third println

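X* means that the pattern X can occur zero or more times. Because \\s* can match the empty string, the scanner finds a delimiter between every pair of characters, which is why each digit comes out as its own token. Operators such as * and + are known as quantifiers. X+ means that X occurs one or more times. So try the delimiter "[A]+", which requires at least one "A" and matches any run of contiguous "A" characters.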
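The difference between the two quantifiers can be checked on fixed strings. The following is a small sketch; the class QuantifierDelimiterDemo and its tokens helper are hypothetical names for this example, not part of the Scanner API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class QuantifierDelimiterDemo {
    // Hypothetical helper: returns all tokens for the given delimiter regex.
    static List<String> tokens(String input, String delimiterRegex) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(input)) {
            scanner.useDelimiter(delimiterRegex);
            while (scanner.hasNext()) {
                result.add(scanner.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // "\\s*" also matches the empty string, so it splits between characters.
        System.out.println(tokens("123", "\\s*"));
        // "\\s+" needs at least one whitespace character, so "12" stays together.
        System.out.println(tokens("12  3", "\\s+"));
    }
}
```

A delimiter that can match nothing at all splits the input at every position, whereas one that requires at least one character only splits where that character is really present.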

Sage