10

How do you set the delimiter for a scanner to either ; or new line?

I tried: Scanner.useDelimiter(Pattern.compile("(\n)|;")); But it doesn't work.

nhahtdh
Razvi
  • Found the bug: I have to use (\r\n)|;. I was parsing something like string;number\r\n... and the scanner would not accept a token like 100\r as a number. – Razvi Dec 30 '09 at 18:03

3 Answers

16

As a general rule, when a regex is written as a Java string literal, you need to double each \.

So, try

scanner.useDelimiter(Pattern.compile("(\\n)|;"));

or

scanner.useDelimiter(Pattern.compile("[\\n;]"));

Edit: If \r\n is the problem, you might want to try this:

scanner.useDelimiter(Pattern.compile("[\\r\\n;]+"));

which matches one or more of \r, \n, and ;.

Note: I haven't tried these.
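For anyone who wants to check this, here is a minimal sketch that reproduces the asker's situation. The class name, the `tokens` helper and the sample input are made up for illustration; only `Scanner.useDelimiter`, `hasNext` and `next` come from the JDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class CrlfDelimiterDemo {
    // Hypothetical helper: collects every token the scanner yields for the given delimiter regex.
    static List<String> tokens(String input, String delimiterRegex) {
        List<String> out = new ArrayList<>();
        try (Scanner scanner = new Scanner(input)) {
            scanner.useDelimiter(delimiterRegex);
            while (scanner.hasNext()) {
                out.add(scanner.next());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed sample data in the string;number\r\n shape described in the question.
        String input = "abc;100\r\ndef;200\r\n";

        // Delimiter without \r: the numeric tokens keep a trailing carriage return.
        for (String t : tokens(input, "[\\n;]")) {
            System.out.println(t.length() + " chars");
        }

        // Delimiter with \r included: the carriage return is consumed as part of the delimiter.
        for (String t : tokens(input, "[\\r\\n;]+")) {
            System.out.println(t.length() + " chars");
        }
    }
}
```

With the first delimiter the token after each semicolon is "100\r" (four characters), which is why it fails to parse as a number. With the second it is "100".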

Naman
Powerlord
  • You can go either way. If you use two backslashes, the regex compiler sees `\n` and interprets it as the escape sequence for a linefeed. If you use one backslash, the regex compiler sees an actual linefeed character, which it matches literally. But I would definitely go with the character-class version: `"[\\n;]"` or `"[\n;]"`; it's easier to read as well as more efficient. – Alan Moore Dec 30 '09 at 19:08
  • @Alan Moore: Ah, OK... I just assumed that a literal line break would be misinterpreted. – Powerlord Dec 30 '09 at 19:40
9

As you've discovered, you needed to look for DOS/network style \r\n (CRLF) line separators instead of the Unix style \n (LF only). But what if the text contains both? That happens a lot; in fact, when I view the source of this very page I see both varieties.

You should get in the habit of looking for both kinds of separator, as well as the older Mac style \r (CR only). Here's one way to do that:

\r?\n|\r

Plugging that into your sample code you get:

scanner.useDelimiter(";|\r?\n|\r");

This is assuming you want to match exactly one newline or semicolon at a time. If you want to match one or more you can do this instead:

scanner.useDelimiter("[;\r\n]+");

Notice, too, how I passed in a regex string instead of a Pattern; Scanner caches the patterns it compiles from strings, so pre-compiling the regex doesn't get you any performance gain.
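To make the difference between the two delimiters concrete, here is a small sketch. The class name, the `tokens` helper and the mixed-line-ending input are assumptions for illustration, not part of the original question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class SingleVsRepeatedDelimiterDemo {
    // Hypothetical helper: collects every token the scanner yields for the given delimiter regex.
    static List<String> tokens(String input, String delimiterRegex) {
        List<String> out = new ArrayList<>();
        try (Scanner scanner = new Scanner(input)) {
            scanner.useDelimiter(delimiterRegex);
            while (scanner.hasNext()) {
                out.add(scanner.next());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed input mixing CRLF, LF, lone CR and a doubled semicolon.
        String input = "a;b\r\nc\nd\re;;f";

        // Exactly one separator per match: the doubled semicolon produces an empty token.
        List<String> single = tokens(input, ";|\r?\n|\r");
        System.out.println(single.size() + " tokens, contains empty: " + single.contains(""));

        // One or more separators per match: runs of separators collapse, so no empty tokens.
        List<String> repeated = tokens(input, "[;\r\n]+");
        System.out.println(repeated.size() + " tokens, contains empty: " + repeated.contains(""));
    }
}
```

Which one you want depends on whether an empty field (as in `e;;f`) is meaningful in your data. The first form keeps it as an empty token, the second silently drops it.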

Alan Moore
1

Looking at the OP's comment, it looks like it was a different line ending (\r\n or CRLF) that was the problem.

Here's my answer, which handles runs of semicolons and line endings in either format (collapsing them may or may not be what you want):

scanner.useDelimiter(Pattern.compile("([\n;]|(\r\n))+"));

e.g. an input file that looks like this:

1


2;3;;4
5

would result in 1,2,3,4,5
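A quick way to confirm that result is a sketch like the following. The class name and the `numbers` helper are hypothetical, and the input string is my transcription of the sample file above, once with LF and once with CRLF line endings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

class MultiDelimiterExample {
    // Hypothetical helper: reads integers separated by runs of ';', LF or CRLF.
    static List<Integer> numbers(String input) {
        List<Integer> out = new ArrayList<>();
        try (Scanner scanner = new Scanner(input)) {
            scanner.useDelimiter(Pattern.compile("([\n;]|(\r\n))+"));
            while (scanner.hasNextInt()) {
                out.add(scanner.nextInt());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // The sample file with Unix line endings.
        System.out.println(numbers("1\n\n\n2;3;;4\n5\n"));
        // The same content with Windows line endings.
        System.out.println(numbers("1\r\n\r\n\r\n2;3;;4\r\n5\r\n"));
    }
}
```

Note that this pattern only treats `\r` as a separator when it is followed by `\n`, so a file with old Mac-style lone `\r` line endings would need a different delimiter.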

I tried both \n and \\n, and both worked in my case, though I agree that if you need a literal backslash in the regex you have to double it, since it is an escape character. It just so happens that in this case the pattern matches a linefeed with or without the extra \.
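The equivalence can be checked directly with `Pattern.matches`. This is only an illustrative sketch; the class and method names are made up.

```java
import java.util.regex.Pattern;

class NewlineEscapeDemo {
    // Hypothetical helper: true if the regex matches a string consisting of one linefeed.
    static boolean matchesLinefeed(String regex) {
        return Pattern.matches(regex, "\n");
    }

    public static void main(String[] args) {
        String literal = "\n";   // the Java compiler turns this into one linefeed character
        String escaped = "\\n";  // the regex engine receives backslash + 'n' and resolves the escape itself

        System.out.println(literal.length() + " " + matchesLinefeed(literal));
        System.out.println(escaped.length() + " " + matchesLinefeed(escaped));

        // For regex-only escapes such as \d there is no choice: "\d" is not a valid
        // Java string escape, so the backslash has to be doubled.
        System.out.println(Pattern.matches("\\d+", "100"));
    }
}
```

The two strings have different lengths (1 and 2), yet both compile to a pattern that matches a single linefeed.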

Joshua McKinnon