In other words, you want to split on
- one or more whitespace characters
- a place which has = after it and a non-= before it (like foo|= where | represents this place)
- a place which has = before it and a non-= after it (like =|foo where | represents this place)
As a Scanner delimiter this can be written as
s.useDelimiter("\\s+|(?<!=)(?==)|(?<==)(?!=)");
//              ^^^^^ ^^^^^^^^^^^ ^^^^^^^^^^^
//cases:         1)       2)          3)
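To see how the three cases work together, here is a minimal sketch that feeds a sample line through a Scanner with this delimiter. The class name DelimiterDemo, the helper tokens and the sample input are only placeholders I made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class DelimiterDemo {
    // Hypothetical helper: collects every token the Scanner produces for the input.
    static List<String> tokens(String input) {
        List<String> result = new ArrayList<>();
        try (Scanner s = new Scanner(input)) {
            // 1) whitespace, 2) place right before a run of '=', 3) place right after it
            s.useDelimiter("\\s+|(?<!=)(?==)|(?<==)(?!=)");
            while (s.hasNext()) {
                result.add(s.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample input chosen for illustration only.
        System.out.println(tokens("biscuit==cookie apple=fruit"));
    }
}
```

For this input the tokens should be biscuit, ==, cookie, apple, = and fruit. Cases 2) and 3) are zero-width, so they consume no characters and the = signs themselves come back as tokens.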
Since it looks like you are building a parser, I would suggest using a tool which lets you define a proper grammar, like ANTLR (http://www.antlr.org/). But if you must stick with regex, another improvement which would make the regex easier to write is to use Matcher#find instead of a Scanner delimiter. This way your regex and code could look like
String data = "biscuit==cookie apple=fruit+-()";
String regex = "<=|==|>=|[\\Q<>+-=()\\E]|[^\\Q<>+-=()\\E]+";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(data);
while (m.find())
System.out.println(m.group());
Output:
biscuit
==
cookie apple
=
fruit
+
-
(
)
You can make this regex more general by using
String regex = "<=|==|>=|\\p{Punct}|\\P{Punct}+";
//                       ^^^^^^^^^^ ^^^^^^^^^^^-- standard cases
//              ^^ ^^ ^^------------------------- special cases
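A minimal sketch of the more general regex wrapped in a reusable method might look like the one below. The class name PunctTokenizer, the method tokenize and the sample input are my own placeholders, not anything from your code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PunctTokenizer {
    // Two-character operators are listed first so they are tried before
    // the single-character \p{Punct} alternative.
    private static final Pattern TOKEN =
            Pattern.compile("<=|==|>=|\\p{Punct}|\\P{Punct}+");

    // Hypothetical helper: returns every match of TOKEN in order.
    static List<String> tokenize(String data) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(data);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Sample input chosen for illustration only.
        for (String token : tokenize("x>=10+(y)")) {
            System.out.println(token);
        }
    }
}
```

Keep in mind that \p{Punct} also covers characters like _ and . so with this version an identifier such as a_b comes out as three tokens (a, _, b), and whitespace stays inside the \P{Punct}+ tokens just like in cookie apple above.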
This approach also requires reading the data from the file first and storing it in a single String which you would then parse. You can find many ways to read text from a file, for instance in this question:
Reading a plain text file in Java
so you can use something like
String data = new String(Files.readAllBytes(Paths.get("input.txt")));
You can specify the encoding which String should use while decoding the bytes from the file by using the constructor String(bytes, encoding). So you can write it as new String(bytes, "UTF-8") or, to avoid typos while selecting the encoding, use one of the constants stored in the StandardCharsets class, like new String(bytes, StandardCharsets.UTF_8).
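Putting the file reading and the explicit encoding together, a minimal sketch could look like this. The class name ReadWholeFile and the helper readUtf8 are placeholders of mine, and the temporary file only exists so the example can run on its own; in your case you would pass Paths.get("input.txt") instead.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ReadWholeFile {
    // Hypothetical helper: reads the whole file and decodes its bytes as UTF-8.
    static String readUtf8(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // A temporary file stands in for the real input file.
        Path file = Files.createTempFile("input", ".txt");
        try {
            Files.write(file, "biscuit==cookie apple=fruit".getBytes(StandardCharsets.UTF_8));
            String data = readUtf8(file);
            System.out.println(data);
        } finally {
            Files.delete(file);
        }
    }
}
```

Files.readAllBytes loads the entire file into memory, which is fine for small inputs but probably not what you want for very large files.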