
If I have a string like

foo&bar&baz

and I want to extract the tokens foo, bar, and baz from it, a regex is easy to write:

val regex = "([^&]+)".r
regex.findAllIn("foo&bar&baz").map(_.toString).toList

This gives me the answer I want.

List("foo", "bar", "baz")

But the input can contain a literal & symbol, escaped as _&_.

So if the input is

foo_&_bar&baz

The output should be foo&bar, baz.

I searched and found this thread, which covers a similar problem:

RegEx disallow a character unless escaped

Based on this thread I changed my regex to

val regex = "((?:_&_|[^&]*)+)".r

But this doesn't work. The output is

List("foo_", "", "_bar", "", "baz", "")
Knows Not Much
  • Try `(?:_&_|[^&])+` – Wiktor Stribiżew Oct 01 '19 at 23:01
  • Use `(?<=[^\W_])&(?=[^\W_])` to find instances of `&` that aren't surrounded by `_` and replace them with `,` as shown [here](https://regex101.com/r/iCeubF/1). Then replace all instances of `_&_` with `&` as shown [here](https://regex101.com/r/iCeubF/2). – ctwheels Oct 01 '19 at 23:02

1 Answer


You may use

val regex = "(?:_&_|[^&])+".r
println( regex.findAllIn("foo_&_bar&baz").map(_.toString).toList )
// => List(foo_&_bar, baz)

See the regex demo and a Scala demo.

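The (?:_&_|[^&])+ regex matches one or more repetitions of either _&_ or, if that is not found at the current location, any char other than &. Because _&_ is listed first in the alternation, an escaped & is consumed as part of the token instead of ending it.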
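Note that the matches still contain the escape sequence (foo_&_bar), while the question asks for foo&bar. One way to get there, as a minimal sketch, is to unescape each token after matching. The object name `EscapedSplit` and the helper `tokenize` are hypothetical names used only for illustration:

```scala
object EscapedSplit {
  // Pattern from the answer: runs of the escape sequence _&_ or any char other than &.
  private val token = "(?:_&_|[^&])+".r

  // Hypothetical helper: find each token, then turn every _&_ back into a literal &.
  def tokenize(input: String): List[String] =
    token.findAllIn(input).map(_.replace("_&_", "&")).toList

  def main(args: Array[String]): Unit = {
    println(tokenize("foo_&_bar&baz")) // List(foo&bar, baz)
    println(tokenize("foo&bar&baz"))   // List(foo, bar, baz)
  }
}
```

This assumes _&_ is the only escape sequence in the input and that empty tokens (for example between two consecutive & characters) should be dropped, since the pattern requires at least one character per match.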

Wiktor Stribiżew
  • Makes sense. But why is the "?:" needed? It's a non-capturing group. Why are we using that here? – Knows Not Much Oct 01 '19 at 23:08
  • @KnowsNotMuch You are not using the pattern for pattern matching, hence there is no need for a capturing group in the regex (`findAllIn` does not require a capturing group in the regex pattern). Since we do not want the overhead of re-writing each subsequent `_&_` or non-`&` in the capturing group memory buffer, it makes sense to make the group [non-capturing](https://stackoverflow.com/questions/3512471/what-is-a-non-capturing-group-in-regular-expressions). – Wiktor Stribiżew Oct 01 '19 at 23:13
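
To illustrate the point about `findAllIn`, here is a small sketch comparing the non-capturing pattern with a capturing variant. The object name `GroupComparison` and the helper `allMatches` are hypothetical; the sketch only shows that both patterns yield the same whole matches, not any difference in performance:

```scala
import scala.util.matching.Regex

object GroupComparison {
  val nonCapturing: Regex = "(?:_&_|[^&])+".r
  val capturing: Regex    = "(_&_|[^&])+".r

  // findAllIn iterates over whole matches, so group contents are never consulted here.
  def allMatches(r: Regex, s: String): List[String] = r.findAllIn(s).toList

  def main(args: Array[String]): Unit = {
    println(allMatches(nonCapturing, "foo_&_bar&baz")) // List(foo_&_bar, baz)
    println(allMatches(capturing, "foo_&_bar&baz"))    // List(foo_&_bar, baz)
  }
}
```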