1

Please help me to extract the text after the last occurrence of the Schedule : and the end of the line.

Lane Closures : Lane one will be closed
Reason : Roadworks are planned
Status : Pending
Schedule : Expect disruption everyday between 20:00 and 06:00 from 5 October 2020 to 9 October 2020
Schedule : Expect disruption everyday between 20:00 and 06:00 from 12 October 2020 to 16 October 2020
Schedule : Expect disruption everyday between 20:00 and 06:00 from 19 October 2020 to 23 October 2020
Schedule : Expect disruption everyday between 20:00 and 06:00 from 26 October 2020 to 31 October 2020
Lanes Closed : There will be one of two lanes closed

In the above case, I need to extract Expect disruption everyday between 20:00 and 06:00 from 26 October 2020 to 31 October 2020

So far I came up only with the following:

(?<=Schedule : ).*(?![\s\S]*Schedule)

But it doesn't work in Postgres. It returns error: invalid regular expression: invalid escape \ sequence

I also tried to replace \s and \S with [:space:] and ^[:space:] as per Postgres documentation but it doesn't work either

Thanks in advance.

Dmytro Sly
  • 67
  • 6

1 Answers1

2

Since a . in a PostgreSQL regex matches any char including line break chars, you need to introduce two changes:

  • The first .* should be replaced with [^\r\n]+ to match any chars other than common line break chars
  • The [\s\S] in the lookahead should be replaced with a mere ..

You can use

(?<=Schedule : )[^\r\n]+(?!.*Schedule)

See the online demo:

SELECT REGEXP_MATCHES(
    E'Lane Closures : Lane one will be closed\nReason : Roadworks are planned\nStatus : Pending\nSchedule : Expect disruption everyday between 20:00 and 06:00 from 5 October 2020 to 9 October 2020\nSchedule : Expect disruption everyday between 20:00 and 06:00 from 12 October 2020 to 16 October 2020\nSchedule : Expect disruption everyday between 20:00 and 06:00 from 19 October 2020 to 23 October 2020\nSchedule : Expect disruption everyday between 20:00 and 06:00 from 26 October 2020 to 31 October 2020\nLanes Closed : There will be one of two lanes closed', 
    '(?<=Schedule : )[^\r\n]+(?!.*Schedule)', 
    'g') 

Output:

enter image description here

Wiktor Stribiżew
  • 484,719
  • 26
  • 302
  • 397
  • @DmytroSly Just FYI, you do not need the lookbehind, just use a capturing group around the pattern part you need to extract, `'Schedule\s*:\s*([^\r\n]+)(?!.*Schedule)'` would work the same here. – Wiktor Stribiżew Dec 03 '20 at 09:21