
I have this regex pattern:

[^-]+-(.+)[^\d](.+)-(.*?)-.*(\d+).*-([\w]+-[\w]+-[^-]+)-(\d+-\d+)-(.+)\.

It needs to match both of these cases:

Data Location 1 - many many words 201808206566 - many words - 010114-INL-USD-B087834-2018-08-Bill.PDF

Data Location 1 - many many words 201808206565 - many words - 010115-INL-B087845-2018-08-Bill.PDF

As written, it matches the first case but not the second. Removing one instance of [\w]+- from the 5th capture group gives the opposite result, because the first case contains INL-USD-B087834, which has one more hyphen-separated block than the INL-B087845 in the second case. How can I make that second [\w]+- optional?

tracer tong

1 Answer


Put the second \w+- in an optional non-capturing group using the ? quantifier. The brackets around \w are redundant, so they are dropped here:

[^-]+-(.+)[^\d](.+)-(.*?)-.*(\d+).*-(\w+-(?:\w+-)?[^-]+)-(\d+-\d+)-(.+)\.

Or use a numeric quantifier to allow one or two word blocks there:

[^-]+-(.+)[^\d](.+)-(.*?)-.*(\d+).*-((?:\w+-){1,2}[^-]+)-(\d+-\d+)-(.+)\.
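To see the effect of the change in isolation, here is a minimal Python sketch. It is not the full pattern: it assumes only the file name part of each string is tested, and it uses a leading \d+- (my own addition, not from the question) to skip the account number so the match is anchored. The names ORIGINAL, OPTIONAL_GROUP, COUNTED_GROUP, FILENAMES and extract are hypothetical.

```python
import re

# Simplified stand-ins for the tail of the pattern (from the 5th capture group on).
# The leading \d+- is an assumption added here to skip the account number.
ORIGINAL = re.compile(r"\d+-(\w+-\w+-[^-]+)-(\d+-\d+)-(.+)\.")
OPTIONAL_GROUP = re.compile(r"\d+-(\w+-(?:\w+-)?[^-]+)-(\d+-\d+)-(.+)\.")
COUNTED_GROUP = re.compile(r"\d+-((?:\w+-){1,2}[^-]+)-(\d+-\d+)-(.+)\.")

# File name parts of the two sample strings from the question.
FILENAMES = [
    "010114-INL-USD-B087834-2018-08-Bill.PDF",
    "010115-INL-B087845-2018-08-Bill.PDF",
]


def extract(pattern, name):
    """Return the captured groups, or None if the pattern does not match."""
    m = pattern.match(name)
    return m.groups() if m else None


if __name__ == "__main__":
    for name in FILENAMES:
        print(name)
        print("  original:      ", extract(ORIGINAL, name))
        print("  optional group:", extract(OPTIONAL_GROUP, name))
        print("  {1,2} quantifier:", extract(COUNTED_GROUP, name))
```

In this reduced form, the original tail rejects the second file name, while both modified tails accept both names and capture INL-USD-B087834 or INL-B087845 as the block in question.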
Barmar
  • Your patterns cause catastrophic backtracking with the top example, see https://regex101.com/r/TQCOm2/1 – Wiktor Stribiżew Oct 04 '18 at 16:51
  • The OP's pattern has the same problem. I couldn't figure out how to fix it, so I just posted my answer with the change necessary to answer the question. – Barmar Oct 04 '18 at 16:55