Here is my regex:

(?<!PAYROLL)(FIDELITY(?!.*TITLE)(?!.*NATION)|INVEST)(?!.*PAYROLL)

Here is my text:

INCOMING WIRE TRUST GS INVESTMENT 
VANGUARD PAYROLL
PAYROLL FIDELITY
ACH CREDIT FIDELITY INVESTM-FIDELITY
ACH CREDIT FIDELITY INVESTM-FIDELITY
ACH DEBIT FIDELITY 
ACH DEBIT FIDELITY 
ACH CREDIT FIDELITY INVESTM-FIDELITY

When I run this on http://regexr.com (using the PCRE RegEx engine), it matches FIDELITY in "PAYROLL FIDELITY", even though I'm using the negative lookbehind (?<!PAYROLL) to prevent exactly that.

Any help appreciated.

mikelowry

The (?<!PAYROLL) negative lookbehind matches a location that is not immediately preceded by the PAYROLL character sequence. In the PAYROLL FIDELITY string, FIDELITY is not immediately preceded by PAYROLL; it is immediately preceded by PAYROLL plus a space, so the lookbehind succeeds and FIDELITY is matched.

You can solve this problem in several ways. If you are sure there is always exactly one whitespace character between words (say, the string is tokenized), add \s after PAYROLL: (?<!PAYROLL\s).
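
To check this outside regexr, here is a quick sketch in Python. This is an assumption on my part: the question uses the PCRE engine, but Python's built-in re module treats these particular lookbehinds and lookaheads the same way. It runs the original pattern and the \s fix over a few of the sample lines (trailing spaces dropped); the matches helper is hypothetical, just for illustration.

```python
import re

# A few sample lines from the question (trailing spaces dropped).
LINES = [
    "INCOMING WIRE TRUST GS INVESTMENT",
    "PAYROLL FIDELITY",
    "ACH CREDIT FIDELITY INVESTM-FIDELITY",
    "ACH DEBIT FIDELITY",
]

# Original pattern: the lookbehind only inspects the 7 characters right
# before FIDELITY, which are "AYROLL " here, so "PAYROLL FIDELITY" still matches.
ORIGINAL = re.compile(r"(?<!PAYROLL)(FIDELITY(?!.*TITLE)(?!.*NATION)|INVEST)(?!.*PAYROLL)")

# Fix: include exactly one whitespace character; \s is one character wide,
# so the lookbehind stays fixed-width.
FIXED = re.compile(r"(?<!PAYROLL\s)(FIDELITY(?!.*TITLE)(?!.*NATION)|INVEST)(?!.*PAYROLL)")


def matches(pattern, line):
    """Hypothetical helper: every matched substring in one line."""
    return [m.group(0) for m in pattern.finditer(line)]


for line in LINES:
    print(line, "| original:", matches(ORIGINAL, line), "| fixed:", matches(FIXED, line))
```

Only the "PAYROLL FIDELITY" line differs between the two patterns; the other lines produce the same matches.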

If there can be one or more whitespace characters, the (?<!PAYROLL\s+) pattern won't work in PCRE, because PCRE lookbehind patterns must be fixed-width. Instead, you can match the exception cases and skip them with the (*SKIP)(*FAIL) PCRE verbs:
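
Python's built-in re module enforces the same fixed-width rule for lookbehinds, so it offers an easy way to see the difference. This sketch uses Python as a stand-in for PCRE; compiles is a hypothetical helper:

```python
import re


def compiles(pattern):
    """Hypothetical helper: True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
    except re.error:  # raised for variable-width lookbehinds, among other errors
        return False
    return True


# Exactly one whitespace character: fixed width (8 characters), accepted.
print(compiles(r"(?<!PAYROLL\s)FIDELITY"))   # True
# One or more whitespace characters: variable width, rejected.
print(compiles(r"(?<!PAYROLL\s+)FIDELITY"))  # False
```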

PAYROLL\s+FIDELITY(*SKIP)(*F)|(FIDELITY(?!.*TITLE)(?!.*NATION)|INVEST)(?!.*PAYROLL)

See the regex demo. PAYROLL\s+FIDELITY(*SKIP)(*F) matches PAYROLL, one or more whitespace characters and FIDELITY, then (*F) fails the match and triggers backtracking. Backtracking into (*SKIP) discards that match, and the next match attempt starts at the position where (*SKIP) was reached, right after FIDELITY. You may even replace PAYROLL\s+FIDELITY(*SKIP)(*F) with PAYROLL.*?FIDELITY(*SKIP)(*F) to skip any text from PAYROLL up to the leftmost FIDELITY (. does not match line breaks by default), or with PAYROLL[\s\S]+?FIDELITY(*SKIP)(*F) to let that text span line breaks as well.
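
Python's built-in re module does not support the (*SKIP)(*FAIL) verbs, so the sketch below swaps in a different, commonly used technique: put the exception in the left branch of an alternation, capture the wanted text in a group on the right, and keep only the matches where that group took part. It approximates the PCRE approach rather than porting it; the wanted helper is hypothetical.

```python
import re

# Left branch: the exception (PAYROLL, 1+ whitespace, FIDELITY), not captured.
# Right branch, group 1: the rest of the answer's pattern.
PATTERN = re.compile(
    r"PAYROLL\s+FIDELITY"
    r"|(FIDELITY(?!.*TITLE)(?!.*NATION)|INVEST)(?!.*PAYROLL)"
)


def wanted(line):
    """Hypothetical helper: keep only matches where group 1 took part."""
    # When the left branch matches, finditer consumes that text and resumes
    # after it; group(1) is None for those matches, so they are dropped.
    return [m.group(1) for m in PATTERN.finditer(line) if m.group(1) is not None]


for line in ["PAYROLL   FIDELITY", "ACH CREDIT FIDELITY INVESTM-FIDELITY"]:
    print(repr(line), wanted(line))
```

Because finditer consumes the PAYROLL...FIDELITY text whenever the left branch matches, scanning resumes right after it, which is the same effect (*SKIP) has in the PCRE pattern.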

Wiktor Stribiżew