8

Apologies for my lack of knowledge of regular expressions. I understand lots of questions about them crop up here, but after hours of trying I cannot figure this out. I am trying to replace all occurrences of 6-digit numbers (with or without hyphens) in a string, except where a number is preceded by certain words.

The solution in "Regular expression to match a line that doesn't contain a word?" comes close to what I am looking for, but I cannot seem to use it in a way that works for my requirements.

What I need is as follows: For the string:

"User paid £43 on 23/05/14 to account 123456 with cheque 123456 transaction: 123456."

I wish to replace only the 6-digit number that is not preceded by "cheque" or "transaction:". What I have been trying is as follows:

\b[0-9]{2}-?[0-9]{2}-?[0-9]{2}\b 

(This replaces all 6 digit numbers)

Using the answer to "How do you replace a match, using regex, only if it is not preceded by a given character?", I tried

(^cheque\s[0-9]{2}-?[0-9]{2}-?[0-9]{2}\b) 

(Please note I am trying first with one of the words I wish to exclude and will then include the others.) This does not replace any of the 6-digit numbers.

Through trial and error I have found

(cheque\s+[0-9]{2}-?[0-9]{2}-?[0-9]{2}\b) 

will replace the word cheque followed by a 6 digit number so I am getting there - but I need to negate this (and transaction followed by 6 digit number) and replace instead the 6 digit number not preceded by these words.

The answer to "How to negate the whole regex?" is helpful for figuring out how to negate the expression, but try as I might, I cannot make it work for my situation. I tried

^(?!(?:((transaction\s+[0-9]{2}-?[0-9]{2}-?[0-9]{2}\b) )|((cheque\s+[0-9]{2}-?[0-9]{2}-?[0-9]{2}\b) ))$).*$

but this replaced the whole string!

Any help on this is greatly appreciated.

Thanking you.

Misemefein
  • Please clarify. What language are you using? Also, if you are running a standard "replace all", you can't not replace some matches. You can change the regex to NOT match them at all (fulfilling the requirement) or iterate through matches and examine them with further code. Is not matching them sufficient for your needs? – Necreaux Mar 09 '15 at 17:02
  • You can only do it if your regex engine supports [negative look-behinds](http://www.regular-expressions.info/lookaround.html). – m0skit0 Mar 09 '15 at 17:14
  • Thanks for the very quick replies! I'm using c# and I'm stuck with visual studio 2005. – Misemefein Mar 09 '15 at 17:24
  • I need to replace those that do not have 'cheque' – Misemefein Mar 09 '15 at 17:24

2 Answers

15

Try this:

(?<!(?:cheque|transaction:)\s*)\d{2}-?\d{2}-?\d{2}\b

Explanation:

  • (?<! ... ) Negative lookbehind assertion (match anything not preceded by this)
  • (?:cheque|transaction:)\s* non-capturing group matching "cheque" or "transaction:", followed by any amount of whitespace (including none)
  • \d{2}-?\d{2}-?\d{2}\b Six-digit number possibly hyphenated, ending in word boundary
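A minimal C# sketch of how this pattern might be applied, since the asker mentions C#. The pattern is the one given above, unchanged; the class name `AccountMasker`, the helper `Mask`, and the `XXXXXX` mask are hypothetical names chosen for illustration. It relies on the .NET regex engine, which accepts a variable-length lookbehind such as `\s*`; many other engines reject it.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AccountMasker
{
    // Pattern from the answer above, copied unchanged.
    private const string Pattern =
        @"(?<!(?:cheque|transaction:)\s*)\d{2}-?\d{2}-?\d{2}\b";

    // Hypothetical helper: replaces every six-digit number that is not
    // preceded by "cheque" or "transaction:" with the supplied mask.
    public static string Mask(string input, string mask)
    {
        return Regex.Replace(input, Pattern, mask);
    }
}
```

Calling `AccountMasker.Mask(text, "XXXXXX")` on the sample string from the question should replace only the number after "account" and leave the cheque and transaction numbers alone. The match is case-sensitive as written, so "Cheque" with a capital would not be excluded unless `RegexOptions.IgnoreCase` is added.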
tzaman
  • Thank you so much tzaman. I figured this out last night and your solution is very close to what I came up with: – Misemefein Mar 10 '15 at 09:10
0

Thank you very much @tzaman for your answer - with some head scratching, trial and error, and input from a colleague I figured it out last night and came up with the following. (I will add lots of cases because it is possible the user will enter things like cheque number / cheque no. etc. but they are just extra conditions - below is the main part of the problem solved.)

(?<!                                  
  (
    (cheque\s{0,}:{0,1}\s{0,})
    |
    (transaction\s{0,}:{0,1}\s{0,})
  )
)
(
  [0-9]{6}
  |
  ([0-9]{2}-[0-9]{2}-[0-9]{2})
)
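A pattern laid out over several lines like this only works in .NET when whitespace in the pattern is ignored. The sketch below shows one way that might look with `RegexOptions.IgnorePatternWhitespace`. It is a condensed variant, not the exact pattern above: the two keywords share one group, the number is written as `[0-9]{2}-?[0-9]{2}-?[0-9]{2}` (the form used in the question), and the `\b` anchors and `RegexOptions.IgnoreCase` are assumptions added for illustration. The names `VerboseAccountMasker` and `Mask` are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class VerboseAccountMasker
{
    // IgnorePatternWhitespace lets the pattern span several lines and
    // carry # comments. IgnoreCase is an assumption, not part of the post.
    private static readonly Regex SixDigits = new Regex(@"
        (?<!                        # not preceded by ...
          (?:cheque|transaction)    #   either keyword
          \s*:?\s*                  #   an optional colon with optional spaces
        )
        \b[0-9]{2}-?[0-9]{2}-?[0-9]{2}\b    # six digits, hyphens optional
        ", RegexOptions.IgnorePatternWhitespace | RegexOptions.IgnoreCase);

    // Hypothetical helper: masks each six-digit number that passes the lookbehind.
    public static string Mask(string input, string mask)
    {
        return SixDigits.Replace(input, mask);
    }
}
```

Extra prefixes such as "cheque number" or "cheque no." could be added as further alternatives inside the lookbehind group.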

Thanks again.

Misemefein