I am using the text to speech software @Voice Aloud Reader which offers the ability to find and replace text using RegEx. What I would like to do is replace a character only if a different character appears at the beginning of the same line.

For example, say I have the following line

The quick brown fox jumps over the lazy dog.

Is it possible to find and replace the "e"s in this line with "3", but only if the line starts with "T", using nothing but a regular expression?

Ideally, the result of such an expression would be this

Th3 quick brown fox jumps ov3r th3 lazy dog.

when applied to the same line, and this

She opened the door.

when applied to something else that doesn't begin with "T". The only thing I've found that somewhat resembles this is the lookbehind assertion but that only checks the character directly behind the one in question when making its judgement.

Ryszard Czech

1 Answer

@Voice Aloud Reader supports Perl-compatible regular expression syntax through the DEELX engine, which makes me think the following might work:

(?:^T|\G(?!^)).*?\Ke

See the online demo

  • (?: - Open non-capturing group.
    • ^T - Match the start-of-string anchor followed by a literal "T".
    • | - Or:
    • \G(?!^) - Assert the position at the end of the previous match, using a negative lookahead to prevent it from being at the start of the string.
    • ) - Close non-capturing group.
  • .*? - Match any character other than newline zero or more times, but as few as possible. Lazy match!
  • \K - Reset starting point of the reported match.
  • e - Match "e" literally.

Note: As per @TheFourthBird, you can reduce backtracking further by replacing .*? with [^e\r\n]*.
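For readers who want to see what the first pattern is meant to do, here is a minimal Python sketch. Python's built-in `re` module supports neither \G nor \K, so this is an emulation rather than the pattern itself: a capture group stands in for \K, and repeated passes stand in for the \G chaining. The helper name `replace_e_on_t_lines` is hypothetical, and the sketch assumes the replacement text contains no "e".

```python
import re

# Matches a leading "T", then everything up to the first remaining "e" on
# that line. Group 1 is the part that \K would have dropped from the match.
FIRST_E_ON_T_LINE = re.compile(r"^(T[^e\r\n]*)e", re.MULTILINE)


def replace_e_on_t_lines(text, replacement="3"):
    """Replace every "e" on lines starting with "T" (hypothetical helper).

    Assumes `replacement` contains no "e"; otherwise the loop never ends.
    """
    while True:
        # Each pass rewrites the first remaining "e" on every "T" line and
        # puts group 1 back unchanged.
        text, count = FIRST_E_ON_T_LINE.subn(
            lambda m: m.group(1) + replacement, text
        )
        if count == 0:
            return text


print(replace_e_on_t_lines("The quick brown fox jumps over the lazy dog."))
print(replace_e_on_t_lines("She opened the door."))
```

The same capture-and-reinsert idea may carry over to engines that lack \K but do support \G, by capturing the prefix in a group and putting it back in the replacement string; whether @Voice Aloud Reader allows that is untested here.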


Another option could possibly be:

^[^T].*(*SKIP)(*F)|e

See the online demo

  • ^ - Start-of-string anchor.
  • [^T] - Match anything other than "T".
  • .* - Match any character other than newline zero or unlimited times.
  • (*SKIP)(*F) - Don't allow backtracking past this point and force the match to fail. Everything consumed to the left of (*SKIP) is skipped, so the next match attempt starts after it.
  • | - Or:
  • e - Match a literal "e".
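The skip-and-fail idea can be approximated in engines without backtracking verbs. The sketch below uses Python's built-in `re`, which does not support (*SKIP)(*F), so it swaps in a different technique: the first alternative captures any whole line that does not start with "T", and a replacement callback puts that line back unchanged. The function name is hypothetical, and the character class is widened to `[^T\r\n]` as an assumption so that empty lines are not swallowed.

```python
import re

# Alternation order matters: the first branch consumes any whole line that
# does not start with "T", so the second branch only ever sees "T" lines.
SKIP_LINE_OR_E = re.compile(r"(^[^T\r\n].*)|e", re.MULTILINE)


def replace_e_skipping_other_lines(text, replacement="3"):
    """Replace "e" only on lines starting with "T" (hypothetical helper)."""

    def substitute(match):
        if match.group(1) is not None:
            # A skipped line: return it untouched.
            return match.group(1)
        # A bare "e", which can only occur on a "T" line.
        return replacement

    return SKIP_LINE_OR_E.sub(substitute, text)


sample = "The quick brown fox jumps over the lazy dog.\nShe opened the door."
print(replace_e_skipping_other_lines(sample))
```

This needs a replacement callback, so it only helps where the host application exposes one; a plain find-and-replace box usually does not.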

EDIT:

Perhaps even more interesting is the DEELX engine's RIGHTTOLEFT mode, which it uses to match a "backward assertion". This differs from Perl-like syntax, which only allows fixed-width lookbehind. From the docs:

"DEELX uses RIGHTTOLEFT mode to match "Backward assertion". The backward assertion has the same logic as lookahead assertion, except the direction. So, in DEELX, the backward assertion works for non-fixed width lookbehind."

That makes me believe the following should work:

(?<=^T.*)e

  • (?<=^T.*) - Non-fixed-width lookbehind matching the start-of-string anchor, a literal "T" and zero or more characters other than newline.
  • e - Match a literal "e".

Unfortunately I have no means of testing this, but the theory suggests it should work.
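The logic of that lookbehind can at least be checked outside the app. Python's built-in `re` only accepts fixed-width lookbehind, so the sketch below does not run the pattern itself; it performs the equivalent test by hand for each "e", looking back to the start of the current line. The helper name is hypothetical.

```python
import re


def replace_e_after_leading_t(text, replacement="3"):
    """Replace each "e" whose line starts with "T" (hypothetical helper)."""

    def substitute(match):
        # Manual stand-in for (?<=^T.*): locate the start of the line that
        # contains this "e" and check whether that line begins with "T".
        line_start = text.rfind("\n", 0, match.start()) + 1
        if text.startswith("T", line_start):
            return replacement
        return match.group(0)

    return re.sub("e", substitute, text)


print(replace_e_after_leading_t("The quick brown fox jumps over the lazy dog."))
print(replace_e_after_leading_t("She opened the door."))
```

This only confirms the intended behaviour; it says nothing about whether a given engine accepts an unbounded lookbehind.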


For all of these options, use "3" as the replacement.

JvdV
  • Small suggestion: for the first pattern you can reduce the backtracking using `[^e\r\n]*\Ke` instead of `.*?\Ke` https://regex101.com/r/7tk5Ky/1 – The fourth bird Jan 03 '21 at 11:37
  • Thanks for another valuable suggestion @Thefourthbird. I edited the answer to reflect what you mentioned. – JvdV Jan 03 '21 at 11:42
  • Thank you very much for this answer. I'm new to RegEx and I appreciated the explanation of what each part of the expression was doing. I tried each of these but unfortunately, the language this app is working in is a bit limited. Option 3 returned the error that lookbehinds must have a defined maximum length less than or equal to 10 characters. It did not recognise the (*SKIP) in option 2, and in option 1 it did not recognise \K. Option 1 was the closest to success though, so is it possible to achieve the functionality of \K another way? – Daryl Streete Jan 03 '21 at 17:47
  • @DarylStreete. All answers should work with the appropriate implementation, flags etc as per the documentation. Unfortunately I can't test them for you. I suggest a good look at the docs to see what is the correct syntax. – JvdV Jan 03 '21 at 18:29