
([^\W\dA-Z && (I|X|L|V|\.)])\1{2} works in http://regex101.com/r/xB5sT0/1

How can I make it skip matches in the \b(Fuss|Mass|Bloss|Gross) cases?

Any of the listed German words can also appear in the middle of a word, or start with a lowercase letter (fuss|mass|bloss|gross).

I do not want to match compound words such as Fusssoldat, because I know they are fine: Fuss + soldat makes sense.

revo
gasyoun
  • Does [this](http://regex101.com/r/bD3gO9/2) do what you want? – Aran-Fey Oct 04 '14 at 17:49
  • It does work (on the web), but as a Perl-specific sequence it can't be used in Notepad++ or EmEditor, and I guess not in EditPlus either. These are my three main editors, and I would rather not add a fourth. – gasyoun Oct 04 '14 at 20:49

2 Answers


You can use the discard technique: put the patterns you want to discard at the beginning of the regex, separated by pipes (regex OR), and put a capturing group at the end, like this:

discard patt 1 | discard this too | another discard pattern | (keep this)

So, for your case, you could do something like this:

\b(?:Fuss|Mass|Bloss|Gross)|([^\W\dA-Z && (I|X|L|V|\.)])\1{2}

Then read the capturing group to grab your content. Matches that come from the discard alternatives leave the group empty, so you ignore those.
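As a minimal sketch of how the discard technique could be consumed in code, here is a Python version. It assumes `[a-z]` as a simplified stand-in for the question's character class, and the names `DISCARD_THEN_KEEP` and `find_triples` are hypothetical. The key point is that every match is inspected and only those where group 1 is set are kept.

```python
import re

# Assumption: [a-z] is a simplified stand-in for the question's character class.
# The first branch consumes the allowed stems; the second captures a tripled letter.
DISCARD_THEN_KEEP = re.compile(r"\b(?:Fuss|Mass|Bloss|Gross)|([a-z])\1{2}")


def find_triples(text):
    """Return tripled letters, skipping matches produced by the discard branch."""
    return [
        m.group(0)
        for m in DISCARD_THEN_KEEP.finditer(text)
        if m.group(1) is not None  # group 1 is None when a discard branch matched
    ]


print(find_triples("Fusssoldat Dusssoldat"))
```

Note that the regex as a whole still matches `Fuss`; an editor or online tester that highlights every match will show it, and only the group check filters it out. Because of `\b` and the capitalized stems, this sketch does not cover lowercase or mid-word stems.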

If you use a PCRE (Perl Compatible Regular Expressions) engine, you can use the (*SKIP)(*FAIL) verbs to discard a matched pattern, like this:

\b(?:Fuss|Mass|Bloss|Gross)(*SKIP)(*FAIL)|([^\W\dA-Z && (I|X|L|V|\.)])\1{2}

If you want to learn more about this trick, take a look at this excellent thread:

Regex Pattern to Match, Excluding when... / Except between

Federico Piazza
  • Sure, I can use '|', but my only question was about grouping. I do not use 'PCRE'. I tested '\b(?:Fuss|Mass|Bloss|Gross)|([^\W\dA-Z && (I|X|L|V|\.)])\1{2}' on http://regex101.com/r/xB5sT0/1 and it did not work: it should have ignored Fusssoldat and found only Dusssoldat, but it failed. Thanks for the detailed answer. – gasyoun Oct 04 '14 at 18:14
0
([^\W\dA-Z && (I|X|L|V|\.)])\1(?<!(?i)fuss|mass|bloss|gross)\1

Regular expression visualization

Debuggex Demo

I found a solution this way:

  1. your main pattern ([^\W\dA-Z && (I|X|L|V|\.)])
  2. repeated once (!) \1 (not twice like before)
  3. a negative lookbehind checks that the text matched so far does not end in one of the following: (?<!(?i)fuss|mass|bloss|gross)
  4. repeat the main pattern match once more to ensure three occurrences
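The four steps above can be sketched in Python, with two adaptations that are assumptions of this sketch rather than part of the answer: `[a-z]` stands in for the question's character class, and the single lookbehind is split into one lookbehind per word, because Python's `re` module only accepts fixed-width lookbehinds. The names `TRIPLE_NOT_AFTER_STEM` and `find_triples_lookbehind` are hypothetical.

```python
import re

# Assumption: [a-z] is a simplified stand-in for the question's character class.
# Python's re needs fixed-width lookbehinds, so each stem gets its own one.
# (?i:...) makes only the stem comparison case-insensitive.
TRIPLE_NOT_AFTER_STEM = re.compile(
    r"([a-z])\1"          # steps 1 and 2: a letter, repeated once
    r"(?<!(?i:fuss))"     # step 3: text so far must not end in an allowed stem
    r"(?<!(?i:mass))"
    r"(?<!(?i:bloss))"
    r"(?<!(?i:gross))"
    r"\1"                 # step 4: third occurrence of the same letter
)


def find_triples_lookbehind(text):
    """Return tripled letters that do not start right inside an allowed stem."""
    return [m.group(0) for m in TRIPLE_NOT_AFTER_STEM.finditer(text)]


print(find_triples_lookbehind("Fusssoldat Dusssoldat Schlussscene"))
```

Unlike the alternation approach, nothing is matched for the allowed compounds at all, so the pattern can be used directly for search without inspecting groups.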
bukart
  • Wow, not only did it work, it also gives a visualization. That is great to see; I had never seen one before. Thank you. – gasyoun Oct 04 '14 at 18:37
  • I only wonder why https://www.debuggex.com/r/CUCZMUDvnxPEzyKK/1 says "Result: Does not match starting at the black triangle slider", because 1) it does the job and 2) it does it perfectly: http://regex101.com/r/xB5sT0/3 – gasyoun Oct 04 '14 at 18:45
  • The pattern is not supposed to match the `sss` in `Schlussscene`? – Aran-Fey Oct 04 '14 at 18:54
  • @bukart: Ah, I see. I noticed it doesn't match at regex101.com/r/xB5sT0/3, but that's because the global modifier isn't enabled. – Aran-Fey Oct 04 '14 at 18:58
  • @Rawing why yes, matching is wanted. – gasyoun Oct 04 '14 at 20:50