0

I'm stumped on how to even go about this.

I am trying to match the string "ashi", but not when the word containing it is in a small list of known false positives such as "flashing", "lashing", and "smashing". The false-positive words may also appear in the string; as long as "ashi" occurs somewhere that is not part of one of those words, the check should return true.

I'm using C# and I was trying to go about it without using regular expressions, but I am having no luck.

These strings should return true

...somethingashisomething...

...something2!ashi*&something... 

... something ashi something flashing...

These strings should return false

...somethingflashingsomething...

...smashingthesomething...

...the lashings are too tight...   
Paul
  • You're right, C Perkins. I guess there are other words too that contain ashi that would be false positives. Like smashing and lashing (as indicated by your other comment below). Ideally, I would like to create a list of a few words that contain ashi that should not be considered. And, also, if ashi appears ANYWHERE in the string not as part of one of those words, just return true. I'm going to edit this question based on your feedback. – Paul Jul 04 '19 at 17:49
  • Glad my comments helped, even though ironically I meant that those words should be allowed (based on the original specs). FYI, a search of an online English dictionary word list resulted in 208 non-proper words (i.e. no names) that have `ashi` in them. Many of them are obscure, but if my few examples prompted an updated response, I wonder if you mean to exclude most real words with ashi. – C Perkins Jul 04 '19 at 18:35
  • My best guess is that just a few words would be fine. I like the idea of if it ends with ng so lashing, flashing, and smashing would all be excluded. – Paul Jul 04 '19 at 19:10

4 Answers

2

The following will match ashi but not within flashing. I interpreted "word" loosely, so flashing is not required to be isolated as a separate word with space/punctuation delimiters.

(?<=(?<prefix>fl)|)ashi(?(prefix)(?!ng))

It is sufficient to return true/false over the entire pattern and won't require checking specific capture groups. In other words, it is usable with Regex.IsMatch().
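As a rough illustration of that point, here is a minimal C# sketch that feeds the pattern above to Regex.IsMatch(). The class and method names (AshiMatcher, ContainsAshi) are hypothetical and only serve the example; the pattern itself is copied unchanged and still treats only flashing as a false positive.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; the names are not from the answer.
public static class AshiMatcher
{
    // Pattern copied verbatim from above: "ashi" matches unless it is
    // preceded by "fl" and followed by "ng" (i.e. part of "flashing").
    private const string Pattern = @"(?<=(?<prefix>fl)|)ashi(?(prefix)(?!ng))";

    // Returns true when at least one acceptable "ashi" occurs in the input.
    public static bool ContainsAshi(string input)
    {
        return Regex.IsMatch(input, Pattern);
    }
}
```

Calling AshiMatcher.ContainsAshi("... something ashi something flashing...") should return true, because the standalone ashi matches even though flashing is also present, while "...somethingflashingsomething..." should return false.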

Pattern details:

(?<=               # Zero-width positive lookbehind: match but don't consume characters
  (?<prefix>fl)    # Named capture group to match "fl" at start of "flashing"
  |                # Alternate blank capture - will succeed if "fl" is not present
)                  # End lookbehind
ashi               # match literal "ashi"
(?(prefix)         # Conditional:  Only match if named group prefix has successful capture (i.e. "fl" was matched)
  (?!ng)           # Zero-width negative lookahead: Fail match if "ng" follows
)                  # Close conditional (there is no false part, so match succeeds if "fl" was not present)

If flashing is only excluded as an isolated word, just add word boundary operators. This will match something like flashingwithnospace, whereas the first pattern would fail on that string:

(?<=(?<prefix>\bfl)|)ashi(?(prefix)(?!ng\b))
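To make the difference between the two patterns concrete, the following C# sketch runs both against the same input. The class and method names (AshiBoundaryDemo, LooseMatch, IsolatedWordMatch) are made up for this example; the two patterns are taken unchanged from this answer.

```csharp
using System.Text.RegularExpressions;

// Hypothetical demo class; only the two pattern strings come from the answer.
public static class AshiBoundaryDemo
{
    // Rejects "flashing" wherever it appears, even inside a longer run of letters.
    private const string Loose = @"(?<=(?<prefix>fl)|)ashi(?(prefix)(?!ng))";

    // Rejects "flashing" only when it stands alone as a word.
    private const string IsolatedWord = @"(?<=(?<prefix>\bfl)|)ashi(?(prefix)(?!ng\b))";

    public static bool LooseMatch(string input)
    {
        return Regex.IsMatch(input, Loose);
    }

    public static bool IsolatedWordMatch(string input)
    {
        return Regex.IsMatch(input, IsolatedWord);
    }
}
```

For the input "flashingwithnospace", LooseMatch should return false and IsolatedWordMatch should return true; for "a flashing light", both should return false.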

(FYI, the pattern will work in isolation, but if it is combined within another pattern, especially inside a repeating construction, it may not work due to the conditional on the named capture group. Once the named capture group has succeeded, the conditional will remain true while matching the larger pattern, even if it were to encounter another occurrence of ashi.)

C Perkins
2

Another option might be to use a negative lookbehind with a nested lookahead. The lookbehind rejects an ashi that is directly preceded by fl at the start of a word, but only when that fl is followed by ashing and a word boundary. This matches ashi but not the word flashing.

(?<!\bfl(?=ashing\b))ashi

Explanation

  • (?<! Negative lookbehind, assert that what is directly to the left does not match
    • \bfl Word boundary, match fl
    • (?= Positive lookahead, assert what is directly on the right is
      • ashing\b Match ashing and word boundary
    • ) Close positive lookahead
  • ) Close negative lookbehind
  • ashi Match literally
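A small C# sketch of how the pattern explained above might be used. The names FlashingLookbehind and ContainsAshi are hypothetical; the pattern is copied unchanged.

```csharp
using System.Text.RegularExpressions;

// Hypothetical wrapper; only the pattern string comes from the answer.
public static class FlashingLookbehind
{
    // "ashi" is rejected only when it sits inside the standalone word "flashing".
    private const string Pattern = @"(?<!\bfl(?=ashing\b))ashi";

    public static bool ContainsAshi(string input)
    {
        return Regex.IsMatch(input, Pattern);
    }
}
```

Because of the word boundaries, this version only skips flashing as a separate word. An input such as "...somethingflashingsomething..." should still return true here, since there is no word boundary before fl.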


Update

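To handle the updated list of words (flashing, lashing, smashing), you could use the alternation (?:sm|f?l) in the negative lookbehind; it matches sm, or l preceded by an optional f. The word boundaries are dropped, so these words are also excluded when they are embedded in longer text.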

(?<!(?:sm|f?l)(?=ashing))ashi
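The updated pattern can be checked against the six sample strings from the question with a sketch along these lines. AshiFilter and ContainsAshi are hypothetical names chosen for the example; the pattern is unchanged.

```csharp
using System.Text.RegularExpressions;

// Hypothetical wrapper; only the pattern string comes from the answer.
public static class AshiFilter
{
    // Rejects "ashi" when it is preceded by "sm", "l" or "fl" and
    // the text from that point reads "ashing".
    private const string Pattern = @"(?<!(?:sm|f?l)(?=ashing))ashi";

    public static bool ContainsAshi(string input)
    {
        return Regex.IsMatch(input, Pattern);
    }
}
```

With this sketch, the three "should return true" strings from the question should yield true and the three "should return false" strings should yield false. Note that f?l also rejects any other word in which l directly precedes ashing (for example splashing), which may or may not be what you want.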


The fourth bird
1

You can make use of a capturing group:

(flashing)|ashi

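If the first group is not empty (match.Groups[1].Success in C#), you matched flashing literally and can ignore that match. A match in which the group did not participate is a genuine ashi.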
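One way this idea might look in C#, extended to the list of false positives from the question, is sketched below. The class name, the FalsePositives array, and the method name are assumptions made for the example. Since Regex.IsMatch() alone cannot tell the two alternatives apart, the sketch walks over all matches and looks for one where group 1 did not participate.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper; the word list is an assumption based on the question.
public static class AshiGroupMatcher
{
    private static readonly string[] FalsePositives = { "flashing", "lashing", "smashing" };

    // Builds "(flashing|lashing|smashing)|ashi"; words are escaped in case
    // the list ever contains regex metacharacters.
    private static readonly Regex Pattern = new Regex(
        "(" + string.Join("|", FalsePositives.Select(Regex.Escape)) + ")|ashi");

    public static bool ContainsAshi(string input)
    {
        foreach (Match m in Pattern.Matches(input))
        {
            // Group 1 unset means the bare "ashi" alternative matched.
            if (!m.Groups[1].Success)
            {
                return true;
            }
        }
        return false;
    }
}
```

The false-positive words consume their own ashi when they match, so an ashi inside flashing is never reported by the second alternative. Adding a word to the exclusion list only requires a new entry in the array.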

Jan
0

The question gives the examples

...somethingashisomething...
...something2!ashi*&something...
... something ashi something...

The second and third examples can be found by including the word boundary \b in the search, i.e. search for \bashi\b. Finding the first example requires more knowledge of what the two enclosing somethings are. If they are alphanumeric then you need to specify the problem in much more detail.
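For illustration, a minimal C# sketch of the \bashi\b search described above; WholeWordAshi and ContainsWord are hypothetical names.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper showing the whole-word search only.
public static class WholeWordAshi
{
    // Matches "ashi" only when it is delimited by non-word characters
    // (or the start/end of the string) on both sides.
    public static bool ContainsWord(string input)
    {
        return Regex.IsMatch(input, @"\bashi\b");
    }
}
```

This should return true for the second and third examples and false for the first, where ashi is surrounded by letters.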

AdrianHHH
  • Sometimes it is useful and/or required to have more info. But the whole utility of regex to start with is matching arbitrary patterns without having to specify all possible combinations. This really doesn't answer the general question; rather, it just points out how to match one possible example. The question already provides sufficient conditions to match and shouldn't have to require a set of distinct enclosing characters. They may not be known, but that doesn't ruin the set of requirements already stated. – C Perkins Jul 04 '19 at 17:05