122

I'm using RegexBuddy, but I'm still having trouble with this :\

I'm processing a file line by line. I built a "line model" to match what I want.

Now I'd like to do an inverse match, i.e. I want to match lines that contain a string of six letters, but only if those six letters are not "Andrea". How should I do that?


EDIT: I'll write the program that uses this regex myself; I don't know yet whether in Python or PHP. I'm doing this first to learn some regex :) There are different types of lines, and I wanted to use a regex to select the type I'm interested in. Once I have those lines, I need to apply another filter so that one known value is not matched; I need all the others, just not that one. The (?!not-wanted) is working pretty well, thank you. :-)

I hope this clarifies the question :)

Legionar
Andrea Ambu
  • It actually sounds like you might do better to give us a bit more information about what you're doing, and see if someone can offer an alternative solution. Typically, attempting to parse an entire file by constructing a regular expression that matches each line is a rather complicated route :) – Dan Oct 02 '08 at 20:33

9 Answers

78
(?!Andrea).{6}

This assumes your regexp engine supports negative lookaheads.

Edit: or maybe you'd prefer to use [A-Za-z]{6} in place of .{6}

Edit (again): Note that lookaheads and lookbehinds are generally not the right way to "inverse" a regular expression match. Regexps aren't really set up for negative matching; they leave that to whatever language you are using them with.
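A quick way to see what this pattern does is to try it with Python's `re` module. This is only a sketch: the sample strings and the `first_match` helper are made up for illustration and are not part of the original answer.

```python
import re

# Six letters, unless the text at the current position starts with "Andrea".
pattern = re.compile(r"(?!Andrea)[A-Za-z]{6}")

def first_match(text):
    """Return the first six-letter match in text, or None (hypothetical helper)."""
    m = pattern.search(text)
    return m.group(0) if m else None

print(first_match("Robert"))   # Robert
print(first_match("Andrea"))   # None
print(first_match("Andreas"))  # ndreas
```

The last case shows that an unanchored search can still find six letters starting one character later. If that matters for your data, anchoring the pattern (for example with `^` or `\b`) should avoid it.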

Dan
  • You need to add the ^ that @Vinko Vrsalovic uses so that it won't match on "ndrea\n" – bdukes Oct 02 '08 at 20:34
  • 2
    . doesn't match \n by default (some languages [eg Perl] allow you to switch on that behaviour, but by default . matches everything BUT \n). – Dan Oct 02 '08 at 20:36
  • 1
    (plus, the OP never mentioned the string had to occur at the start of the line) – Dan Oct 02 '08 at 20:37
  • 1
    Andrea: OP means "original poster", so, I was referring to you :) – Dan Oct 02 '08 at 20:58
  • Dan: OK, I haven't learned the SO slang yet :P Thank you :) The same thing is commented on Vinko Vrsalovic's answer – Andrea Ambu Oct 02 '08 at 21:08
  • I'm guessing the {6} is set to 6 because that is the length of the string "Andrea", but if this _is_ the case, it should be made clear in the answer. – Shabbyrobe May 24 '11 at 01:07
  • This only works for strings that are exactly 6 characters long, as requested. Dmytro shared the answer for any length strings [here](http://stackoverflow.com/a/1909960/819417). – Cees Timmerman Jun 20 '13 at 15:24
52

For Python/Java,

^(.(?!(some text)))*$

http://www.lisnichenko.com/articles/javapython-inverse-regex.html
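To see how this pattern behaves, here is a small Python comparison against the variant with the dot placed after the lookahead. It is a sketch only; the sample lines and the `accepts` helper are invented for illustration.

```python
import re

# The form given in this answer: the dot comes before the lookahead.
dot_before = re.compile(r"^(.(?!(some text)))*$")
# The variant with the dot after the lookahead.
dot_after = re.compile(r"^(?:(?!some text).)*$")

def accepts(pattern, line):
    """True if the whole line matches the pattern (hypothetical helper)."""
    return pattern.match(line) is not None

samples = [
    "nothing to see",
    "has some text inside",
    "some text at the start",
]

# Maps each sample to (result with dot_before, result with dot_after).
results = {s: (accepts(dot_before, s), accepts(dot_after, s)) for s in samples}

for line, outcome in results.items():
    print(line, outcome)
```

The two patterns agree on the first two samples. They differ on the third: with the dot first, a line that begins with `some text` is still accepted, because the first character is consumed before any lookahead is checked.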

Rahul
Dmytro
  • 4
    This doesn't work. You're thinking of the Tempered Greedy Token idiom, but the dot has to go *after* the lookahead, not before. See [this question](http://stackoverflow.com/questions/30900794/tempered-greedy-token-what-is-different-about-placing-the-dot-before-the-negat). But that approach is overkill for this task anyway. – Alan Moore Aug 09 '16 at 09:42
  • Don't know which language it is written in, but worked like a charm in Sublime text to clean up my test data. Thanks! – Matthias dirickx May 04 '17 at 11:21
  • 1
    @AlanMoore Actually, it'll _almost_ work for this use case. However, if `some text` starts the line, it will return the wrong result. – Zenexer Aug 12 '17 at 08:35
  • 2
    @Zenexer, that's what I meant. If the dot is after the lookahead instead of before, it works perfectly. – Alan Moore Aug 14 '17 at 18:09
  • Here is a [link](https://superuser.com/a/1334072/411849) that explains more. I do not understand why `?!` and not just `!`. – Timo May 07 '19 at 09:10
28

Updated with feedback from Alan Moore

In PCRE and similar variants, you can actually create a regex that matches any line not containing a value:

^(?:(?!Andrea).)*$

This is called a tempered greedy token. The downside is that it doesn't perform well, since the lookahead is evaluated again at every character position.
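As a rough illustration, the same pattern can be tried in Python, whose `re` module supports this syntax. The sample text is made up, and `re.MULTILINE` is my addition so that `^` and `$` apply to each line of a multi-line string.

```python
import re

# Tempered greedy token: no character of the line may begin "Andrea".
no_andrea = re.compile(r"^(?:(?!Andrea).)*$", re.MULTILINE)

text = "Robert was here\nAndrea was here\nso was Andrea\nnobody else"

# Each match is one whole line that does not contain "Andrea".
kept = [m.group(0) for m in no_andrea.finditer(text)]
print(kept)  # ['Robert was here', 'nobody else']
```

Note that an empty line would also match (as an empty string). If performance matters and you control the surrounding code, a plain substring test such as `"Andrea" not in line` is usually a cheaper way to get the same filter.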

Zenexer
  • 1
    This is the Tempered Greedy Token in long form. Just put the dot (or `[\s\S]`, which is only useful in JavaScript) after the second lookahead, and you don't need the first one: `^(?:(?!Andrea).)*$`. – Alan Moore Aug 09 '16 at 10:00
  • @AlanMoore Nice! I couldn't find any established pattern that worked like that, so I came up with my own. Rather than me taking your answer, you should provide that as your own. – Zenexer Aug 23 '16 at 06:11
  • That's okay, there are already plenty of good answers. And you deserve credit for inventing the idiom on your own. Cheers! – Alan Moore Aug 23 '16 at 13:57
  • Why do you suggest using `[\S\s]`? OP is talking about matching lines, not containing "Andrea" word. Not about checking if the whole string contains this word. Am I missing something? – x-yuri Jul 29 '17 at 06:37
  • @x-yuri I think you're right. I probably answered the question I had when I first visited this page, ignoring the discrepancy. My connection isn't good enough to update the answer right now, though (< 10 kbps) – Zenexer Jul 29 '17 at 07:04
  • Okay, undeleting this and making an attempt at cleaning this up. – Zenexer Aug 12 '17 at 08:30
11

What language are you using? The capabilities and syntax of the regex implementation matter for this.

You could use look-ahead. Using Python as an example:

import re

not_andrea = re.compile(r'(?!Andrea)\w{6}', re.IGNORECASE)

To break that down:

(?!Andrea) means 'match if the next 6 characters are not "Andrea"'; if so then

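\w means a "word character": a letter, a digit, or an underscore. For ASCII text this is equivalent to the class [a-zA-Z0-9_]; in Python 3, \w also matches non-ASCII letters and digits unless re.ASCII is set.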

\w{6} means exactly 6 word characters.

re.IGNORECASE means that you will exclude "Andrea", "andrea", "ANDREA" ...

Another way is to use your program logic - use all lines not matching Andrea and put them through a second regex to check for 6 characters. Or first check for at least 6 word characters, and then check that it does not match Andrea.
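The program-logic route could look something like the sketch below. The `wanted_lines` function, the variable names and the sample lines are all assumptions for illustration; adapt the two patterns to your real line format.

```python
import re

six_word_chars = re.compile(r"\w{6}")          # at least six word characters in a row
andrea = re.compile(r"Andrea", re.IGNORECASE)  # the value to exclude

def wanted_lines(lines):
    """Keep lines that have six word characters and do not mention Andrea."""
    return [
        line for line in lines
        if six_word_chars.search(line) and not andrea.search(line)
    ]

sample = ["name: Robert", "name: Andrea", "name: Bob", "id: 123456"]
print(wanted_lines(sample))  # ['name: Robert', 'id: 123456']
```

Keeping the two checks separate makes each regex simple, and the "not" lives in ordinary code rather than inside the pattern.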

Hamish Downer
7

Negative lookahead assertion

(?!Andrea)

This is not exactly an inverted match, but it's the best you can directly do with regex. Not all platforms support them though.

Vinko Vrsalovic
6

If you want to do this in RegexBuddy, there are two ways to get a list of all lines not matching a regex.

On the toolbar on the Test panel, set the test scope to "Line by line". When you do that, an item List All Lines without Matches will appear under the List All button on the same toolbar. (If you don't see the List All button, click the Match button in the main toolbar.)

On the GREP panel, you can turn on the "line-based" and the "invert results" checkboxes to get a list of non-matching lines in the files you're grepping through.

Jan Goyvaerts
5

(?! is useful in practice, although strictly speaking, lookahead is not part of regular expressions as defined mathematically.

You can write an inverted regular expression manually.

Here is a program to calculate the result automatically. Its result is machine generated, which is usually much more complex than hand writing one. But the result works.
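For the fixed-length case in the question, a hand-written inverse without any lookahead can be sketched in Python as follows. This is not the program mentioned above; `inverse_word_pattern` is a hypothetical helper, and it assumes the word consists only of ASCII letters.

```python
import re
import string

def inverse_word_pattern(word):
    """Build a lookahead-free pattern for any run of len(word) ASCII letters
    other than `word` itself (hypothetical helper, ASCII letters only)."""
    alternatives = []
    for i, ch in enumerate(word):
        # Any letter except the one `word` has at position i.
        others = "".join(c for c in string.ascii_letters if c != ch)
        rest = len(word) - i - 1
        # Same prefix as `word`, first difference at position i, then any letters.
        alternatives.append(f"{re.escape(word[:i])}[{others}][A-Za-z]{{{rest}}}")
    return "(?:" + "|".join(alternatives) + ")"

pattern = re.compile(inverse_word_pattern("Andrea"))
print(pattern.fullmatch("Andrea"))             # None
print(pattern.fullmatch("Andrew") is not None) # True
```

Each alternative covers the strings whose first difference from the word is at one particular position, which is why the result grows quickly compared with `(?!Andrea)[A-Za-z]{6}`.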

weakish
2

I just came up with this method which may be hardware intensive but it is working:

You can replace all characters which match the regex by an empty string.

This is a one-liner:

notMatched = re.sub(regex, "", string)

I used this because I was forced to use a very complex regex and couldn't figure out how to invert every part of it within a reasonable amount of time.

This will only return you the string result, not any match objects!
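A self-contained version of that one-liner might look like this. The `\d+` pattern and the input string are placeholders I made up to stand in for the complex regex; the `re.split` variation is my own addition, not part of the approach above.

```python
import re

regex = r"\d+"          # placeholder for the complex regex
string = "abc123def45"  # placeholder input

# Everything the regex did NOT match, joined into one string.
not_matched = re.sub(regex, "", string)
print(not_matched)  # abcdef

# Variation: keep the unmatched pieces separate instead of joined.
pieces = [p for p in re.split(regex, string) if p]
print(pieces)  # ['abc', 'def']
```

The `re.split` form only behaves like this when the regex has no capturing groups, since `re.split` also returns captured text.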

Matthias Herrmann
-4

In perl you can do

process($line) if ($line !~ /Andrea/);

phreakre