
In pandas I can search and replace all fields that contain the word fish, for example, using df.replace(r'.*fish.*', 'foo', regex = True).

But how do I search and replace all fields that don't contain the word fish?

That is, in my example, replace every field that doesn't contain the word fish with 'foo'.

For example, say the dataframe is

applefish pear
water     afishfarm

I would like this to be transformed to

applefish foo
foo       afishfarm 
eleanora
  • What does your dataframe look like and what is your expected output? – cs95 Aug 31 '17 at 17:45
  • Duplicate of [Regular expression to match a line that doesn't contain a word](https://stackoverflow.com/questions/406230/regular-expression-to-match-a-line-that-doesnt-contain-a-word) – Wiktor Stribiżew Apr 15 '20 at 17:45

2 Answers


You can use a negative lookahead assertion, (?!...). The pattern ^(?!.*fish).*$ first asserts that the string doesn't contain the word fish, then matches everything up to the end of the string, and that match is replaced with foo:

  • ^ denotes the beginning of the string; combined with (?!.*fish), it asserts at the beginning of the string that nothing matching .*fish follows, i.e. that fish occurs nowhere in the string;
  • If the assertion succeeds, .*$ matches everything up to the end of the string and the match is replaced with foo. If the assertion fails, the pattern doesn't match and the value is left unchanged;

so:

df.replace(r'^(?!.*fish).*$', 'foo', regex=True)
#           0           1
#0  applefish         foo
#1        foo   afishfarm
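
To see why the ^ anchor matters, here is a minimal sketch using Python's re module directly, outside pandas (as far as I know, df.replace with regex=True applies the pattern to each string cell in much the same way). The variable names are only for illustration.

```python
import re

anchored = r'^(?!.*fish).*$'   # the pattern from the answer
unanchored = r'(?!.*fish).*'   # same idea, but without ^ and $

cells = ['applefish', 'pear', 'water', 'afishfarm']

# With ^, the lookahead is only ever tested at the start of the string,
# so a cell containing 'fish' anywhere can never match.
anchored_result = [re.sub(anchored, 'foo', c) for c in cells]
print(anchored_result)  # ['applefish', 'foo', 'foo', 'afishfarm']

# Without ^, the engine retries at later positions. Once it is past the
# 'f' of 'fish', no 'fish' lies ahead, so the assertion succeeds and the
# rest of the string is replaced.
unanchored_result = re.sub(unanchored, 'foo', 'afishfarm')
print(unanchored_result != 'afishfarm')  # True
```

The exact text produced by the unanchored pattern depends on how the regex engine handles empty matches, so the sketch only checks that the cell was altered.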

If the string can contain multiple words:

df
#                0          1
#0  applefish pear       pear
#1           water  afishfarm

You can replace ^ with a word boundary \b and the final .* with \w+, so that each match is a single word:

df.replace(r'\b(?!.*fish)\w+', 'foo', regex=True)
#               0           1
#0  applefish foo         foo
#1            foo   afishfarm
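
One caveat: the lookahead (?!.*fish) still scans the rest of the string, so a word that comes before a fish word is left alone. A possible variant, which is my assumption and not part of the original answer, restricts the lookahead to the current word with \w*. The sketch below compares the two using plain re.sub; the same pattern string could presumably be passed to df.replace(..., regex=True).

```python
import re

original = r'\b(?!.*fish)\w+'    # lookahead scans to the end of the string
per_word = r'\b(?!\w*fish)\w+'   # lookahead scans only the current word

# Fish word last: 'pear' has 'fish' somewhere after it.
text = 'pear applefish'

original_result = re.sub(original, 'foo', text)
per_word_result = re.sub(per_word, 'foo', text)

print(original_result)  # 'pear applefish' ('pear' is not replaced)
print(per_word_result)  # 'foo applefish'

# Fish word first: both patterns agree with the output shown above.
print(re.sub(original, 'foo', 'applefish pear'))  # 'applefish foo'
print(re.sub(per_word, 'foo', 'applefish pear'))  # 'applefish foo'
```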
Psidom
  • Thank you. Can you explain the `^` and `$` please. I assume they mark the start and end of the field but why are they needed? Also, why do we need the brackets? The close bracket is also before the final `.*` which is surprising. – eleanora Aug 31 '17 at 17:53
  • Sure. Updating with some explanations. – Psidom Aug 31 '17 at 17:55
  • Also, why don't you need the `r'` ? – eleanora Aug 31 '17 at 17:58
  • Wow, I thought it wasn't doable but you sure showed otherwise. Nice. – cs95 Aug 31 '17 at 18:02
  • The `$` is not necessary, but you do need `^` to assert at the beginning of the string; otherwise the tail of `afishfarm` gets replaced, because the cursor moves to the middle of the string, where the assertion succeeds. The parentheses are part of the regex lookaround syntax. – Psidom Aug 31 '17 at 18:03
  • However this assumes each word is in a separate column. I was not working under this assumption. – cs95 Aug 31 '17 at 18:04
  • @cᴏʟᴅsᴘᴇᴇᴅ Thanks for the comment. Updated with an option to handle multiple-word strings. – Psidom Aug 31 '17 at 18:15
  • I un-marked this as a duplicate because the inline `.*text` approach is faster: it doesn't enter an assertion at every character, as `(?:(?!text).)*` does. Benchmark (Options: < m >, completed iterations: 100 / 100 ( x 1000 ), matches found per iteration: 2): Regex1 `^(?!.*fish).*$` elapsed time 0.43 s (426.74 ms); Regex2 `^(?:(?!fish).)*$` elapsed time 0.53 s (525.90 ms). –  Aug 31 '17 at 19:22
  • @sln Thanks for checking out. – Psidom Aug 31 '17 at 19:28

You can use apply with str.contains:

df.apply(lambda x: x.replace(x[~x.str.contains('fish')], 'foo'))

You get

    0           1
0   applefish   foo
1   foo         afishfarm

Note: I wouldn't even recommend this as Psidom's solution is way more efficient.
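
A related sketch, not from the answer above: build a boolean mask with str.contains and let DataFrame.where fill in the rest, so no Series is passed to replace. It assumes every column holds strings; the names has_fish and result are only for illustration.

```python
import pandas as pd

df = pd.DataFrame([['applefish', 'pear'], ['water', 'afishfarm']])

# Boolean frame: True where the cell contains the substring 'fish'.
has_fish = df.apply(lambda col: col.str.contains('fish'))

# Keep cells where has_fish is True; put 'foo' everywhere else.
result = df.where(has_fish, 'foo')
print(result)
```

Missing values would need extra handling here, since str.contains does not return a plain True/False for them by default.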

Vaishali