I'm looking for a way to remove a row from a pandas DataFrame if it contains both of two strings. I can do it when the row contains one string, but I haven't been able to get it working for both. Below is the code I used to remove records based on one string; I'd like to change it to check for a second keyword as well.

Code:

Vikings_dataframe_cleaned2 = Vikings_dataframe_cleaned[Vikings_dataframe_cleaned.TweetText.str.contains("RT") == False]

Strings to be excluded: I want to check the text to make sure that it doesn't contain both @teddyb_h2o and @casekeenum7.

Example:

['@teddyb_h2o test test','@casekeenum7 and @teddyb_h2o are test','@casekeenum7 is the best right now']

The code should then produce a dataframe that looks like this:

['@teddyb_h2o test test','@casekeenum7 is the best right now']
J. McCraiton

1 Answer

Sample df

import pandas as pd
df = pd.DataFrame({'col': ['@teddyb_h2o test test','@casekeenum7 and @teddyb_h2o are test','@casekeenum7 is the best right now','test test']})

    col
0   @teddyb_h2o test test
1   @casekeenum7 and @teddyb_h2o are test
2   @casekeenum7 is the best right now
3   test test

Solution:

df[~(df.col.str.contains('@teddyb_h2o') & df.col.str.contains('@casekeenum7'))]

    col
0   @teddyb_h2o test test
2   @casekeenum7 is the best right now
3   test test
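
If more keywords need to be checked later, the same mask can be built in a loop. This is only a sketch: the `keywords` list, the `has_all` name and the extra `None` row are my own additions for illustration, not part of the question. Passing `regex=False` treats each keyword as a literal substring, and `na=False` counts missing text as "does not contain".

```python
import pandas as pd

df = pd.DataFrame({'col': ['@teddyb_h2o test test',
                           '@casekeenum7 and @teddyb_h2o are test',
                           '@casekeenum7 is the best right now',
                           'test test',
                           None]})  # None row added here only to show na=False

keywords = ['@teddyb_h2o', '@casekeenum7']  # hypothetical list name

# Start from an all-True mask and AND in one literal-substring check per keyword.
has_all = pd.Series(True, index=df.index)
for kw in keywords:
    has_all &= df['col'].str.contains(kw, regex=False, na=False)

# Keep every row that does NOT contain all of the keywords.
result = df[~has_all]
```

Only row 1 contains both handles, so it is the only row dropped; the row with missing text is kept.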

@Wen's suggestion, which is more elegant:

df[~df['col'].str.contains(r'^(?=.*@teddyb_h2o)(?=.*@casekeenum7)')]
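
The lookahead pattern can also be assembled from a list, which may be handy if the keywords contain regex metacharacters. This is a rough sketch with assumptions of mine: the `keywords` and `pattern` names are hypothetical, `re.escape` is added so each keyword is matched literally, and `re.DOTALL` is added because `.` does not match a newline by default, so a tweet with the two handles on different lines could otherwise slip through.

```python
import re
import pandas as pd

df = pd.DataFrame({'col': ['@teddyb_h2o test test',
                           '@casekeenum7 and @teddyb_h2o are test',
                           '@casekeenum7 is the best right now',
                           'test test']})

keywords = ['@teddyb_h2o', '@casekeenum7']  # hypothetical list name

# One lookahead per keyword, all anchored at the start of the string,
# so the pattern matches only when every keyword occurs somewhere.
pattern = '^' + ''.join(f'(?=.*{re.escape(kw)})' for kw in keywords)

# DOTALL lets '.*' run across line breaks; na=False treats missing text as no match.
result = df[~df['col'].str.contains(pattern, flags=re.DOTALL, na=False)]
```

On the sample df this keeps rows 0, 2 and 3, the same as the two-mask version.
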
Vaishali