
I have a dataframe with log error messages. The column we need looks something like this:

message
 
"System error foo"  
"System error foo2"    
"System error foo"    
"System error foo"   
"System error foo3"

I need to count all the error messages, regardless of what kind of error they are.

Usually, if I knew a specific message, I'd filter a dataframe like this:

df2 = df[df['message'] == 'System error foo3']

But how can I do this for all the messages that contain "System error" followed by anything else? I tried an asterisk as a wildcard, but of course that didn't work. Is there a native wildcard operator in Python or pandas, or do I need to use a regex?

catLuck

1 Answer


You can use str.contains, which keeps every row whose string contains the given text:

>>> import pandas as pd
>>> df = pd.DataFrame(data=["System Error foo 1","System Error bar 2","System Error foo3","Error bar"],columns=["messages"])
>>> df
             messages
0  System Error foo 1
1  System Error bar 2
2   System Error foo3
3           Error bar
>>> df[df['messages'].str.contains('System Error')]
             messages
0  System Error foo 1
1  System Error bar 2
2   System Error foo3
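
Since the question asks for a count rather than the filtered rows, here is a minimal sketch of how the boolean mask could be summed directly. The sample data is hypothetical (modelled on the question, with one lowercase variant and one missing value added to show the options), and the variable names are illustrative only. Note that str.contains treats its pattern as a regular expression by default, so regex=False is passed where a plain substring match is intended.

```python
import pandas as pd

# Hypothetical sample data modelled on the question; the lowercase variant
# and the None (a missing log line) are added here purely for illustration.
df = pd.DataFrame(
    {"message": ["System error foo", "System error foo2",
                 "system error foo3", "Error bar", None]}
)

# Plain substring match. regex=False treats the pattern as literal text,
# and na=False makes missing values count as non-matches.
contains_mask = df["message"].str.contains("System error", regex=False, na=False)
n_contains = int(contains_mask.sum())  # True counts as 1 when summed

# Prefix-only match, for messages that must *start* with the text.
starts_mask = df["message"].str.startswith("System error", na=False)
n_starts = int(starts_mask.sum())

# Case-insensitive regex anchored at the start of the string.
regex_mask = df["message"].str.contains(r"^system error", case=False, regex=True, na=False)
n_regex = int(regex_mask.sum())

print(n_contains, n_starts, n_regex)
```

If the matching rows are needed as well as the count, the same mask can be reused, e.g. df[contains_mask].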


Subbu VidyaSekar
Vaebhav