How to check a column for valid values in csv using regex?

Question

I have a CSV file with the columns [Name, Email, Address, Credit Card]. I want to apply a regex to each column and check whether its values are valid. For example, every value in the Email column should be an email address.

for i in df['Email']:
    
    lst = re.findall('\S+@\S+', i)   
    if lst!=None:
        count=count+1 
        
        print("Match Numer : ",count,"Match Found :   ",lst)
    else:
        print(i," is not a valid email")

The output for a valid email like 'xyz@gmail.com' should look like this: Match Number: 100['xyz@gmail.com']. The problem is that the output for an invalid email like 'notvalidemail' is: Match Number: 101[]. The else branch is never executed! Can someone please help me with this?

Try `if lst:` instead of `if lst != None:` – Paul Bombarde Jun 26 '20 at 15:12
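
The `else` branch never runs because `re.findall` always returns a list, and an empty list is still not `None`. Below is a minimal sketch of the corrected loop. The list `emails` and the helper `check_emails` are hypothetical stand-ins for `df['Email']` and the original loop body:

```python
import re

# Hypothetical sample values standing in for df['Email']
emails = ["xyz@gmail.com", "notvalidemail"]

def check_emails(values, pattern=r"\S+@\S+"):
    """Split values into (valid, invalid) using the question's pattern."""
    valid, invalid = [], []
    for value in values:
        # re.findall returns [] when nothing matches, and [] is falsy
        if re.findall(pattern, value):
            valid.append(value)
        else:
            invalid.append(value)
    return valid, invalid

valid, invalid = check_emails(emails)
for count, email in enumerate(valid, start=1):
    print("Match Number:", count, "Match Found:", email)
for email in invalid:
    print(email, "is not a valid email")
```

Testing the truthiness of the list (rather than comparing it with `None`) is what lets the invalid values reach the `else` branch.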

score 0 · Answer 1 · answered Jun 26 '20 at 15:59

I wouldn't use a for loop for this; pandas already has vectorized string methods for regex matching. Using the same regular expression, you can do something like this:

valid_emails = df.loc[df.Email.str.match(r'\S+@\S+'), 'Email']
print(valid_emails)

You can also select the invalid emails by negating the output of the match method:

not_valid_emails = df.loc[~(df.Email.str.match(r'\S+@\S+')), 'Email']
print(not_valid_emails)

Or quickly count the number of valid and invalid emails:

df.Email.str.match(r'\S+@\S+').value_counts()

Or, if you prefer proportions rather than counts:

df.Email.str.match(r'\S+@\S+').value_counts(normalize=True)
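
The snippets above can be combined into one runnable sketch. The DataFrame here is made-up sample data, and the `na=False` argument is an addition not in the snippets above: it treats missing values as non-matches so that the mask stays purely boolean when the column contains empty cells.

```python
import pandas as pd

# Hypothetical data; the None row shows how a missing value is handled
df = pd.DataFrame({"Email": ["xyz@gmail.com", "notvalidemail", None, "a@b.org"]})

# na=False makes missing values count as non-matches
is_valid = df["Email"].str.match(r"\S+@\S+", na=False)

valid_emails = df.loc[is_valid, "Email"]
not_valid_emails = df.loc[~is_valid, "Email"]

print(valid_emails.tolist())
print(len(not_valid_emails), "invalid or missing values")
print(is_valid.value_counts())
```

Storing the mask once in `is_valid` avoids running the regex again for each of the selections and counts.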

jcaliz


Just note that `.str.match('\S+@\S+')` will find a match in the string `@@some text here`, which does not look like an email. – Wiktor Stribiżew Jun 26 '20 at 16:05
You are correct; the answer is meant to address the pandas side of the problem, not which regular expression to use. – jcaliz Jun 27 '20 at 03:24
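
To illustrate the point raised in these comments, here is a hedged sketch of a stricter check using only the standard library. The pattern `EMAIL_RE` and the helper `looks_like_email` are assumptions made for illustration, not a complete email validator; the key difference is that `fullmatch` requires the entire string to match rather than just its beginning.

```python
import re

# Illustrative pattern (an assumption, not a full email validator):
# non-space, non-@ characters, a single @, then a domain containing a dot
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")

def looks_like_email(value):
    # fullmatch requires the whole string to match, not just a prefix
    return EMAIL_RE.fullmatch(value) is not None

for candidate in ["xyz@gmail.com", "@@some text here", "notvalidemail"]:
    print(candidate, looks_like_email(candidate))
```

Recent pandas versions also provide `Series.str.fullmatch`, which applies the same whole-string rule to a column.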
