
I have a sample file that contains some email addresses along with some invalid entries, as shown below:

byha@gmail.com
sdf@yahoo.net
df-sdf@yahoo.com
sfew23@my-work.net
dfdw@schoo.edu
sdfs
Dfddfd@dsd
Dfddfd@dsld..com
ddfds
@sdfd.com

And I have the following piece of Python code:

import re

pattern = re.compile(r"[a-zA-Z-.0-9]+@[a-zA-Z-]+\.(com|edu|gov|net)")
with open("emaillist", "r") as fp:
    for line in fp:
        emails = re.findall(pattern, line)
        for email in emails:
            print(email)

Why does it produce the output below?

com
net
com
net
edu

The search pattern seems to be correct. What is the mistake here?

Sam
  • Use a non capturing group, `[a-zA-Z-.0-9]+@[a-zA-Z-]+\.(?:com|edu|gov|net)`. – Paolo Aug 27 '18 at 19:50
  • I changed it like this: `pattern = re.compile(r"[a-zA-Z-.0-9]+@[a-zA-Z-]+\.(?:com|edu|gov|net)")` and it worked. But what is the non-capturing group `?:` really doing? Can you please explain how it works? I was looking at other articles on non-capturing groups, but I am still confused about how it works. – Sam Aug 28 '18 at 09:48
  • Check the documentation for `re.findall` [here](https://docs.python.org/2/library/re.html). It states: *If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group.* That is why the function returns only the `com`/`net`/`edu` part: your pattern contains one capturing group, so `findall` returns what that group matched instead of the whole match. If you use a non-capturing group instead, the function returns the whole match. A non-capturing group has similar syntax to a capturing group, but its contents are not "stored" by the regex engine. Let me know if that helps. – Paolo Aug 28 '18 at 09:55
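
To make the difference described in the comments concrete, here is a minimal sketch that runs both patterns over the sample data from the question. The list `lines` stands in for the `emaillist` file, and the helper `find_all` and the variable names are made up for this illustration. It also shows one possible alternative, `finditer` with `group(0)`, which returns the whole match even when the group is capturing.

```python
import re

# Sample lines copied from the question; this list stands in for the file.
lines = [
    "byha@gmail.com",
    "sdf@yahoo.net",
    "df-sdf@yahoo.com",
    "sfew23@my-work.net",
    "dfdw@schoo.edu",
    "sdfs",
    "Dfddfd@dsd",
    "Dfddfd@dsld..com",
    "ddfds",
    "@sdfd.com",
]

# Original pattern: (com|edu|gov|net) is a capturing group.
capturing = re.compile(r"[a-zA-Z-.0-9]+@[a-zA-Z-]+\.(com|edu|gov|net)")
# Same pattern with (?:...) so the group only groups, without capturing.
non_capturing = re.compile(r"[a-zA-Z-.0-9]+@[a-zA-Z-]+\.(?:com|edu|gov|net)")


def find_all(pattern, lines):
    """Hypothetical helper: collect findall() results across all lines."""
    results = []
    for line in lines:
        results.extend(pattern.findall(line))
    return results


# With one capturing group, findall returns only what the group matched.
groups_only = find_all(capturing, lines)
# With no capturing groups, findall returns the whole match.
whole_matches = find_all(non_capturing, lines)
# Alternative: keep the capturing group and ask each match for group(0).
via_finditer = [m.group(0) for line in lines for m in capturing.finditer(line)]

print(groups_only)
print(whole_matches)
print(via_finditer)
```

The first list reproduces the output from the question, while the other two contain the five full addresses.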
