import pyperclip
import re

# Read the text currently on the clipboard
searchItems = pyperclip.paste()

# Every parenthesized group below becomes one element of each findall tuple
phoneRegex = re.compile(r"(\(\d\d\d\))?\s?(\d\d\d)(-?)(\d\d\d\d)")
print(phoneRegex.findall(searchItems))
emailRegex = re.compile(r"((\w|\.|-)+)(@)(\w+)(\.)(com|org|gov|net)")
print(emailRegex.findall(searchItems))

This is what is copied to my clipboard:

Christen E. Alvarez     (972) 786-4438  christenalvarez1@gmail.com
    
Laura Anderson      (940)665-0605   womens-health@sbcglobal.net
    
Alana Andrews       (512) 891-0420  cocosalonaustin@yahoo.com
    
Cynthia Lou Andrews     (572)343-7546   cindy@beautifulsolutionstexas.com

When I run the script, the output I get is:

[('(972)', '786', '-', '4438'), ('(940)', '665', '-', '0605'), ('(512)', '891', '-', '0420'), ('(572)', '343', '-', '7546')]
[('christenalvarez1', '1', '@', 'gmail', '.', 'com'), ('womens-health', 'h', '@', 'sbcglobal', '.', 'net'), ('cocosalonaustin', 'n', '@', 'yahoo', '.', 'com'), ('cindy', 'y', '@', 'beautifulsolutionstexas', '.', 'com')]

My question is: in the emailRegex findall list, why is the second element of each tuple the last character of the email username?

Also, how do I keep the hyphen from appearing as the third element of each tuple in the phoneRegex findall list?

Chris Charley
Shyam Vyas
  • Just FYI: you need to learn *non-capturing* groups. Instead of `(\w|\.|-)+`, you need a character class, `[\w.-]+`, or a non-capturing group, `(?:\w|\.|-)+` – Wiktor Stribiżew Oct 05 '20 at 21:59
  • Basically, each element of a tuple is a capture group. If you don't want the captured output in the list, remove the parentheses. Remove the parentheses around `-?` in the phone regex and around `\w|\.|-` in the email regex – noah Oct 05 '20 at 22:00
  • @noah, your comment on the phone regex worked, but when I did what you said for the email regex, it failed and only output the first letter of the email username, the dot, and then the service provider, another dot, and the com/org/net – Shyam Vyas Oct 05 '20 at 22:07
  • Sorry, I was looking too quickly. As Wiktor said, use either `[\w.-]+` or `(?:\w|\.|-)+` – noah Oct 05 '20 at 22:10
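
The suggestions in the comments above can be sketched as follows. This is a minimal illustration, not a definitive fix: the `sample` string stands in for the clipboard contents (two lines from the question), and the names `original_email`, `email_regex`, `email_noncapturing`, and `phone_regex` are chosen here for the example. It shows that a repeated capturing group such as `(\w|\.|-)+` reports only its last repetition, which is why the last character of the username appears, and that a character class or a non-capturing group avoids the extra tuple element.

```python
import re

# Stand-in for pyperclip.paste(): two lines taken from the question.
sample = (
    "Christen E. Alvarez     (972) 786-4438  christenalvarez1@gmail.com\n"
    "Laura Anderson      (940)665-0605   womens-health@sbcglobal.net\n"
)

# Original pattern: (\w|\.|-)+ is a capturing group that repeats, so it
# keeps only the text of its final repetition (one character).
original_email = re.compile(r"((\w|\.|-)+)(@)(\w+)(\.)(com|org|gov|net)")
original_result = original_email.findall(sample)

# Option 1: a character class needs no inner group at all.
email_regex = re.compile(r"([\w.-]+)(@)(\w+)(\.)(com|org|gov|net)")
emails = email_regex.findall(sample)

# Option 2: keep the alternation but make the inner group non-capturing.
email_noncapturing = re.compile(r"((?:\w|\.|-)+)(@)(\w+)(\.)(com|org|gov|net)")
emails_alt = email_noncapturing.findall(sample)

# Phone pattern with the hyphen left outside any capturing group.
phone_regex = re.compile(r"(\(\d\d\d\))?\s?(\d\d\d)-?(\d\d\d\d)")
phones = phone_regex.findall(sample)

print(original_result)
print(emails)
print(phones)
```

Both email variants return the same five-element tuples; the character class is usually the shorter choice when every alternative is a single character.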
