I have a regular expression to fetch some links from an HTML document:
((http://)(|up)(\.example\.com))*(/uploads/pp2p|/sites/default/files/[-_a-zA-Z0-9%/]+)\.(jpg|jpeg|gif|png)
What I intend is this: if the http:// part exists, match it; if not, don't. The same goes for the up part and for the example.com part: match each one if it exists, and skip it if it doesn't. The same applies to /uploads/pp2p and to the other path, /sites/default/files/: if it exists, match it; if not, don't. Finally, match the link only if it ends in one of the listed image formats (jpg, jpeg, gif, png). I expect to get a list of links like:
links = ['http://up.example.com/uploads/pp2p/www.jpg', '/sites/default/files/.png', 'http://example.com/uploads/zzz.jpg']
The list would continue with other combinations of these parts. Instead, I am getting each result as a tuple, like this:
[('', '', '', '', '/sites/default/files/favicon', 'png'), ('', '', '', '', '/sites/default/files/logo_2', 'png')]
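A minimal reproduction, assuming the pattern is used with Python's re.findall() (the tuple output suggests this, but the calling code isn't shown) and a hypothetical HTML snippet. When a pattern contains more than one capturing group, re.findall() returns one tuple per match, with one string per group, and groups that did not take part in the match come back as ''. The pattern above has six capturing groups, which is why each tuple has six elements.

```python
import re

# Pattern copied from the question; it contains six capturing groups.
PATTERN = re.compile(
    r"((http://)(|up)(\.example\.com))*"
    r"(/uploads/pp2p|/sites/default/files/[-_a-zA-Z0-9%/]+)"
    r"\.(jpg|jpeg|gif|png)"
)

# Hypothetical HTML snippet, for illustration only.
html = (
    '<link rel="icon" href="/sites/default/files/favicon.png">'
    '<img src="/sites/default/files/logo_2.png">'
)

print(PATTERN.groups)  # number of capturing groups: 6
result = PATTERN.findall(html)
print(result)  # one 6-tuple per match; unmatched groups are ''
```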
I don't want tuples; I want each match represented as a whole, with only the complete link in each list element. How can I avoid getting tuples as the result of the regex match?
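A minimal sketch of two common ways to get whole matches, again assuming Python's re module and a hypothetical HTML snippet (the third link is made up to exercise the http://up.example.com branch). Either make every group non-capturing with (?:...), so re.findall() has no groups to report and returns the full matched text, or keep the original pattern and take group(0), the entire match, from each result of re.finditer().

```python
import re

# Hypothetical sample HTML, for illustration only.
html = (
    '<link rel="icon" href="/sites/default/files/favicon.png">'
    '<img src="/sites/default/files/logo_2.png">'
    '<img src="http://up.example.com/uploads/pp2p.jpg">'
)

# Option 1: the question's pattern with every group made non-capturing (?:...).
# With no capturing groups, re.findall() returns a list of full-match strings.
NON_CAPTURING = re.compile(
    r"(?:(?:http://)(?:|up)(?:\.example\.com))*"
    r"(?:/uploads/pp2p|/sites/default/files/[-_a-zA-Z0-9%/]+)"
    r"\.(?:jpg|jpeg|gif|png)"
)
links = NON_CAPTURING.findall(html)

# Option 2: keep the original capturing pattern, iterate with re.finditer()
# and take group(0), which is always the whole matched text.
ORIGINAL = re.compile(
    r"((http://)(|up)(\.example\.com))*"
    r"(/uploads/pp2p|/sites/default/files/[-_a-zA-Z0-9%/]+)"
    r"\.(jpg|jpeg|gif|png)"
)
links_from_finditer = [m.group(0) for m in ORIGINAL.finditer(html)]

print(links)
print(links_from_finditer)
```

Both approaches produce the same list of complete links. Since only the grouping changes, links that the original pattern doesn't accept, such as http://example.com/uploads/zzz.jpg from the expected list, are still not matched.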