Regex to search for an image tag and get its src

Question

Assume I am having a html string containing the following code snippet.

... <img class="employee thumb" src="http://localhost/services/employee1.jpg" /> ...

I want to check whether this tag is present and, if so, get the `src` URL. The prefix `<img class="employee thumb"` can be used to uniquely identify the tag.

How can I do this in Python?

Why use regular expressions when [excellent HTML parsers](http://www.crummy.com/software/BeautifulSoup/bs4/doc/) are available? `soup = BeautifulSoup(yourpage)`, then `image = soup.select('img.employee.thumb')`. — Martijn Pieters, Mar 28 '14 at 10:34
Maybe using a regexp to parse HTML is not the best approach. This answer talks about that: http://stackoverflow.com/a/1732454/661140 — Roberto, Mar 28 '14 at 10:37
Thanks for the info. I am getting the HTML using `page = urllib2.urlopen(url)` and `yourpage = page.read()`. After that I couldn't parse the HTML as you mentioned. Any thoughts? — Yasitha, Mar 28 '14 at 12:04
Although [you could do so](http://stackoverflow.com/a/4234491/471272), I do not recommend that route. — tchrist, Jun 06 '14 at 22:43

Tanveer Alam · Answer 1 · 2014-03-28T13:09:09.310

Using a regular expression:

>>> import re
>>> html = '<img class="employee thumb" src="http://localhost/services/employee1.jpg" />'
>>> # Check that the identifying tag prefix is present before extracting src
>>> if re.search('<img class="employee thumb"', html):
...     print(re.findall('src="(.*?)"', html, re.DOTALL))
... 
['http://localhost/services/employee1.jpg']
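
Note that `re.findall('src="(.*?)"', ...)` returns every `src` in the string, not only the one belonging to the matched tag. A minimal sketch of one way to tie the capture to the identifying prefix, assuming Python 3 and assuming `class` appears before `src` exactly as in the question; the sample `page` string and its `logo.png` URL are made up for illustration:

```python
import re

# Hypothetical sample page; the tag of interest sits among other markup.
page = (
    '<html><body>'
    '<img class="logo" src="http://localhost/services/logo.png" />'
    '<img class="employee thumb" src="http://localhost/services/employee1.jpg" />'
    '</body></html>'
)

# Anchor on the identifying prefix, then capture src within the same tag.
# [^>]*? stops the match from running past the end of that tag.
pattern = re.compile(r'<img class="employee thumb"[^>]*?\bsrc="([^"]*)"')

match = pattern.search(page)
src = match.group(1) if match else None  # None means the tag is absent
print(src)
```

This still breaks if the attributes are reordered or quoted differently, which is the usual argument for a parser.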

Using lxml:

>>> from lxml import etree
>>> root = etree.fromstring("""
... <html>
...     <img class="employee thumb" src="http://localhost/services/employee1.jpg" />
... </html>
... """)
>>> # Select the src attribute by name instead of relying on attribute order
>>> print(root.xpath("//img[@class='employee thumb']/@src")[0])
http://localhost/services/employee1.jpg
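
If installing a third-party parser is not an option, roughly the same lookup can be sketched with the standard library's `html.parser`. This is a swapped-in technique, not the lxml or BeautifulSoup approach discussed here; it assumes Python 3, and the class name `EmployeeThumbFinder` and the sample `page` are hypothetical:

```python
from html.parser import HTMLParser


class EmployeeThumbFinder(HTMLParser):
    """Collect src values of <img> tags whose class is exactly 'employee thumb'."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag being opened.
        attributes = dict(attrs)
        if tag == "img" and attributes.get("class") == "employee thumb":
            src = attributes.get("src")
            if src is not None:
                self.sources.append(src)


# Hypothetical page; in practice this would be the string read from the URL.
page = """
<html><body>
  <p>Staff</p>
  <img class="employee thumb" src="http://localhost/services/employee1.jpg" />
</body></html>
"""

finder = EmployeeThumbFinder()
finder.feed(page)
print(finder.sources)
```

An empty `finder.sources` means the tag was not found, so the same pass answers both "is it there?" and "what is its src?".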

edited Mar 28 '14 at 13:09

answered Mar 28 '14 at 11:46

Tanveer Alam

The `lxml` version isn't much use; it doesn't actually search for the `img` tag in a larger document. – Martijn Pieters Mar 28 '14 at 11:49
No, you still only test if `root` is the image tag. The OP has a larger chunk of HTML, not just one containing the `<img>` tag. – Martijn Pieters Mar 28 '14 at 11:58
I think the input is in string format, as mentioned in the question. So I think the only concern is getting `src` if the class attribute is 'employee thumb'. – Tanveer Alam Mar 28 '14 at 12:02
No, the OP's first sentence is *Assume I am having a html string **containing the following code snippet***, emphasis mine. Note the `...` ellipsis in the HTML sample as well. – Martijn Pieters Mar 28 '14 at 12:05
Yes, that is what I am saying; it is specifically mentioned in the question itself. Assume I am having a html string containing the following code snippet. ... . – Tanveer Alam Mar 28 '14 at 12:08
The `<img>` tag is not the root of my HTML string. It is just a part of it, as mentioned. – Yasitha Mar 28 '14 at 12:26
@Yasitha, I have edited it; now `<img>` is not the root. – Tanveer Alam Mar 28 '14 at 13:12
