Beautiful Soup Returning Unwanted Characters

Question

I'm using Beautiful Soup to scrape pages, trying to get the heights of certain athletes:

req = requests.get(url)
soup = BeautifulSoup(req.text, "html.parser")
height = soup.find_all("strong")
height = height[2].contents
print height

Unfortunately, this is what gets returned:

[u'6\'0"']

I've also tried:

height = str(height[2].contents)

and

height = unicode(height[2].contents)

but I still get [u'6\'0"'] as a result.

How can I just have 6'0" returned without the extra characters? Thanks for your help!

score 0 · Accepted Answer · answered Jun 11 '16 at 02:40

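Those aren't "extra characters". .contents returns a list of the tag's children, and the element you chose has only one child, so you're getting a list containing one element. Python prints a list as Python-like source (brackets, the u prefix, quotes, and a backslash-escaped apostrophe), so you can see what it is and what's in it. That is also why str() and unicode() change nothing: they convert the whole list, not the string inside it.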

Perhaps you want .string?
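
To make the difference concrete, here is a minimal sketch. The HTML snippet is a made-up stand-in (the real page markup isn't shown in the question), and the variable names are just for illustration:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the scraped page; the real markup is not shown in the question.
html = "<p><strong>Name</strong> <strong>Team</strong> <strong>6'0\"</strong></p>"
soup = BeautifulSoup(html, "html.parser")
tag = soup.find_all("strong")[2]

children = tag.contents   # a list holding the tag's single text child
height = tag.string       # the text child itself, not wrapped in a list

print(children)           # list repr: brackets, quotes, escaped apostrophe
print(height)             # 6'0"
```

Indexing the list (tag.contents[0]) would give the same string; .string is just the shortcut for a tag with exactly one child.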

Eevee

You're a genius. Thank you very much! :) – CGul Jun 11 '16 at 02:50
@CGul could you mark as the accepted answer, so the question doesn't look unanswered forever? :) – Eevee Jun 11 '16 at 04:54

score 0 · Answer 2 · answered Jun 11 '16 at 10:46

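If you just want the third strong tag, you don't need to find all of them. You can use the CSS selector nth-of-type with select_one, and once you have the element you just need to call .text. Note that nth-of-type counts among siblings under the same parent, so this matches find_all("strong")[2] only when the strong tags share a parent: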

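import requests
from bs4 import BeautifulSoup

req = requests.get(url)  # url is the page being scraped, as in the question
soup = BeautifulSoup(req.content, "html.parser")
# select_one returns the first element matching the CSS selector
height = soup.select_one("strong:nth-of-type(3)").text

print(height)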

You should also pass req.content instead of req.text, so that Beautiful Soup works out the encoding from the raw bytes rather than relying on the encoding requests guessed.
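
One caveat with the selector approach, shown as a small sketch on made-up markup (the question doesn't show the real page, so the HTML here is purely hypothetical): nth-of-type counts within each parent, while find_all counts across the whole document, so the two can pick different tags.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the strong tags are split across two parents.
html = (
    "<div><strong>a</strong><strong>b</strong></div>"
    "<div><strong>c</strong><strong>d</strong><strong>e</strong></div>"
)
soup = BeautifulSoup(html, "html.parser")

# Third strong in document order.
by_index = soup.find_all("strong")[2].text
# First strong that is the third strong within its own parent.
by_selector = soup.select_one("strong:nth-of-type(3)").text

print(by_index)     # c
print(by_selector)  # e
```

If all the strong tags on the real page sit under the same parent, both approaches return the same element.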
