
The HTML of the website looks like this:

<div class="breed-image">
    <img src = "link to image">
</div>

When I do this:

soup = BeautifulSoup(response.text, 'lxml')
for link in soup.find_all(class_='breed-image'):
    print(link)

All it does is print out:

<div class="breed-image">
</div>

I have also tried `print(link.text)`.

All that does is print out:

None

Any kind of help is appreciated, thanks!

Mark W

2 Answers


A couple of options:

>>> soup.img['src']
'link to image'
>>> for link in soup.find_all('img'):
...     print(link['src'])
...
link to image
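
Both options rely on the `img` tag being present in the HTML that was actually downloaded. The following is only a minimal sketch of that dependency, not part of the original answer: `RENDERED_HTML` is the snippet from the question, `RAW_HTML` is an assumed stand-in for the empty `div` the question's `print(link)` showed, and `image_sources` is a hypothetical helper name. It uses the built-in `html.parser` backend rather than `lxml` so that no extra parser is needed.

```python
from bs4 import BeautifulSoup

# HTML as shown in the question (what the browser displays).
RENDERED_HTML = '<div class="breed-image"><img src="link to image"></div>'
# Assumption: what a plain HTTP download contains, based on the empty div
# that print(link) produced in the question.
RAW_HTML = '<div class="breed-image"></div>'


def image_sources(html):
    """Return the src of every <img> inside a .breed-image element."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        img["src"]
        for div in soup.find_all(class_="breed-image")
        for img in div.find_all("img", src=True)  # only tags that have a src
    ]


print(image_sources(RENDERED_HTML))  # ['link to image']
print(image_sources(RAW_HTML))       # []

# With no <img> in the document, soup.img is None, so soup.img['src']
# would raise TypeError: 'NoneType' object is not subscriptable.
print(BeautifulSoup(RAW_HTML, "html.parser").img)  # None
```

Collecting results into a list avoids subscripting `None`: an empty list simply means no image was found in the downloaded markup.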
Jonathon McMurray
  • For the first option it gives me the error `TypeError: 'NoneType' object is not subscriptable`, and the second one does not print anything. – Mark W Jan 15 '18 at 22:18
  • @Jonathon McMurray here is the link if you want to see all of the HTML: https://dog.ceo/dog-api/breeds-image-random.php – Mark W Jan 15 '18 at 22:19
  • It looks like this page has no `img` in the HTML; it gets added by some embedded JavaScript. So if you're downloading this page, e.g. with the `requests` module, the image will not be added because the JS isn't executed. This question may help with that: https://stackoverflow.com/questions/8049520/web-scraping-javascript-page-with-python – Jonathon McMurray Jan 15 '18 at 22:28
  • @MarkW yes, that's AFTER the JavaScript has executed. The JS inserts the `img` tag; without executing the JS, there is no `img` tag. You can see this if you use 'View Source' in your browser instead of 'Inspect'. – Jonathon McMurray Jan 15 '18 at 22:36

Looks like you might be better off hitting the API that this page calls to get its image:

In [13]: r = requests.get('https://dog.ceo/api/breeds/image/random')

In [14]: r.json()
Out[14]:
{'message': 'https://dog.ceo/api/img/terrier-dandie/n02096437_1790.jpg',
 'status': 'success'}
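
As a script rather than an interactive session, the same call might look roughly like the sketch below. The function names `extract_image_url` and `fetch_random_dog_image` are hypothetical, and the response shape (`message` and `status` keys) is assumed from the output shown above rather than from any API documentation.

```python
def extract_image_url(payload):
    """Return the image URL from a decoded API response, or None on failure."""
    # Assumption: a successful response carries status == 'success' and the
    # image URL under 'message', as in the output shown above.
    if payload.get("status") != "success":
        return None
    return payload.get("message")


def fetch_random_dog_image(url="https://dog.ceo/api/breeds/image/random", timeout=10):
    """Request the endpoint and return the image URL it reports."""
    # Third-party dependency; imported here so the helper above stays usable
    # without it.
    import requests

    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raise on HTTP 4xx/5xx instead of parsing an error page
    return extract_image_url(response.json())


# Usage (performs a real HTTP request):
#     print(fetch_random_dog_image())
```

Keeping the JSON handling separate from the network call makes it possible to check the parsing against a saved response without hitting the API.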
Nathan Vērzemnieks