
I am currently working on scraping SimilarWeb data using Scrapy (Python) in a virtual environment.

I have this example site: "https://www.similarweb.com/fr/website/golfgenius.com", and I want to scrape the number of visitors.

Unfortunately, the divs and spans only have generic, shared class names, so there is no specific name that can be scraped easily.

This means an XPath query returns multiple matches every time unless it is made more specific.

So the span/text() I want is inside a span with one of these generic class names (see screenshot).

The yellow-highlighted expression is what I want. Here is my XPath expression so far: //span[@class='engagementInfo-value engagementInfo-value--large u-text-ellipsis']/span[@class='engagementInfo-valueNumber js-countValue']/text()

In the browser it highlights only one span, and it is the right one, so that part works. But if I try this expression inside scrapy shell, for example, it returns an empty list.

What am I doing wrong? Is this not the way to get the text inside a span?

Thank you if you want to help!

barny
Luc Semon
  • That data is probably updated using JavaScript after the page loads. That means the data is *not* contained in the page source itself, which is what you get using Python requests. You will either have to (a) use something like Selenium to script an actual browser, or (b) see if you can figure out the URLs from which the page fetches the actual data. – larsks Jan 15 '20 at 12:10
  • Try: from scrapy.utils.response import open_in_browser, and then open_in_browser(response). Often your crawler doesn't see what you see in a normal browser (hint: it could be seeing a captcha page instead). – Janib Soomro Jan 16 '20 at 09:57
  • FYI it's __scrape__ (and __scraping__, __scraped__, __scraper__) not scrap – barny May 05 '21 at 15:08
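
One way to test the diagnosis in the comments above is to check whether the value is present in the raw page source at all. The sketch below uses only the Python standard library and two hypothetical HTML snippets (the "1.2M" value and both markup strings are made up for illustration, and `SpanTextCollector` and `span_texts` are illustrative names, not Scrapy APIs). In a Scrapy session one would presumably pass `response.text` instead of the sample strings.

```python
from html.parser import HTMLParser


class SpanTextCollector(HTMLParser):
    """Collect the text of every <span> whose class list contains target_class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0     # > 0 while inside a matching span
        self.matches = []  # accumulated text, one entry per matching span

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth:
            self.depth += 1  # nested span inside a match
        elif self.target_class in classes:
            self.depth = 1
            self.matches.append("")

    def handle_endtag(self, tag):
        if tag == "span" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.matches[-1] += data


def span_texts(html, target_class):
    """Return the stripped text of each span carrying target_class."""
    parser = SpanTextCollector(target_class)
    parser.feed(html)
    parser.close()
    return [text.strip() for text in parser.matches]


# Hypothetical markup as a browser might show it after JavaScript has run.
rendered = (
    '<span class="engagementInfo-value engagementInfo-value--large u-text-ellipsis">'
    '<span class="engagementInfo-valueNumber js-countValue">1.2M</span></span>'
)
# Hypothetical raw source in which the value has not been filled in yet.
raw = (
    '<span class="engagementInfo-value engagementInfo-value--large u-text-ellipsis">'
    '<span class="engagementInfo-valueNumber js-countValue"></span></span>'
)

print(span_texts(rendered, "js-countValue"))  # ['1.2M']
print(span_texts(raw, "js-countValue"))       # ['']
```

If the raw source yields an empty string or no match at all, the XPath expression itself is probably fine and the number is being filled in later, or the crawler is receiving a different page than the browser.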

0 Answers