Questions tagged [scrapinghub]

Scrapinghub, a web scraping development and services company, supplies cloud-based web crawling platforms.

175 questions
1
vote
0 answers

Crawlera/Zyte proxy authentication using C# and Selenium

I've tried a number of ways of using Zyte (formerly Crawlera) proxies with Selenium. They provide 1) an API key (the username) and 2) a proxy URL/port. No password is needed. What I have tried... ChromeOptions options = new ChromeOptions(); var proxy =…
MattHodson
  • 508
  • 4
  • 16
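
Plain Chrome proxy flags accept only a host and port, so one way to sanity-check the Crawlera-style authentication scheme (API key as the proxy username, empty password) is outside the browser first. A minimal sketch in Python with requests, assuming a placeholder API key and the classic proxy.crawlera.com:8010 endpoint; substitute whatever host/port your Zyte account shows:

```python
import requests

# Placeholder values: substitute the API key and proxy endpoint
# from your own Zyte/Crawlera account.
API_KEY = "<your-api-key>"
PROXY = "proxy.crawlera.com:8010"

# Crawlera-style auth: the API key is the proxy username, password is empty.
proxies = {
    "http": f"http://{API_KEY}:@{PROXY}",
    "https": f"http://{API_KEY}:@{PROXY}",
}

# Crawlera re-signs HTTPS traffic, so for a quick smoke test either
# install its CA certificate or disable verification as done here.
response = requests.get("https://httpbin.org/ip", proxies=proxies, verify=False)
print(response.text)
```

If this succeeds, the same user:pass@host URL form can be handed to a credential-aware proxy layer (for example, selenium-wire on the Python side), since the browser flags themselves cannot carry credentials.
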
1
vote
1 answer

Not able to scrape image URLs using Beautiful Soup and Python

So basically I am using the below code to scrape the image URLs of the credit cards from the respective links in the explore_more_url variable. from urllib.request import urlopen from bs4 import BeautifulSoup import json, requests, re from selenium…
user15215612
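
For the static parts of such a page, a minimal Beautiful Soup sketch that collects every `<img>` src is usually enough; images injected by JavaScript only appear after the page is rendered through Selenium. The URL below is a placeholder for one of the links in explore_more_url:

```python
import requests
from bs4 import BeautifulSoup

# Placeholder URL; substitute a link from the explore_more_url variable.
url = "https://example.com/credit-cards"
soup = BeautifulSoup(requests.get(url).text, "html.parser")

# Collect the src of every <img> tag in the static HTML.
image_urls = [img["src"] for img in soup.find_all("img") if img.get("src")]
print(image_urls)
```
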
1
vote
1 answer

How can I scrape the image using Beautiful Soup and Python

I am trying to scrape the image link from the below link, but I am not able to. Link: https://www.online.citibank.co.in/credit-card/rewards/citi-rewards-credit-card?eOfferCode=INCCCCTWAFCTRELM I have used the below code x = '…
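
When the picture is not in an `<img>` tag at all, it often hides in an og:image meta tag or an inline background-image style. A hedged sketch that checks both, using the URL from the question:

```python
import re
import requests
from bs4 import BeautifulSoup

url = ("https://www.online.citibank.co.in/credit-card/rewards/"
       "citi-rewards-credit-card?eOfferCode=INCCCCTWAFCTRELM")
soup = BeautifulSoup(requests.get(url).text, "html.parser")

# Try the Open Graph image first, then fall back to inline
# background-image styles; JS-rendered images need Selenium instead.
og = soup.find("meta", property="og:image")
if og:
    print(og["content"])
for tag in soup.find_all(style=re.compile(r"background-image")):
    match = re.search(r"url\(['\"]?(.*?)['\"]?\)", tag["style"])
    if match:
        print(match.group(1))
```
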
1
vote
0 answers

Is it possible to use a monitor on a script if it fails?

I use Scrapinghub to run my spiders. I have a FinishReasonMonitor that sends me a Slack message if a spider fails. Is it possible to apply this to a script? My spiders rarely fail, but my scripts occasionally do. In Scrapinghub it shows script outcomes as…
weston6142
  • 115
  • 8
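
Spidermon monitors such as FinishReasonMonitor hook into the spider-close signal, which a plain script never emits. A common workaround is a small try/except wrapper that posts the traceback to a Slack incoming webhook; the webhook URL below is a placeholder:

```python
import sys
import traceback

import requests

# Placeholder webhook URL: create one under your Slack workspace's
# "Incoming Webhooks" app and paste it here.
SLACK_WEBHOOK = "https://hooks.slack.com/services/XXX/YYY/ZZZ"

def main():
    ...  # the actual script body goes here

if __name__ == "__main__":
    try:
        main()
    except Exception:
        # Report the failure to Slack, then exit non-zero so the
        # Scrapinghub job outcome also reflects the error.
        requests.post(SLACK_WEBHOOK, json={
            "text": f"Script failed:\n{traceback.format_exc()}"
        })
        sys.exit(1)
```
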
1
vote
0 answers

I am using Scrapy to scrape data from Yelp. I cannot see any error, but no data is scraped from the start URLs mentioned in the spider

Code for items.py and the other files is included below, along with the logs at the end. I am not getting any error, but according to the logs Scrapy has not scraped any pages. import scrapy class YelpItem(scrapy.Item): #…
sneha s
  • 11
  • 1
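
When the logs report zero pages crawled without any error, the usual suspects are a wrong allowed_domains (offsite requests are filtered silently at DEBUG level) or robots.txt denials. A minimal sketch of the shape such a spider should have; the CSS selector is a placeholder, and Yelp actively blocks scrapers, so requests may be refused regardless:

```python
import scrapy

class YelpSpider(scrapy.Spider):
    name = "yelp"
    # If allowed_domains is wrong, every request is filtered out silently;
    # check the logs for "Filtered offsite request" lines.
    allowed_domains = ["yelp.com"]
    start_urls = ["https://www.yelp.com/search?find_desc=coffee"]

    def parse(self, response):
        # Placeholder selector; adjust to the actual page markup.
        for name in response.css("a.business-name::text").getall():
            yield {"name": name}
```
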
1
vote
0 answers

How to use Crawlera proxies in Selenium

I have a Selenium project and want to use a Crawlera proxy with it. I already have a Crawlera API key. headless_proxy = "127.0.0.1:3128" proxy = Proxy({ 'proxyType': ProxyType.MANUAL, 'httpProxy':…
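
A common pattern, which the 127.0.0.1:3128 address in the question suggests, is to run Zyte's crawlera-headless-proxy locally so that it holds the API key and the browser only needs a plain, credential-free proxy address. A sketch under that assumption:

```python
from selenium import webdriver

# Assumes crawlera-headless-proxy is running locally on port 3128 and
# already holds the Crawlera API key, so Chrome needs no credentials.
headless_proxy = "127.0.0.1:3128"

options = webdriver.ChromeOptions()
options.add_argument(f"--proxy-server=http://{headless_proxy}")
options.add_argument("--ignore-certificate-errors")  # Crawlera re-signs TLS

driver = webdriver.Chrome(options=options)
driver.get("https://httpbin.org/ip")
print(driver.page_source)
driver.quit()
```
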
1
vote
0 answers

504 Timeout Exception when using scrapy-splash with crawlera

I tried scrapy-splash with http://www.google.com and followed all the prerequisite steps given in the following GitHub repo, https://github.com/scrapy-plugins/scrapy-splash, and I was able to render the Google page. However, when I tried the same…
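
A 504 from Splash usually means the render exceeded Splash's own timeout rather than anything on the Crawlera side. The standard scrapy-splash wiring from the plugin's README looks like this:

```python
# settings.py — the standard scrapy-splash configuration.
SPLASH_URL = "http://localhost:8050"
DOWNLOADER_MIDDLEWARES = {
    "scrapy_splash.SplashCookiesMiddleware": 723,
    "scrapy_splash.SplashMiddleware": 725,
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
}
SPIDER_MIDDLEWARES = {"scrapy_splash.SplashDeduplicateArgsMiddleware": 100}
DUPEFILTER_CLASS = "scrapy_splash.SplashAwareDupeFilter"
```

On the request side, SplashRequest(url, callback, args={"wait": 5, "timeout": 90}) gives slow pages more time than the default, provided Splash itself was started with --max-timeout 90 or higher; routing through Crawlera adds latency, so generous timeouts are a reasonable first fix.
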
1
vote
1 answer

Scrapinghub Deploy Failed

I am trying to deploy a project to Scrapinghub and here's the error I am getting: slackclient 1.3.2 has requirement websocket-client<0.55.0,>=0.35, but you have websocket-client 0.57.0. Warning: Pip checks failed, please fix the conflicts. WARNING:…
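
The deploy image installs the project's dependencies and then runs pip check, which fails on the conflicting pins. One fix, assuming the project's scrapinghub.yml points at a requirements.txt, is to pin websocket-client into the range slackclient 1.3.2 accepts:

```
# requirements.txt
slackclient==1.3.2
websocket-client>=0.35,<0.55.0
```
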
1
vote
0 answers

scrapinghub requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://storage.scrapinghub.com

I am trying to run scrapy_price_monitor in a local environment, but when I run "scrapy crawl spidername", it returns "unauthorized" when trying to send the item to storage.scrapinghub. I have already successfully run "shub login" (added my…
pedrovgp
  • 545
  • 5
  • 20
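
Inside Scrapy Cloud the job key and API key are injected into the environment, but a local run has neither, so writes to storage.scrapinghub.com come back 401; shub login only configures the shub CLI, not the running spider. A sketch of supplying the key explicitly with the python-scrapinghub client (the project id is a placeholder); the same key can also be exported as the SH_APIKEY environment variable:

```python
from scrapinghub import ScrapinghubClient

# Placeholder API key and project id; both are shown in the Scrapinghub UI.
client = ScrapinghubClient("<your-api-key>")
project = client.get_project(12345)
print(project.jobs.count())  # fails with 401 if the key is wrong or missing
```
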
1
vote
0 answers

Scrapy: settings, multiple concurrent spiders, and middlewares

I'm used to running spiders one at a time, because we mostly work with scrapy crawl and on Scrapinghub, but I know that one can run multiple spiders concurrently, and I have seen that middlewares often have a spider parameter in their…
kenshin
  • 197
  • 10
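
Each call to CrawlerProcess.crawl() builds its own Crawler, so every concurrently running spider gets its own middleware instances and its own merged settings (including custom_settings); the spider parameter in middleware hooks then identifies which crawl a given request or response belongs to. A minimal sketch, assuming two spiders registered in the project:

```python
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

# Must run inside the Scrapy project so the spider names resolve.
process = CrawlerProcess(get_project_settings())
process.crawl("spider_one")  # placeholder spider names
process.crawl("spider_two")
process.start()  # blocks until both crawls finish
```
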
1
vote
0 answers

Why the Splash headless browser cannot fetch LinkedIn pages

I have tried to get the page source of LinkedIn pages, but I am not able to fetch even one URL. I get a response like "Failed loading page". A few samples: https://www.linkedin.com/company/amazon https://www.linkedin.com/company/apple Splash version:…
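
LinkedIn aggressively blocks anonymous headless browsers, so "Failed loading page" is often the block itself rather than a Splash misconfiguration. One way to confirm is to call Splash's render.html endpoint directly with a desktop User-Agent; a sketch assuming Splash listens on localhost:8050, with the caveat that LinkedIn may refuse the request regardless:

```python
import requests

# Splash's render.html endpoint accepts custom headers when the
# request is sent as a JSON POST body.
resp = requests.post("http://localhost:8050/render.html", json={
    "url": "https://www.linkedin.com/company/amazon",
    "wait": 5,
    # A desktop User-Agent; LinkedIn may still refuse anonymous bots.
    "headers": {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"},
})
print(resp.status_code, resp.text[:200])
```
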
1
vote
1 answer

How to scrape multiple websites with different data in URLs

I'm scraping some data from a webpage whose URL ends with the ID of the product. It appears to rewrite the data at every single row, as if it's not appending the data from the next line. I don't know exactly what's going on, if my first…
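
That symptom, every row overwriting the last, usually means the output file is reopened in "w" mode inside the per-product loop, truncating it each time. A sketch of opening the file once before iterating over the product ids; scrape_product here is a hypothetical stand-in for the real request/parse step:

```python
import csv

def scrape_product(pid):
    """Hypothetical stand-in for the real request/parse step."""
    return {"id": pid, "price": "0.00"}

product_ids = ["101", "102", "103"]  # placeholder ids

# Open the file once, outside the loop: reopening it with mode "w"
# inside the loop truncates it and rewrites the data on every row.
with open("products.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["id", "price"])
    writer.writeheader()
    for pid in product_ids:
        writer.writerow(scrape_product(pid))
```
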
1
vote
1 answer

mysql.connector.errors.InterfaceError: 2003: Can't connect to MySQL server on '127.0.0.1:3306' on Scrapinghub

I am trying to run my spider on Scrapinghub, and running it produces an error. Traceback (most recent call last): File "/usr/local/lib/python3.6/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks result = g.send(result) File…
Biddaris
  • 33
  • 1
  • 4
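
Scrapy Cloud jobs run in isolated containers with no local MySQL server, so 127.0.0.1:3306 points at the job's own container and the connection is refused. The database has to be reachable over the network; a sketch with a placeholder public hostname:

```python
import mysql.connector

# On Scrapy Cloud, 127.0.0.1 is the job's own container, which runs no
# MySQL server; point at a host the container can actually reach.
conn = mysql.connector.connect(
    host="db.example.com",  # placeholder public hostname
    port=3306,
    user="scrapy",
    password="<password>",
    database="items",
)
conn.close()
```
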
1
vote
1 answer

"'str' object has no attribute 'get'" when using Google Cloud Storage with ScrapingHub

I'm trying to get Google Cloud Storage working with a Scrapy Cloud + Crawlera project so that I can save the text files I'm trying to download. I'm encountering an error when I run my script that seems to be related to my Google permissions not…
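
That particular error frequently means a raw JSON string was handed to code expecting a parsed mapping, for example a service-account key passed through a Scrapy setting or environment variable without json.loads. A hedged sketch of building GCS credentials from a parsed key file (the file name is a placeholder):

```python
import json

from google.oauth2 import service_account

# If the service-account key arrives as a string (from an env var or a
# Scrapy setting), parse it first: passing the raw string where a dict
# is expected is a typical cause of "'str' object has no attribute 'get'".
with open("service-account.json") as f:  # placeholder key file
    info = json.load(f)

credentials = service_account.Credentials.from_service_account_info(info)
```
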