Scrapinghub, a web scraping development and services company, supplies cloud-based web crawling platforms.
Questions tagged [scrapinghub]
175 questions
1
vote
0 answers
Crawlera/Zyte proxy authentication using C# and Selenium
I've tried a number of ways of using Zyte (formerly Crawlera) proxies with Selenium.
They provide:
1. an API key (used as the username)
2. a proxy URL/port.
No password is needed.
What I have tried...
ChromeOptions options = new ChromeOptions();
var proxy =…
MattHodson
- 508
- 4
- 16
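For reference, Zyte's proxy endpoints typically take the API key as the proxy username with an empty password, so the credential can be embedded directly in the proxy URL. A minimal Python sketch of that convention (the host and port below are placeholders, not values from the question):

```python
def build_proxy_url(api_key, host, port):
    """Embed the API key as the proxy username; the password stays empty."""
    return f"http://{api_key}:@{host}:{port}"

# Placeholder endpoint, not confirmed for this account:
url = build_proxy_url("MY_API_KEY", "proxy.crawlera.com", 8010)
print(url)  # http://MY_API_KEY:@proxy.crawlera.com:8010
```

Note that Chrome usually ignores credentials embedded in `--proxy-server`, which is why tools such as selenium-wire are often used to inject the authentication instead.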
1
vote
1 answer
Not able to scrape image URLs using beautiful soup and python
I am using the code below to scrape the image URLs of the credit cards from the respective links in the explore_more_url variable.
from urllib.request import urlopen
from bs4 import BeautifulSoup
import json, requests, re
from selenium…
user15215612
1
vote
1 answer
How can I scrape an image using Beautiful Soup and Python?
I am trying to scrape the image link from the link below, but I am not able to.
Link: https://www.online.citibank.co.in/credit-card/rewards/citi-rewards-credit-card?eOfferCode=INCCCCTWAFCTRELM
I have used the code below:
x = '…
Ali Baba
- 65
- 8
1
vote
2 answers
Trying to scrape image URLs but not able to get them using Beautiful Soup and Python
I am scraping this link :…
Ali Baba
- 65
- 8
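A common reason `find_all('img')` comes back empty on pages like this is lazy loading: the markup carries a placeholder in `src` and the real URL in a `data-*` attribute, or the image is injected by JavaScript, which `urlopen` never executes (a rendered browser such as Selenium is then required). A hedged sketch of the attribute fallback (`data-src`/`data-original` are common conventions, not confirmed for this page):

```python
def image_url(attrs):
    """Prefer common lazy-load attributes over a placeholder src."""
    for key in ("data-src", "data-original", "src"):
        value = attrs.get(key)
        if value:
            return value
    return None

print(image_url({"src": "placeholder.gif", "data-src": "/img/citi-rewards.png"}))
# /img/citi-rewards.png
```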
1
vote
0 answers
Is it possible to use a monitor on a script if it fails?
I use scrapinghub to run my spiders. I have a FinishReasonMonitor that sends me a Slack message if a spider fails. Is it possible to apply this to a script? My spiders rarely fail, but my scripts occasionally do. In scrapinghub it shows script outcomes as…
weston6142
- 115
- 8
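Spidermon's FinishReasonMonitor hooks a spider's close reason, which a standalone script doesn't have; the usual equivalent is to wrap the script's entry point in a try/except that notifies on failure and re-raises, so the job outcome still shows as failed. A sketch with the notifier injected, so any Slack client or webhook call can be plugged in (the webhook call itself is left out):

```python
def run_with_monitor(job, notify):
    """Run job(); on any exception, notify and re-raise so the job still fails."""
    try:
        job()
    except Exception as exc:
        notify(f"script failed: {exc!r}")
        raise

alerts = []
try:
    run_with_monitor(lambda: 1 / 0, alerts.append)
except ZeroDivisionError:
    pass
print(alerts)  # ["script failed: ZeroDivisionError('division by zero')"]
```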
1
vote
0 answers
I am using Scrapy to scrape data from Yelp. I see no error, but no data is scraped from the start URLs in the spider
Code for items.py and the other files is given below, with the logs at the end. I am not getting any error, but according to the logs Scrapy has not scraped any pages.
```
import scrapy

class YelpItem(scrapy.Item):
    #…
```
sneha s
- 11
- 1
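When Scrapy reports no errors yet crawls zero pages, one frequent and silent cause is the offsite filter: start URLs whose domain isn't covered by `allowed_domains` are simply dropped. A stdlib check for that mismatch (an illustrative helper, not part of the question's code):

```python
from urllib.parse import urlparse

def offsite(start_urls, allowed_domains):
    """Return the start URLs Scrapy's offsite middleware would filter out."""
    def allowed(netloc):
        return any(netloc == d or netloc.endswith("." + d) for d in allowed_domains)
    return [u for u in start_urls if not allowed(urlparse(u).netloc)]

print(offsite(["https://www.yelp.com/biz/x"], ["yelp.com"]))  # []
print(offsite(["https://www.yelp.ca/biz/x"], ["yelp.com"]))   # ['https://www.yelp.ca/biz/x']
```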
1
vote
0 answers
How to use Crawlera proxies in Selenium
I have a Selenium project and want to use a Crawlera proxy in it. I already have a Crawlera API key.
headless_proxy = "127.0.0.1:3128"
proxy = Proxy({
'proxyType': ProxyType.MANUAL,
'httpProxy':…
pystack-piter
- 11
- 1
1
vote
0 answers
504 Timeout Exception when using scrapy-splash with crawlera
I tried scrapy-splash with http://www.google.com, followed all the prerequisite steps in the GitHub repo https://github.com/scrapy-plugins/scrapy-splash, and was able to render the Google page.
However, when I tried the same…
Shashikiran
- 67
- 5
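For context, the scrapy-splash README's settings look roughly like the fragment below. When Crawlera is layered on top, the proxy generally has to be applied inside Splash itself (e.g. via a Lua script using `splash:on_request`) rather than through Scrapy's proxy middleware; routing it the wrong way is a common source of 504s. The middleware order numbers follow the README; the Crawlera remark is an assumption about this particular setup.

```python
# settings.py fragment, per the scrapy-splash README
SPLASH_URL = "http://localhost:8050"

DOWNLOADER_MIDDLEWARES = {
    "scrapy_splash.SplashCookiesMiddleware": 723,
    "scrapy_splash.SplashMiddleware": 725,
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
}

SPIDER_MIDDLEWARES = {
    "scrapy_splash.SplashDeduplicateArgsMiddleware": 100,
}

DUPEFILTER_CLASS = "scrapy_splash.SplashAwareDupeFilter"
```

Raising the render timeout passed to Splash is another common mitigation when the target page is slower than Google.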
1
vote
1 answer
Scrapinghub Deploy Failed
I am trying to deploy a project to scrapinghub, and here is the error I am getting:
slackclient 1.3.2 has requirement websocket-client<0.55.0,>=0.35, but you have websocket-client 0.57.0.
Warning: Pip checks failed, please fix the conflicts.
WARNING:…
johncsmith427
- 43
- 6
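The error itself names the fix: slackclient 1.3.2 wants websocket-client<0.55.0,>=0.35, so the requirements file that shub deploys needs a compatible pin. A sketch of the two lines (version numbers copied from the error message):

```
slackclient==1.3.2
websocket-client>=0.35,<0.55.0
```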
1
vote
0 answers
scrapinghub requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://storage.scrapinghub.com
I am trying to run scrapy_price_monitor in a local environment, but when I run "scrapy crawl spidername", it returns "unauthorized" when trying to send the item to storage.scrapinghub.
I have already successfully run "shub login" (added my…
pedrovgp
- 545
- 5
- 20
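storage.scrapinghub.com authenticates with HTTP Basic auth, using the API key as the username and an empty password; a 401 when running locally usually means that key never reaches the request. A stdlib sketch of the header that should end up on the wire (the key is a placeholder):

```python
import base64

def auth_header(api_key):
    """HTTP Basic auth: API key as username, empty password."""
    token = base64.b64encode(f"{api_key}:".encode()).decode()
    return {"Authorization": f"Basic {token}"}

print(auth_header("MYKEY"))
```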
1
vote
0 answers
Scrapy: settings, multiple concurrent spiders, and middlewares
I'm used to running spiders one at a time, because we mostly work with scrapy crawl and on scrapinghub, but I know one can run multiple spiders concurrently, and I have seen that middlewares often take a spider parameter in their…
kenshin
- 197
- 10
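That spider argument exists precisely so one middleware instance can serve several spiders running in the same process, keying any per-spider state on the object it receives. A toy illustration with plain classes (not Scrapy's real ones):

```python
class CountingMiddleware:
    """Toy middleware: keeps per-spider state keyed by spider name."""
    def __init__(self):
        self.counts = {}

    def process_request(self, request, spider):
        self.counts[spider.name] = self.counts.get(spider.name, 0) + 1

class Spider:
    def __init__(self, name):
        self.name = name

mw = CountingMiddleware()
a, b = Spider("a"), Spider("b")
mw.process_request("req1", a)
mw.process_request("req2", a)
mw.process_request("req3", b)
print(mw.counts)  # {'a': 2, 'b': 1}
```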
1
vote
0 answers
Why is the Splash headless browser not able to fetch LinkedIn pages?
I have tried to get the page source of LinkedIn pages, but I am not able to fetch even one URL. I get the response "Failed loading page".
A few samples:
https://www.linkedin.com/company/amazon
https://www.linkedin.com/company/apple
Splash version:…
Mideen abdul gaffoor
- 71
- 6
1
vote
1 answer
How to scrape multiple websites with different data in URLs
I'm scraping some data from a webpage where the end of the URL has the ID of the product. It appears to rewrite the data at every single row, like it's not appending the data from the next line. I don't know exactly what's going on, if my first…
Ivan Barba
- 11
- 1
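Output being "rewritten at every single row" is the classic symptom of reopening the file in write mode inside the loop, which truncates it on each iteration; opening once before the loop (or opening in append mode) fixes it. A stdlib sketch of the corrected pattern, using an in-memory buffer for demonstration:

```python
import csv, io

rows = [{"id": 1, "price": 9.99}, {"id": 2, "price": 19.99}]

# Bug pattern: open(..., "w") inside the loop truncates the file each time,
# so only the last row survives. Fix: open once, then write every row.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["id", "price"])
writer.writeheader()
for row in rows:
    writer.writerow(row)
print(buf.getvalue().splitlines())  # ['id,price', '1,9.99', '2,19.99']
```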
1
vote
1 answer
mysql.connector.errors.InterfaceError: 2003: Can't connect to MySQL server on '127.0.0.1:3306' on Scrapinghub
I try to run my spider on scrapinghub, and when I run it I get an error:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
result = g.send(result)
File…
Biddaris
- 33
- 1
- 4
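On Scrapy Cloud, 127.0.0.1 is the job's own container, so a MySQL server running on the developer's machine is unreachable from there; the host has to be a publicly accessible address, typically supplied via settings or an environment variable. A small guard sketch (SHUB_JOBKEY is the variable Scrapy Cloud sets for running jobs; the rest is illustrative):

```python
import os

def db_host(default="127.0.0.1"):
    """Read DB_HOST from the environment; refuse loopback when on Scrapy Cloud."""
    host = os.environ.get("DB_HOST", default)
    on_cloud = "SHUB_JOBKEY" in os.environ
    if on_cloud and host in ("127.0.0.1", "localhost"):
        raise RuntimeError("loopback MySQL host is unreachable from Scrapy Cloud")
    return host

print(db_host())  # 127.0.0.1 when run locally without DB_HOST set
```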
1
vote
1 answer
"'str' object has no attribute 'get'" when using Google Cloud Storage with ScrapingHub
I'm trying to get Google Cloud Storage working with a Scrapy Cloud + Crawlera project so that I can save text files I'm trying to download. I'm encountering an error when I run my script that seems to have to do with my Google permissions not…
Nathan Wailes
- 6,053
- 5
- 35
- 68
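A "'str' object has no attribute 'get'" in this area often means a credentials file *path* was handed to an API that expects the parsed service-account info as a dict. A stdlib sketch of normalising the two cases (the helper name is illustrative; a StringIO stands in for the opened JSON file):

```python
import json, io

def load_service_account(source):
    """Accept either a parsed dict or a JSON file object; always return a dict."""
    if isinstance(source, dict):
        return source
    return json.load(source)

info = load_service_account(io.StringIO('{"type": "service_account", "project_id": "demo"}'))
print(info.get("project_id"))  # demo
```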