Questions tagged [crawlera]

25 questions
3
votes
1 answer

Why is scrapy with crawlera running so slow?

I am using scrapy 1.7.3 with crawlera (C100 plan from scrapinghub) and python 3.6. When running the spider with crawlera enabled I get about 20 - 40 items per minute. Without crawlera I get 750 - 1000 (but I get banned quickly of course). Have I…
Wramana
  • 151
  • 2
  • 15
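Crawlera throughput is usually limited by the plan's concurrency cap and by Scrapy's own throttling; a commonly suggested remedy is to disable Scrapy's delays and raise concurrency. A settings sketch (these are standard Scrapy setting names; the values are illustrative and should be tuned to the plan's limit):

```python
# settings.py sketch: let Crawlera manage pacing instead of Scrapy.
# Values are illustrative; tune CONCURRENT_REQUESTS to your plan's cap.
CONCURRENT_REQUESTS = 32            # raise toward the plan's concurrency limit
CONCURRENT_REQUESTS_PER_DOMAIN = 32
AUTOTHROTTLE_ENABLED = False        # AutoThrottle fights Crawlera's own pacing
DOWNLOAD_DELAY = 0                  # no extra per-request delay
DOWNLOAD_TIMEOUT = 600              # proxied responses can be slow; allow time
```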
3
votes
0 answers

Crawlera proxies not working with python selenium and Chrome Web driver

I have just bought a Crawlera plan for proxies, but following their documentation with Polipo is not working for me, and according to the Polipo site it is already outdated. How can I use Crawlera proxies with Selenium and the Chrome web driver? Here…

1
vote
1 answer

Scrapy crawlera authentication issue

I've been trying to use scrapy-crawlera as a proxy for scraping some data with scrapy. I've added these rows in settings.py: DOWNLOADER_MIDDLEWARES = { 'scrapy_crawlera.CrawleraMiddleware': 610, } CRAWLERA_ENABLED = True CRAWLERA_APIKEY =…
memeister
  • 33
  • 4
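For reference, a minimal scrapy-crawlera configuration looks like the fragment below (a sketch: the middleware path and setting names follow the scrapy-crawlera README, and the API key is a placeholder):

```python
# settings.py sketch for scrapy-crawlera (placeholder API key).
DOWNLOADER_MIDDLEWARES = {
    'scrapy_crawlera.CrawleraMiddleware': 610,
}
CRAWLERA_ENABLED = True
CRAWLERA_APIKEY = '<YOUR_API_KEY>'  # key only; Crawlera uses it as the username
```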
1
vote
0 answers

Crawlera & Puppeteer - Problem with Authentication in HTTPS

In the basic example of Crawlera & Puppeteer, proxy authorization is done this way: await page.setExtraHTTPHeaders({ 'Proxy-Authorization': 'Basic ' + Buffer.from(':').toString('base64'), }); but it gives an error:…
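Crawlera's proxy authentication is HTTP Basic with the API key as the username and an empty password. The header value from the snippet above can be built like this (a sketch; the key is a placeholder):

```python
import base64

API_KEY = "YOUR_API_KEY"  # placeholder, not a real key

# The Basic-auth credential is "<apikey>:" (empty password), base64-encoded.
token = base64.b64encode(f"{API_KEY}:".encode("ascii")).decode("ascii")
proxy_authorization = "Basic " + token
```

Note that for HTTPS targets, headless browsers generally do not forward a per-page Proxy-Authorization header to the proxy; Puppeteer's `page.authenticate()` is the commonly suggested alternative for supplying proxy credentials.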
1
vote
0 answers

How to use crawlera proxies in selenium

I have a Selenium project and want to use a Crawlera proxy with it. I already have a Crawlera API key. headless_proxy = "127.0.0.1:3128" proxy = Proxy({ 'proxyType': ProxyType.MANUAL, 'httpProxy':…
1
vote
2 answers

Downloading Images from list of URLs (Scrapy sends 2 requests per url)

So I ran a crawler last week and produced a CSV file that lists all the image URLs I need for my project. After reading the CSV to a python list, I was unsure how to use Scrapy to simply download them through a pipeline. I've tried many things and…
Chris4542
  • 35
  • 6
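One common pattern for this (a sketch; the field and setting names follow Scrapy's standard ImagesPipeline conventions) is to yield items whose `image_urls` field carries the URLs read from the CSV and let the built-in pipeline issue one download per image:

```python
# settings.py fragment: enable the built-in images pipeline.
ITEM_PIPELINES = {
    'scrapy.pipelines.images.ImagesPipeline': 1,
}
IMAGES_STORE = 'downloaded_images'  # local directory for the saved files

# In the spider, each item just wraps the URLs from the CSV:
def make_items(urls):
    """Wrap a list of image URLs into ImagesPipeline-compatible items."""
    return [{'image_urls': [u]} for u in urls]
```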
1
vote
0 answers

504 Timeout Exception when using scrapy-splash with crawlera

I tried scrapy-splash with http://www.google.com and followed all the prerequisite steps given in the following GitHub repo https://github.com/scrapy-plugins/scrapy-splash, and I was able to render the Google page. However, when I tried the same…
1
vote
1 answer

Scraping HTTPS pages using Scrapy and Crawlera

I would like to know if it is possible to crawl https pages using scrapy + crawlera. So far I was using Python requests with the following settings: proxy_host = 'proxy.crawlera.com' proxy_port = '8010' proxy_auth = 'MY_KEY' proxies = { "https":…
Bociek
  • 915
  • 7
  • 19
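HTTPS does work through Crawlera with plain `requests`; the same HTTP proxy endpoint handles both schemes (HTTPS via CONNECT). A sketch of the proxies mapping, with the API key as a placeholder:

```python
# Sketch: Crawlera proxy settings for the `requests` library (placeholder key).
proxy_host = 'proxy.crawlera.com'
proxy_port = '8010'
proxy_auth = '<API_KEY>:'           # API key as username, empty password

proxies = {
    'http':  f'http://{proxy_auth}@{proxy_host}:{proxy_port}/',
    'https': f'http://{proxy_auth}@{proxy_host}:{proxy_port}/',
}
# Usage (not executed here):
#   requests.get('https://example.com', proxies=proxies)
# For HTTPS, Crawlera's docs describe installing their CA certificate so that
# certificate verification can stay enabled.
```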
1
vote
1 answer

Is it possible to set different settings for different request in the same Scrapy spider?

I want to use Crawlera only for some requests in a Scrapy spider. So I want to set CRAWLERA_ENABLED differently for different requests. Is it possible?
Aminah Nuraini
  • 13,849
  • 6
  • 73
  • 92
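scrapy-crawlera supports opting individual requests out of the proxy via the `dont_proxy` request meta key (assuming a middleware version that implements it), so `CRAWLERA_ENABLED` can stay on globally. A minimal sketch of building that meta:

```python
# Sketch: per-request opt-out via request.meta. No Scrapy import needed here;
# the real CrawleraMiddleware checks this meta key in process_request.
def make_meta(use_crawlera: bool) -> dict:
    """Build request meta that tells CrawleraMiddleware to skip this request."""
    return {} if use_crawlera else {'dont_proxy': True}
```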
1
vote
2 answers

Connection was refused by other side: 111: Connection refused. when using Scrapy Crawlera on a Linux server

Scrapy Crawlera was working just fine on my Windows machine, but it gets error 111 when I run it on my Linux server. Why is that? When I use curl, I get this error: curl: (7) Failed connect to proxy.crawlera.com:8010; Connection refused
Aminah Nuraini
  • 13,849
  • 6
  • 73
  • 92
0
votes
0 answers

Scrapy Cloud skipping through loop

This spider is supposed to loop through https://lihkg.com/thread/`2169007 - i*10`/page/1. But for some reason it skips pages in the loop. I looked through the items scraped in Scrapy Cloud; the items with the following URLs were scraped: ... Item 10:…
shingseto
  • 1
  • 1
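For reference, the URL sequence described in the question can be generated deterministically; when items go missing, per-request failures (bans, timeouts) or the duplicate filter are more likely culprits than the loop itself. A sketch:

```python
# Sketch: generate the thread URLs described in the question.
BASE_ID = 2169007

def thread_urls(count):
    """Return lihkg thread URLs, stepping the thread id down by 10 each time."""
    return [f"https://lihkg.com/thread/{BASE_ID - i * 10}/page/1"
            for i in range(count)]
```

Yielding each Request with `dont_filter=True` rules out Scrapy's duplicate filter silently dropping any of them.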
0
votes
1 answer

Set country while scraping Amazon

I'm scraping prices from Amazon. Everything works fine except I'm facing a location issue: apparently some products are not available outside of the US, so when my program runs it fails to fetch prices. I'm using Crawlera for a US IP proxy, but it still…
Shubham Devgan
  • 518
  • 4
  • 16
0
votes
1 answer

Get HTTPS response from Scrapy shell

I have a spider that is getting cookies from a site in the first few steps. I would like to get the cookies, start the scrape, and if the HTTP status of the current request == 302, I want to loop back to the cookies part to refresh them. How can I…
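One way to see the 302 instead of letting Scrapy auto-follow it is to put `handle_httpstatus_list` in the request meta; the callback can then branch back to the cookie step. A minimal sketch of the branching logic (the helper name is illustrative):

```python
# Sketch: decide whether to loop back to the cookie-refresh step.
REFRESH_STATUSES = {302}

def needs_cookie_refresh(status: int) -> bool:
    """True when the response status means our session cookies expired."""
    return status in REFRESH_STATUSES

# In the spider, requests would carry meta={'handle_httpstatus_list': [302]}
# so the 302 reaches the callback instead of being auto-followed.
```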
0
votes
2 answers

How to resolve 502 response code in Scrapy request?

I created a spider that scrapes data from Yelp by using Scrapy. All requests go through the Crawlera proxy. The spider gets the URL to scrape, sends a request, and scrapes the data. This worked fine up until the other day, when I started getting 502…
amabeat_30
  • 71
  • 6
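One common mitigation for intermittent 502s is to ask Scrapy's RetryMiddleware to retry them (these are standard Scrapy retry settings; the values are illustrative):

```python
# settings.py fragment: retry 502 responses (standard Scrapy retry settings).
RETRY_ENABLED = True
RETRY_TIMES = 5                     # retry each failed request up to 5 times
RETRY_HTTP_CODES = [500, 502, 503, 504, 408]
```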
0
votes
1 answer

Use a specific Scrapy downloader middleware per request

I use Crawlera as an IP-rotating service to crawl a specific website that bans my IP quickly, but I have this problem with only one website out of a dozen. Since it is possible to register multiple middlewares for a Scrapy project, I wanted to…
Max atton
  • 91
  • 7
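Downloader middlewares register globally, but a middleware can simply no-op unless the request asks for it via a meta flag. A minimal sketch (the class name and flag are illustrative, not a Scrapy API; `'proxy'` is Scrapy's standard meta key):

```python
# Sketch: a downloader middleware that only proxies flagged requests.
class OptInProxyMiddleware:
    PROXY = 'http://proxy.crawlera.com:8010'

    def process_request(self, request, spider=None):
        if request.meta.get('use_rotating_proxy'):
            request.meta['proxy'] = self.PROXY
        return None  # let Scrapy continue processing the request
```

The spider then sets `meta={'use_rotating_proxy': True}` only on requests to the site that bans quickly.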