Questions tagged [crawlera]

25 questions
3
votes
1 answer

Why is scrapy with crawlera running so slow?

I am using scrapy 1.7.3 with crawlera (C100 plan from scrapinghub) and python 3.6. When running the spider with crawlera enabled I get about 20 - 40 items per minute. Without crawlera I get 750 - 1000 (but I get banned quickly of course). Have I…
Wramana
  • 151
  • 2
  • 15
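Crawlera throughput is usually limited by the plan's concurrency cap and by Scrapy's own throttling; a commonly suggested remedy is to disable Scrapy's delays and raise concurrency. A settings sketch (these are standard Scrapy setting names; the values are illustrative and should be tuned to the plan's limit):

```python
# settings.py sketch: let Crawlera manage pacing instead of Scrapy.
# Values are illustrative; tune CONCURRENT_REQUESTS to your plan's cap.
CONCURRENT_REQUESTS = 32            # raise toward the plan's concurrency limit
CONCURRENT_REQUESTS_PER_DOMAIN = 32
AUTOTHROTTLE_ENABLED = False        # AutoThrottle fights Crawlera's own pacing
DOWNLOAD_DELAY = 0                  # no extra per-request delay
DOWNLOAD_TIMEOUT = 600              # proxied responses can be slow; allow time
```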
3
votes
0 answers

Crawlera proxies not working with python selenium and Chrome Web driver

I have just bought a Crawlera plan for proxies, but following their documentation with Polipo is not working for me, and according to the Polipo site it is already outdated. How can I use Crawlera proxies with Selenium and the Chrome web driver? Here…

1
vote
1 answer

Scrapy crawlera authentication issue

I've been trying to use scrapy-crawlera as a proxy for scraping some data with scrapy. I've added these rows in settings.py: DOWNLOADER_MIDDLEWARES = { 'scrapy_crawlera.CrawleraMiddleware': 610, } CRAWLERA_ENABLED = True CRAWLERA_APIKEY =…
memeister
  • 33
  • 4
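For reference, a minimal scrapy-crawlera configuration looks like the fragment below (a sketch: the middleware path and setting names follow the scrapy-crawlera README, and the API key is a placeholder):

```python
# settings.py sketch for scrapy-crawlera (placeholder API key).
DOWNLOADER_MIDDLEWARES = {
    'scrapy_crawlera.CrawleraMiddleware': 610,
}
CRAWLERA_ENABLED = True
CRAWLERA_APIKEY = '<YOUR_API_KEY>'  # key only; Crawlera uses it as the username
```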
1
vote
0 answers

Crawlera & Puppeteer - Problem with Authentication in HTTPS

In the basic example of Crawlera & Puppeteer, proxy authorization is done this way: await page.setExtraHTTPHeaders({ 'Proxy-Authorization': 'Basic ' + Buffer.from(':').toString('base64'), }); but it gives an error:…
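Crawlera's proxy authentication is HTTP Basic with the API key as the username and an empty password. The header value from the snippet above can be built like this (a sketch; the key is a placeholder):

```python
import base64

API_KEY = "YOUR_API_KEY"  # placeholder, not a real key

# The Basic-auth credential is "<apikey>:" (empty password), base64-encoded.
token = base64.b64encode(f"{API_KEY}:".encode("ascii")).decode("ascii")
proxy_authorization = "Basic " + token
```

Note that for HTTPS targets, headless browsers generally do not forward a per-page Proxy-Authorization header to the proxy; Puppeteer's `page.authenticate()` is the commonly suggested alternative for supplying proxy credentials.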
1
vote
0 answers

How to use crawlera proxies in selenium

I have a Selenium project and want to use a Crawlera proxy with it. I already have a Crawlera API key. headless_proxy = "127.0.0.1:3128" proxy = Proxy({ 'proxyType': ProxyType.MANUAL, 'httpProxy':…
1
vote
2 answers

Downloading Images from list of URLs (Scrapy sends 2 requests per url)

So I ran a crawler last week and produced a CSV file that lists all the image URLs I need for my project. After reading the CSV to a python list, I was unsure how to use Scrapy to simply download them through a pipeline. I've tried many things and…
Chris4542
  • 35
  • 6
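One common pattern for this (a sketch; the field and setting names follow Scrapy's standard ImagesPipeline conventions) is to yield items whose `image_urls` field carries the URLs read from the CSV and let the built-in pipeline issue one download per image:

```python
# settings.py fragment: enable the built-in images pipeline.
ITEM_PIPELINES = {
    'scrapy.pipelines.images.ImagesPipeline': 1,
}
IMAGES_STORE = 'downloaded_images'  # local directory for the saved files

# In the spider, each item just wraps the URLs from the CSV:
def make_items(urls):
    """Wrap a list of image URLs into ImagesPipeline-compatible items."""
    return [{'image_urls': [u]} for u in urls]
```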
1
vote
0 answers

504 Timeout Exception when using scrapy-splash with crawlera

I tried scrapy-splash with http://www.google.com and followed all the prerequisite steps given in the following GitHub repo https://github.com/scrapy-plugins/scrapy-splash, and I was able to render the Google page. However, when I tried the same…
1
vote
1 answer

Scraping HTTPS pages using Scrapy and Crawlera

I would like to know if it is possible to crawl https pages using scrapy + crawlera. So far I was using Python requests with the following settings: proxy_host = 'proxy.crawlera.com' proxy_port = '8010' proxy_auth = 'MY_KEY' proxies = { "https":…
Bociek
  • 915
  • 7
  • 19
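HTTPS does work through Crawlera with plain `requests`; the same HTTP proxy endpoint handles both schemes (HTTPS via CONNECT). A sketch of the proxies mapping, with the API key as a placeholder:

```python
# Sketch: Crawlera proxy settings for the `requests` library (placeholder key).
proxy_host = 'proxy.crawlera.com'
proxy_port = '8010'
proxy_auth = '<API_KEY>:'           # API key as username, empty password

proxies = {
    'http':  f'http://{proxy_auth}@{proxy_host}:{proxy_port}/',
    'https': f'http://{proxy_auth}@{proxy_host}:{proxy_port}/',
}
# Usage (not executed here):
#   requests.get('https://example.com', proxies=proxies)
# For HTTPS, Crawlera's docs describe installing their CA certificate so that
# certificate verification can stay enabled.
```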
1
vote
1 answer

Is it possible to set different settings for different request in the same Scrapy spider?

I want to use Crawlera only for some requests in a Scrapy spider. So I want to set CRAWLERA_ENABLED differently for different requests. Is it possible?
Aminah Nuraini
  • 13,849
  • 6
  • 73
  • 92
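scrapy-crawlera supports opting individual requests out of the proxy via the `dont_proxy` request meta key (assuming a middleware version that implements it), so `CRAWLERA_ENABLED` can stay on globally. A minimal sketch of building that meta:

```python
# Sketch: per-request opt-out via request.meta. No Scrapy import needed here;
# the real CrawleraMiddleware checks this meta key in process_request.
def make_meta(use_crawlera: bool) -> dict:
    """Build request meta that tells CrawleraMiddleware to skip this request."""
    return {} if use_crawlera else {'dont_proxy': True}
```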
1
vote
2 answers

Connection was refused by other side: 111: Connection refused. when using Scrapy Crawlera on a Linux server

Scrapy Crawlera was working just fine on my Windows machine, but it gets error 111 when I run it on my Linux server. Why is that? When I use curl, I get this error: curl: (7) Failed connect to proxy.crawlera.com:8010; Connection refused
Aminah Nuraini
  • 13,849
  • 6
  • 73
  • 92
0
votes
0 answers

Scrapy Cloud skipping through loop

This spider is supposed to loop through https://lihkg.com/thread/`2169007 - i*10`/page/1. But for some reason it skips pages in the loop. I looked through the items scraped in Scrapy Cloud; the items with the following URLs were scraped: ... Item 10:…
shingseto
  • 1
  • 1
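For reference, the URL sequence described in the question can be generated deterministically; when items go missing, per-request failures (bans, timeouts) or the duplicate filter are more likely culprits than the loop itself. A sketch:

```python
# Sketch: generate the thread URLs described in the question.
BASE_ID = 2169007

def thread_urls(count):
    """Return lihkg thread URLs, stepping the thread id down by 10 each time."""
    return [f"https://lihkg.com/thread/{BASE_ID - i * 10}/page/1"
            for i in range(count)]
```

Yielding each Request with `dont_filter=True` rules out Scrapy's duplicate filter silently dropping any of them.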
0
votes
1 answer

Set country while scraping Amazon

I'm scraping prices from Amazon. Everything works fine except I'm facing a location issue: apparently some products are not available outside of the US, so when my program runs it fails to fetch prices. I'm using Crawlera for a US IP proxy, but it still…
Shubham Devgan
  • 518
  • 4
  • 16
0
votes
1 answer

Get HTTPS response from Scrapy shell

I have a spider that is getting cookies from a site in the first few steps. I would like to get the cookies, start the scrape, and if the HTTP status of the current request == 302, I want to loop back to the cookies part to refresh them. How can I…
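One way to see the 302 instead of letting Scrapy auto-follow it is to put `handle_httpstatus_list` in the request meta; the callback can then branch back to the cookie step. A minimal sketch of the branching logic (the helper name is illustrative):

```python
# Sketch: decide whether to loop back to the cookie-refresh step.
REFRESH_STATUSES = {302}

def needs_cookie_refresh(status: int) -> bool:
    """True when the response status means our session cookies expired."""
    return status in REFRESH_STATUSES

# In the spider, requests would carry meta={'handle_httpstatus_list': [302]}
# so the 302 reaches the callback instead of being auto-followed.
```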
0
votes
2 answers

How to resolve 502 response code in Scrapy request?

I created a spider that scrapes data from Yelp by using Scrapy. All requests go through the Crawlera proxy. The spider gets the URL to scrape, sends a request, and scrapes the data. This worked fine up until the other day, when I started getting 502…
amabeat_30
  • 71
  • 6
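One common mitigation for intermittent 502s is to ask Scrapy's RetryMiddleware to retry them (these are standard Scrapy retry settings; the values are illustrative):

```python
# settings.py fragment: retry 502 responses (standard Scrapy retry settings).
RETRY_ENABLED = True
RETRY_TIMES = 5                     # retry each failed request up to 5 times
RETRY_HTTP_CODES = [500, 502, 503, 504, 408]
```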
0
votes
1 answer

Use a specific Scrapy downloader middleware per request

I use Crawlera as an IP-rotating service to crawl a specific website that bans my IP quickly, but I have this problem with only one website out of a dozen. Since it is possible to register multiple middlewares for a Scrapy project, I wanted to…
Max atton
  • 91
  • 7
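Downloader middlewares register globally, but a middleware can simply no-op unless the request asks for it via a meta flag. A minimal sketch (the class name and flag are illustrative, not a Scrapy API; `'proxy'` is Scrapy's standard meta key):

```python
# Sketch: a downloader middleware that only proxies flagged requests.
class OptInProxyMiddleware:
    PROXY = 'http://proxy.crawlera.com:8010'

    def process_request(self, request, spider=None):
        if request.meta.get('use_rotating_proxy'):
            request.meta['proxy'] = self.PROXY
        return None  # let Scrapy continue processing the request
```

The spider then sets `meta={'use_rotating_proxy': True}` only on requests to the site that bans quickly.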