Questions tagged [scrapinghub]

Scrapinghub is a web scraping development and services company that supplies cloud-based web crawling platforms.

175 questions
11 votes, 1 answer

Not able to run/deploy a custom script with shub-image

I have a problem running/deploying a custom script with shub-image. setup.py: from setuptools import setup, find_packages setup( name = 'EU-Crawler', version = '1.0', packages = find_packages(), scripts = [ …
parik • 1,924 • 10 • 36 • 62
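For comparison, a minimal `setup.py` sketch of the shape shub-image expects; the script path and package name below are placeholders, not the asker's actual files. Scripts listed under `scripts` get installed into the image and can then be invoked as custom scripts from Scrapy Cloud (commonly as `py:myscript.py`).

```python
from setuptools import setup, find_packages

setup(
    name='EU-Crawler',
    version='1.0',
    packages=find_packages(),
    # Files listed here are installed as executables in the image;
    # 'bin/myscript.py' is a placeholder path.
    scripts=['bin/myscript.py'],
    # Tells Scrapy Cloud where the project settings live
    # ('eu_crawler' is a placeholder package name).
    entry_points={'scrapy': ['settings = eu_crawler.settings']},
)
```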
7 votes, 4 answers

scrapy passing custom_settings to spider from script using CrawlerProcess.crawl()

I am trying to programmatically call a spider through a script. I am unable to override the settings through the constructor using CrawlerProcess. Let me illustrate this with the default spider for scraping quotes from the official scrapy site (last…
hAcKnRoCk • 942 • 3 • 11 • 27
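The usual pattern here, hedged as a sketch: `custom_settings` is read from the spider *class* before the instance is created, so per-run overrides are passed to `CrawlerProcess` itself (or set on the class) rather than through the constructor. The spider and feed names below are placeholders.

```python
import scrapy
from scrapy.crawler import CrawlerProcess

class QuotesSpider(scrapy.Spider):
    name = 'quotes'
    start_urls = ['http://quotes.toscrape.com']

    def parse(self, response):
        for quote in response.css('div.quote'):
            yield {'text': quote.css('span.text::text').get()}

# Settings given here are merged over the project settings for this run,
# taking the role that custom_settings plays on the class.
process = CrawlerProcess(settings={
    'FEEDS': {'quotes.json': {'format': 'json'}},
    'DOWNLOAD_DELAY': 1,
})
process.crawl(QuotesSpider)
process.start()  # blocks until the crawl finishes
```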
6 votes, 1 answer

Scrapy hidden memory leak

Background - TL;DR: I have a memory leak in my project. I spent a few days looking through the Scrapy memory-leak docs and can't find the problem. I'm developing a medium-sized Scrapy project, ~40k requests per day. I am hosting this using…
Hector Haffenden • 1,128 • 6 • 22
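For leaks like this, Scrapy's own `trackref` utilities are the usual first step; a sketch of how they are typically used, e.g. from the telnet console of the running crawl:

```python
from scrapy.utils.trackref import print_live_refs, get_oldest

# Counts of live Request/Response/Item objects per class; a count that
# only ever grows points at what is being kept alive.
print_live_refs()

# Inspect the oldest live Request to see what is still holding it.
oldest = get_oldest('Request')
if oldest is not None:
    print(oldest.url, oldest.meta)
```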
6 votes, 0 answers

Pygsheets unable to find the server at www.googleapis.com

I'm trying to use pygsheets in a script on ScrapingHub. The pygsheets part of the script begins with: google_client = pygsheets.authorize(service_file=CREDENTIALS_FILENAME, no_cache=True) spreadsheet = google_client.open_by_key(SHEET_ID) Where…
6 votes, 0 answers

Scrapy concurrent requests with stateful sessions

I've been web scraping for some time but am relatively new to Python. I recently switched all my scraping activity from Ruby over to Python, primarily because of Scrapy and Scrapinghub, which seem to provide better support for large-scale…
acowpy • 306 • 3 • 7
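Scrapy's stock answer to concurrent stateful sessions is the `cookiejar` request meta key, which keeps one cookie jar per key value; a sketch (URLs are placeholders):

```python
import scrapy

class SessionSpider(scrapy.Spider):
    name = 'sessions'

    def start_requests(self):
        # One independent cookie jar (session) per numbered login.
        for i in range(3):
            yield scrapy.Request(
                'http://example.com/login',
                meta={'cookiejar': i},
                callback=self.after_login,
                dont_filter=True,
            )

    def after_login(self, response):
        # Re-send the same cookiejar key so follow-up requests
        # reuse that session's cookies.
        yield response.follow(
            '/account',
            meta={'cookiejar': response.meta['cookiejar']},
            callback=self.parse_account,
        )

    def parse_account(self, response):
        yield {'session': response.meta['cookiejar'], 'status': response.status}
```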
5 votes, 1 answer

Scrapy does not fetch markup on response.css

I've built a simple scrapy spider running on scrapinghub: class ExtractionSpider(scrapy.Spider): name = "extraction" allowed_domains = ['domain'] start_urls = ['http://somedomainstart'] user_agent = "Mozilla/5.0 (Windows NT 10.0;…
qubits • 925 • 2 • 14 • 39
4 votes, 1 answer

scrapy how to load urls from file at scrapinghub

I know how to load data into a Scrapy spider from an external source when working locally. But I struggle to find any info on how to deploy this file to Scrapinghub and what path to use there. Now I use this approach from the SH documentation - enter link…
Billy Jhon • 879 • 12 • 23
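One approach that works on Scrapy Cloud, sketched under the assumption that the file is bundled into the deployed egg (the package and file names are hypothetical): declare the file in `setup.py`'s `package_data` and read it with `pkgutil` instead of a filesystem path, so the same code works locally and on the platform.

```python
import pkgutil

# 'myproject' and 'resources/urls.txt' are placeholder names; the file
# must be listed in setup.py's package_data so shub bundles it.
raw = pkgutil.get_data('myproject', 'resources/urls.txt')
start_urls = [
    line.strip()
    for line in raw.decode('utf-8').splitlines()
    if line.strip()
]
```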
3 votes, 0 answers

Splash - Scrapy - HAR data

In general I understand how to work with Scrapy and XPath to parse the HTML. However, I don't know how to grab the HAR data. import scrapy from scrapy_splash import SplashRequest class QuotesSpider(scrapy.Spider): name = 'quotes' …
Zach • 371 • 1 • 3 • 9
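A hedged sketch of grabbing the HAR alongside the HTML: Splash's `render.json` endpoint returns a HAR log when `har=1` is passed, and scrapy-splash exposes the JSON body as `response.data`.

```python
import scrapy
from scrapy_splash import SplashRequest

class QuotesSpider(scrapy.Spider):
    name = 'quotes'
    start_urls = ['http://quotes.toscrape.com']

    def start_requests(self):
        for url in self.start_urls:
            yield SplashRequest(
                url, self.parse,
                endpoint='render.json',   # render.html returns no HAR
                args={'html': 1, 'har': 1},
            )

    def parse(self, response):
        har = response.data['har']        # full network log of the render
        yield {
            'url': response.url,
            'requests_made': len(har['log']['entries']),
        }
```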
3 votes, 1 answer

Why is scrapy with crawlera running so slow?

I am using scrapy 1.7.3 with crawlera (C100 plan from scrapinghub) and python 3.6. When running the spider with crawlera enabled I get about 20 - 40 items per minute. Without crawlera I get 750 - 1000 (but I get banned quickly of course). Have I…
Wramana • 151 • 2 • 15
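That gap is roughly what Crawlera's tuning advice predicts: the proxy does its own per-site throttling, so Scrapy-side delays and autothrottle stack on top of it. A settings sketch along the lines of that advice (the values are assumptions tied to the plan's concurrency limit):

```python
# C100 plan allows 100 concurrent proxy connections; match Scrapy to it.
CONCURRENT_REQUESTS = 100
CONCURRENT_REQUESTS_PER_DOMAIN = 100
# Let Crawlera do the throttling instead of Scrapy.
AUTOTHROTTLE_ENABLED = False
DOWNLOAD_DELAY = 0
# Proxied responses can be slow; allow time rather than timing out.
DOWNLOAD_TIMEOUT = 600
```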
3 votes, 1 answer

Use Splash from Scrapinghub locally

I got a subscription for Splash on Scrapinghub and I want to use it from a script that is running on my local machine. The instructions I have found so far are: 1) Edit the settings file: # I got this one from my Scrapinghub account SPLASH_URL =…
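For reference, the usual scrapy-splash wiring for a hosted instance looks like the sketch below; the URL is a placeholder, and authenticating with the instance's API key as the HTTP Basic username (empty password) is an assumption based on how Scrapinghub's hosted Splash is commonly configured.

```python
# settings.py (local project)
SPLASH_URL = 'https://<your-instance>.splash.scrapinghub.com'

DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}
SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}
DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'

# In the spider class (assumption: API key as HTTP Basic username):
# http_user = '<your-splash-api-key>'
# http_pass = ''
```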
3 votes, 1 answer

ScrapingHub Environment Variables Not Loaded

I'm deploying a bunch of spiders on ScrapingHub. The spider itself is working. I would like to change the feed output depending on whether the spider is running locally or on ScrapingHub (if it is running locally then output to a temp folder, if it…
Ze Xuan • 56 • 6
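A common way to branch on this, sketched here on the assumption that `SHUB_JOBKEY` is present in the environment of Scrapy Cloud jobs and absent locally (the s3 path is a placeholder):

```python
import os

def running_on_scrapinghub():
    """True when executing inside a Scrapy Cloud job."""
    return 'SHUB_JOBKEY' in os.environ

# Feed goes to a bucket on Scrapy Cloud, to a temp file locally.
if running_on_scrapinghub():
    FEED_URI = 's3://my-bucket/%(name)s-%(time)s.json'
else:
    FEED_URI = 'file:///tmp/%(name)s.json'
```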
3 votes, 1 answer

scrapinghub starting job too slow

I am new to scraping and I am running different jobs on Scrapinghub. I run them via their API. The problem is that starting and initializing the spider takes too much time, around 30 seconds. When I run it locally, it takes up to 5 seconds to finish…
Mara M • 153 • 1 • 1 • 10
3 votes, 2 answers

Scrapy and Splash time out for a specific site

I have an issue with Scrapy, Crawlera and Splash when trying to fetch responses from this site. I tried the following without luck: pure Scrapy shell - times out Scrapy + Crawlera - times out Scrapinghub Splash instance (small) - times…
3 votes, 2 answers

Download project's source-code from Scrapinghub

I have a project deployed on Scrapinghub, and I do not have any copy of that code at all. How can I download the whole project's code from Scrapinghub to my local machine?
Umair Ayub • 13,220 • 12 • 53 • 124
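If the project was deployed with shub, the eggs built at deploy time are still stored on the platform and can be pulled back down; a sketch (12345 is a placeholder project ID, and the exact name of the downloaded zip may vary):

```shell
pip install shub
shub login                # prompted for your Scrapinghub API key
shub fetch-eggs 12345     # downloads a zip of the project's deployed eggs
unzip eggs-12345.zip -d recovered-source
```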
3 votes, 2 answers

How to install xvfb on Scrapinghub for using Selenium?

I use Python-Selenium in my Scrapy spider. To use Selenium I need to install xvfb on Scrapinghub. When I use apt-get to install xvfb I get this error message: E: Could not open lock file /var/lib/dpkg/lock - open (13: Permission denied) …
parik • 1,924 • 10 • 36 • 62
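apt-get fails there because Scrapy Cloud jobs run as an unprivileged user; system packages have to be baked into a custom Docker image at build time instead, where root is available. A Dockerfile sketch (the base image tag is an assumption; pick the stack that matches your project):

```dockerfile
FROM scrapinghub/scrapinghub-stack-scrapy:1.3
# Install xvfb at image build time, where apt-get has root access.
RUN apt-get update -qq && \
    apt-get install -y --no-install-recommends xvfb && \
    rm -rf /var/lib/apt/lists/*
```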