Questions tagged [scrapyd]

`Scrapyd` is a daemon for managing `Scrapy` projects. The project used to be part of `scrapy` itself, but was separated out and is now a standalone project. It runs on a machine and allows you to deploy (aka. upload) your projects and control the spiders they contain using a JSON web service.

Scrapyd can manage multiple projects and each project can have multiple versions uploaded, but only the latest one will be used for launching new spiders.

349 questions
51 votes • 5 answers

Scrapy get request url in parse

How can I get the request URL in Scrapy's parse() function? I have a lot of URLs in start_urls, and some of them redirect my spider to the homepage, leaving me with an empty item. So I need something like item['start_url'] = request.url to store…
Goran • 5,427
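
A minimal sketch of the usual approach, assuming the stock RedirectMiddleware is enabled: the final URL is on `response.url`, while the originally requested URL survives on `response.request` and in the `redirect_urls` meta key. Spider name and URLs below are placeholders.

```python
import scrapy

class StartUrlSpider(scrapy.Spider):
    # Hypothetical spider illustrating the pattern.
    name = "start_url_spider"
    start_urls = ["https://example.com/a", "https://example.com/b"]

    def parse(self, response):
        # response.url is the final URL after any redirects; the stock
        # RedirectMiddleware records the original chain under the
        # 'redirect_urls' meta key.
        redirects = response.request.meta.get("redirect_urls")
        yield {
            "url": response.url,
            "start_url": redirects[0] if redirects else response.request.url,
        }
```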
26 votes • 1 answer

ScrapyRT vs Scrapyd

We've been using the Scrapyd service for a while up until now. It provides a nice wrapper around a Scrapy project and its spiders, letting you control the spiders via an HTTP API: Scrapyd is a service for running Scrapy spiders. It allows you to deploy…
alecxe • 414,977
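
For context, the Scrapyd side of that comparison boils down to its JSON API; a minimal sketch using the `requests` package (project and spider names are placeholders):

```python
import requests

# Schedule a crawl on a locally running Scrapyd instance.
resp = requests.post(
    "http://localhost:6800/schedule.json",
    data={"project": "myproject", "spider": "myspider"},
)
print(resp.json())  # {"status": "ok", "jobid": "..."} on success
```

ScrapyRT, by contrast, runs the spider synchronously and returns the scraped items in the HTTP response itself, which is the crux of the trade-off the question asks about.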
25 votes • 4 answers

How to setup and launch a Scrapy spider programmatically (urls and settings)

I've written a working crawler using Scrapy; now I want to control it through a Django webapp, that is to say: set one or several start_urls, set one or several allowed_domains, set settings values, start the spider, stop/pause/resume a…
arno • 497
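
A minimal sketch of one way to do this with `scrapy.crawler.CrawlerProcess`; the spider class and values are placeholders. Keyword arguments passed to `crawl()` become spider attributes, which covers the start_urls/allowed_domains part:

```python
import scrapy
from scrapy.crawler import CrawlerProcess

class ControlledSpider(scrapy.Spider):
    name = "controlled"

    def parse(self, response):
        self.logger.info("Visited %s", response.url)

# Settings can be injected here instead of coming from settings.py.
process = CrawlerProcess(settings={"DOWNLOAD_DELAY": 1.0})
process.crawl(
    ControlledSpider,
    start_urls=["https://example.com"],
    allowed_domains=["example.com"],
)
process.start()  # blocks until the crawl finishes
```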
15 votes • 1 answer

Learning Python and trying to implement Scrapy: getting this error

I am going through the Scrapy tutorial http://doc.scrapy.org/en/latest/intro/tutorial.html and followed it until I ran the command scrapy crawl dmoz, which gave me output with an error: 2013-08-25 13:11:42-0700 [scrapy] INFO: Scrapy 0.18.0 started…
Asim Zaidi • 23,590
13 votes • 1 answer

Scrapy spider memory leak

My spider has a serious memory leak. After 15 minutes of running, its memory usage is 5 GB, and Scrapy reports (using prefs()) that there are 900k live Request objects and little else. What can be the reason for this high number of live Request objects? The count only goes up…
Aldarund • 14,747
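
For reference, prefs() is the live-object summary exposed in Scrapy's telnet console (localhost:6023 by default); the same trackref data is available in code. A sketch:

```python
from scrapy.utils.trackref import get_oldest, print_live_refs

print_live_refs()               # per-class live counts and oldest-object age
oldest = get_oldest("Request")  # longest-lived Request still referenced
if oldest is not None:
    print(oldest.url)           # often hints at what is holding the references
```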
13 votes • 3 answers

Running Multiple Scrapy Spiders (the easy way) Python

Scrapy is pretty cool; however, I found the documentation to be very bare-bones, and some simple questions were tough to answer. After putting together various techniques from various Stack Overflow answers, I have finally come up with an easy and not overly…
InfinteScroll • 656
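
The pattern that usually emerges from those answers is a single CrawlerProcess driving several spiders; a minimal sketch with placeholder spiders:

```python
import scrapy
from scrapy.crawler import CrawlerProcess

class FirstSpider(scrapy.Spider):
    name = "first"
    start_urls = ["https://example.com/a"]

    def parse(self, response):
        yield {"spider": self.name, "url": response.url}

class SecondSpider(scrapy.Spider):
    name = "second"
    start_urls = ["https://example.com/b"]

    def parse(self, response):
        yield {"spider": self.name, "url": response.url}

process = CrawlerProcess()
process.crawl(FirstSpider)
process.crawl(SecondSpider)
process.start()  # both spiders run concurrently in one reactor
```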
11 votes • 2 answers

Run multiple scrapy spiders at once using scrapyd

I'm using scrapy for a project where I want to scrape a number of sites - possibly hundreds - and I have to write a specific spider for each site. I can schedule one spider in a project deployed to scrapyd using: curl…
user1009453 • 687
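
Extending that single schedule.json call to many spiders is just a loop; a sketch with placeholder names (Scrapyd queues the jobs and runs up to its configured process limit in parallel):

```python
import requests

SCRAPYD_URL = "http://localhost:6800/schedule.json"

for spider in ["site_a", "site_b", "site_c"]:  # one spider per site
    resp = requests.post(
        SCRAPYD_URL,
        data={"project": "myproject", "spider": spider},
    )
    print(spider, "->", resp.json().get("jobid"))
```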
10 votes • 2 answers

Parallelism/Performance problems with Scrapyd and single spider

Context: I am running scrapyd 1.1 + scrapy 0.24.6 with a single "selenium-scrapy hybrid" spider that crawls over many domains according to parameters. The development machine that hosts the scrapyd instance(s?) is an OS X Yosemite box with 4 cores, and this…
gerosalesc • 2,613
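
The knobs that govern this live in scrapyd.conf; a sketch with illustrative values (on a 4-core machine the defaults allow up to 16 concurrent crawl processes):

```ini
[scrapyd]
max_proc         = 0    # 0 = derive the limit from max_proc_per_cpu
max_proc_per_cpu = 4    # default; 4 cores x 4 = up to 16 processes
```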
10 votes • 1 answer

What are the advantages of using scrapyd?

The Scrapy docs say: Scrapy comes with a built-in service, called “Scrapyd”, which allows you to deploy (aka. upload) your projects and control their spiders using a JSON web service. Are there advantages to using scrapyd?
gnemoug • 347
9 votes • 1 answer

scrapyd-client command not found

I'd just installed scrapyd-client (1.1.0) in a virtualenv and ran the command 'scrapyd-deploy' successfully, but when I run 'scrapyd-client', the terminal says: command not found: scrapyd-client. According to the readme…
dropax • 115
8 votes • 3 answers

Scrapyd jobid value inside spider

Framework: Scrapy with a Scrapyd server. I have a problem getting the jobid value inside the spider. After posting data to http://localhost:6800/schedule.json, the response is status = ok, jobid = bc2096406b3011e1a2d0005056c00008, but I need to use this…
fcmax • 315
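
A sketch of the commonly cited approach, assuming the spider is launched by Scrapyd, which exports the job id to the crawl process in the SCRAPY_JOB environment variable:

```python
import os
import scrapy

class JobAwareSpider(scrapy.Spider):
    name = "job_aware"  # placeholder name

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # None when run outside Scrapyd (e.g. plain 'scrapy crawl').
        self.jobid = os.environ.get("SCRAPY_JOB")
```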
8 votes • 2 answers

Scrapy's Scrapyd too slow with scheduling spiders

I am running Scrapyd and encountered a weird issue when launching 4 spiders at the same time. 2012-02-06 15:27:17+0100 [HTTPChannel,0,127.0.0.1] 127.0.0.1 - - [06/Feb/2012:14:27:16 +0000] "POST /schedule.json HTTP/1.1" 200 62 "-"…
Sjaak Trekhaak • 4,596
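
A likely factor, stated as an assumption about Scrapyd versions of that era: the launcher takes at most one job from the queue per poll cycle, so four jobs scheduled together start roughly poll_interval seconds apart. The interval can be lowered in scrapyd.conf:

```ini
[scrapyd]
poll_interval = 0.5   # default is 5.0 seconds
```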
8 votes • 2 answers

Scrapyd-deploy command not found after scrapyd installation

I have created a couple of web spiders that I intend to run simultaneously with scrapyd. I first successfully installed scrapyd on Ubuntu 14.04 using the command pip install scrapyd, and when I run the command scrapyd, I get the following output…
loremIpsum1771 • 2,277
7 votes • 0 answers

scrapyd: is it possible to return ERROR status for a job

I have an application which schedules Scrapy crawl jobs via Scrapyd. Items flow nicely to the DB, and I can monitor the job status via the listjobs.json endpoint. So far so good, and I can even tell when jobs are finished. However, sometimes jobs can…
Oren Yosifon • 671
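
For reference, a minimal sketch of what listjobs.json exposes (project name is a placeholder): jobs are only bucketed as pending, running, or finished, with no success-versus-failure flag, which is exactly the gap the question describes.

```python
import requests

resp = requests.get(
    "http://localhost:6800/listjobs.json",
    params={"project": "myproject"},
)
jobs = resp.json()
for state in ("pending", "running", "finished"):
    print(state, [job["id"] for job in jobs.get(state, [])])
```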
7 votes • 1 answer

Horizontally scaling Scrapyd

What tool or set of tools would you use for horizontally scaling scrapyd, adding new machines to a scrapyd cluster dynamically and having N instances per machine if required? It is not necessary for all the instances to share a common job queue, but…
gerosalesc • 2,613