Questions tagged [scrapyd]

`Scrapyd` is a daemon for managing `Scrapy` projects. It used to be part of `scrapy` itself, but was separated out and is now a standalone project. It runs on a machine and lets you deploy (that is, upload) your projects and control the spiders they contain through a JSON web service.

Scrapyd can manage multiple projects and each project can have multiple versions uploaded, but only the latest one will be used for launching new spiders.
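As a rough illustration of the JSON web service described above, the sketch below builds (but does not send) a POST request for the `schedule.json` endpoint using only the standard library. The base URL assumes Scrapyd is listening on its default port 6800 on the local machine, and the project name, spider name, and the extra `category` argument are hypothetical placeholders, not taken from any real project.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: Scrapyd runs locally on its default port (6800).
SCRAPYD_URL = "http://localhost:6800"


def build_schedule_request(project, spider, **spider_args):
    """Build a POST request for Scrapyd's schedule.json endpoint.

    Extra keyword arguments are sent as additional form fields, which
    Scrapyd forwards to the spider as (string) spider arguments.
    The request is only constructed here, not sent.
    """
    fields = {"project": project, "spider": spider, **spider_args}
    body = urlencode(fields).encode("utf-8")
    return Request(f"{SCRAPYD_URL}/schedule.json", data=body, method="POST")


# Hypothetical project, spider, and argument names, for illustration only.
request = build_schedule_request("myproject", "myspider", category="books")
```

To actually schedule the job, the request could be passed to `urllib.request.urlopen(request)` and the response body parsed as JSON; on success Scrapyd replies with a status and the id of the new job.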

349 questions
4
votes
1 answer

Scrapy extension: spider_closed is not called

I have an extension which attaches to spider_opened and spider_closed. The spider_opened method is correctly called, but the spider_closed method is not. I close the spider by calling the scrapyd cancel endpoint. class SpiderCtlExtension(object): …
kutschkem
  • 6,194
  • 3
  • 16
  • 43
4
votes
1 answer

Providing url for spider using scrapyd api

I tried something like: payload = {"project": settings['BOT_NAME'], "spider": crawler_name, "start_urls": ["http://www.foo.com"]} response = requests.post("http://192.168.1.41:6800/schedule.json", …
timfeirg
  • 1,106
  • 15
  • 30
4
votes
2 answers

Passing json arguments to a spider in scrapy

I need to pass a spider some parameters taken from a JSON file. I have read that this is possible through scrapyd using schedule.json, but I don't understand how to pass the JSON file. Does anyone have experience with this?
eng_mazzy
  • 1,039
  • 4
  • 22
  • 35
4
votes
1 answer

Scrapyd can't find the project name

I am getting an error when I try to run an existing scrapy project on scrapyd. I have a working scrapy project (url_finder) and a working spider in that project used for testing (test_ip_spider_1x) that simply downloads whatismyip.com. I…
gpanterov
  • 1,145
  • 2
  • 13
  • 24
4
votes
2 answers

How do I pass form data with Scrapy from the command line?

How could I pass username and password from the command line? Thanks! class LoginSpider(Spider): name = 'example.com' start_urls = ['http://www.example.com/users/login.php'] def parse(self, response): return…
Theodis Butler
  • 126
  • 1
  • 8
4
votes
1 answer

Scrapyd: How to set scrapyd task priority?

I have several scrapy projects. I deploy all of them with scrapyd. Some of the spiders are slow while others are fast. Now I want to run the fast spiders first. How can I do this?
Zhang Jiuzhou
  • 663
  • 7
  • 19
4
votes
1 answer

How to set scrapy IMAGES_STORE relative path

I am trying to set IMAGES_STORE as a relative path, but I am getting an error. If I specify IMAGES_STORE as a full path, it works fine: /home/vaibhav/scrapyprog/comparison/eScraperInterface/images. The error I am getting is at link. Actually it…
Vaibhav Jain
  • 4,313
  • 8
  • 43
  • 103
4
votes
1 answer

Enabling HttpProxyMiddleware in scrapyd

After reading the scrapy documentation, I thought that the HttpProxyMiddleware is enabled by default. But when I start a spider via scrapyd's webservice interface, HttpProxyMiddleware is not enabled. I receive the following output: 2013-02-18…
digitalmonkey
  • 149
  • 1
  • 8
3
votes
1 answer

Scrapy server setup

I'm trying to set up a scrapyd server on AWS and access it from my local machine. So far, I've managed to get scrapyd running on the remote machine. I know it's running because when I do start scrapyd I get start: Job is already running:…
zsquare
  • 9,272
  • 5
  • 48
  • 84
3
votes
1 answer

Scrapy spider not working on Django after implementing WebSockets with Channels (cannot call it from an async context)

I'm opening a new question as I'm having an issue with Scrapy and Channels in a Django application and I would appreciate if someone could guide me in the right direction. The reason why I'm using channels is because I want to retrieve in real-time…
Askew
  • 71
  • 8
3
votes
1 answer

How to fix scrapy.utils.http deprecated warning

I am getting a deprecation warning while trying to run scrapy deploy. I'm pretty new to scraping. deploy.py:23: ScrapyDeprecationWarning: Module scrapy.utils.http is deprecated, Please import from `w3lib.http` instead. from scrapy.utils.http import…
Marshall
  • 37
  • 6
3
votes
0 answers

How to add a new service to scrapyd from current project

I am trying to run multiple spiders at once and I made my own custom command in scrapy. Now I am trying to run that command through scrapyd. I tried to add it as a new service to my scrapyd.conf but it throws an error saying there is no such…
3
votes
1 answer

Scrapyd raises NotADirectoryError from .egg file

I use Scrapyd to run my spider dynamically. I added a .txt file that has a list of block words. My problem is the following: when I run the Scrapyd server as a daemon, it raises this error during scraping: NotADirectoryError: [Errno 20] Not a directory:…
amarynets
  • 1,666
  • 7
  • 20
3
votes
1 answer

TypeError in scrapyd

I have started scrapyd in my cmd, and the website "localhost:8600" shows normally. Then I began to deploy a project named scrapyd_prac, and changed the content of the project's "scrapy.cfg" to: [deploy:localhost] url = http://localhost:6800/ …
Eva Frost
  • 31
  • 2
3
votes
2 answers

Use scrapyd job id in scrapy pipelines

I've implemented a web application that is triggering scrapy spiders using scrapyd API (web app and scrapyd are running on the same server). My web application is storing job ids returned from scrapyd in DB. My spiders are storing items in…
mouch
  • 195
  • 1
  • 10