Questions tagged [scrapyd]

`Scrapyd` is a daemon for managing `Scrapy` projects. It used to be part of `scrapy` itself, but was separated out and is now a standalone project. It runs on a machine and allows you to deploy (i.e., upload) your projects and control the spiders they contain using a JSON web service.

Scrapyd can manage multiple projects and each project can have multiple versions uploaded, but only the latest one will be used for launching new spiders.
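The JSON web service mentioned above can be exercised with plain `curl`. A minimal sketch, assuming scrapyd is running on its default port 6800 and that a project named `myproject` containing a spider `somespider` has already been deployed (both names are placeholders):

```shell
# List the projects known to this scrapyd instance
curl http://localhost:6800/listprojects.json

# Schedule a run of one spider; scrapyd replies with a job id
curl http://localhost:6800/schedule.json -d project=myproject -d spider=somespider

# Check the pending/running/finished jobs for the project
curl "http://localhost:6800/listjobs.json?project=myproject"
```

The endpoint names and port are scrapyd's documented defaults; the responses are JSON objects whose `status` field is `"ok"` on success.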

349 questions
0
votes
1 answer

Log for scrapyd installed with pip

I installed scrapyd with pip, and I don't have a '/var/log/scrapyd' dir. I'm trying to find out what's happening with my http call, since I get an 'OK' status when I initiate it, but no log is generated in 'logs/project/spider/' (and according to…
Jean Ventura
  • 29
  • 10
0
votes
0 answers

scrapyd: how to override spider name using cmd arguments

I am using scrapyd (project deployed on an ec2 instance of AWS) that accepts a seed url to start. I want to run the spider with a different name each time, so that I can manage items and logs easily on the ec2 instance. Locally I can do it like this: crawl…
Tasawer Nawaz
  • 845
  • 6
  • 16
0
votes
2 answers

Scrapy recursively scraping craigslist

I am using scrapy to scrape craigslist and get all links, go to each link, and store the description and reply email for each page. Now I have written a scrapy script which goes through craigslist/sof.com and gets all job titles and urls. I want…
Scooby
  • 2,809
  • 6
  • 33
  • 75
0
votes
1 answer

getting spider instance from scrapyd

Is there a way to get the instance of the spider that runs when you schedule a run using scrapyd? I need to access attributes in the spider to handle things outside the run, and can't use a json/csv file to do this.
Jean Ventura
  • 29
  • 10
0
votes
1 answer

How does scrapyd determine the 'latest' version of a project?

According to the documentation, when deploying a project to scrapyd, I can use the git commit hash as the version, by doing this: $ scrapyd-deploy default -p myproject --version GIT The documentation also says that scrapyd can keep multiple…
Kal
  • 1,552
  • 11
  • 27
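The deploy-with-a-git-hash workflow from the question above can be sketched as follows, assuming `scrapyd-deploy` is installed, a `default` target is configured in the project's `scrapy.cfg`, and the project is named `myproject` (a placeholder):

```shell
# Deploy, using the current git commit hash as the version string
scrapyd-deploy default -p myproject --version GIT

# Ask scrapyd which versions it now holds for this project;
# new runs are launched from whichever version scrapyd considers latest
curl "http://localhost:6800/listversions.json?project=myproject"
```

Note that, at least in the scrapyd releases contemporary with this question, "latest" was determined by comparing version strings rather than upload times, so raw commit hashes may not sort chronologically; a timestamp-based version string can be a safer choice when ordering matters.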
0
votes
1 answer

How do I call spiders from different projects with different pipelines from a python script?

I have three different spiders in different scrapy projects called REsale, REbuy and RErent, each with its own pipeline that directs its output to various MySQL tables on my server. They all run OK when called using scrapy crawl. Ultimately,…
Mark
  • 175
  • 1
  • 15
0
votes
1 answer

Scrapyd Post schedule.json from asp.net

I have scrapyd and a spider installed on a Unix machine, and everything works fine when I run curl http://localhost:6800/schedule.json -d project=myproject -d spider=somespider I can see the job status, logs and items on the web interface of scrapyd…
Syed Waqas
  • 862
  • 1
  • 9
  • 28
0
votes
1 answer

How to install the latest Scrapyd package?

I notice that the latest stable version of scrapy was released last week (2013-08-09). After updating scrapy to version 0.18, the previously installed scrapyd-0.17 was uninstalled by apt-get (Ubuntu 12.04) automatically. Is there a scrapyd-0.18? How to…
kev
  • 137,128
  • 36
  • 241
  • 259
0
votes
1 answer

How to install scrapyd on FreeBSD

I am trying to install scrapyd on FreeBSD, but I am getting this error: $ cd /usr/ports/www/py-scrapyd/ && sudo make install clean -bash: cd: /usr/ports/www/py-scrapyd/: No such file or directory I have installed scrapy using this command: $ cd…
Vaibhav Jain
  • 4,313
  • 8
  • 43
  • 103
0
votes
1 answer

Run Scrapy on IIS

I have an IIS server, and on it I have an ASP.NET MVC application. The MVC application will revolve around scraped data. Is there a way I can run Scrapy (a tool built in Python) on IIS? Similar to how we can run PHP and WordPress on IIS.
J86
  • 11,751
  • 29
  • 115
  • 194
0
votes
2 answers

scrapy deploy -L returns nothing

I'm trying to deploy my scrapy project, but I'm stuck. I definitely do have a working project and several spiders: deploy@susychoosy:~/susy_scraper$ scrapy Scrapy 0.17.0 - project: clothes_spider and when I do scrapy list it shows a list of all…
pisarzp
  • 587
  • 2
  • 7
  • 12
0
votes
4 answers

empty scraper output while individual hxs.select works?

mainfile from scrapy.contrib.spiders import CrawlSpider, Rule from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor from scrapy.selector import HtmlXPathSelector from bloggerx.items import BloggerxItem from scrapy.spider import…
Harshit
  • 1,177
  • 18
  • 38
0
votes
2 answers

Scrapy / Python and SQL Server

Is it possible to get data scraped from websites using Scrapy and save that data in a Microsoft SQL Server database? If yes, are there any examples of this being done? Is it mainly a Python issue? i.e. if I find some code of Python saving to…
J86
  • 11,751
  • 29
  • 115
  • 194
0
votes
1 answer

Deploy scrapy project

I am trying to deploy a scrapy project with scrapyd. I can run my project normally by using cd /var/www/api/scrapy/dirbot scrapy crawl dmoz This is what I did, step by step: 1/ I ran scrapy version -v >> Scrapy : 0.16.3 lxml : 3.0.2.0 libxml2 :…
hoangvu68
  • 753
  • 12
  • 28
0
votes
1 answer

scrapyd connects to its own database (mysql.db) instead of 127.0.0.1:3306

I have a scrapy project whose spider is as shown below. The spider works when I run it with this command: scrapy crawl myspider class MySpider(BaseSpider): name = "myspider" def parse(self, response): links =…
Alican
  • 141
  • 1
  • 3