Questions tagged [scrapyd]

`Scrapyd` is a daemon for managing `Scrapy` projects. It used to be part of `scrapy` itself, but was separated out and is now a standalone project. It runs on a machine and allows you to deploy (i.e., upload) your projects and control the spiders they contain using a JSON web service.

Scrapyd can manage multiple projects and each project can have multiple versions uploaded, but only the latest one will be used for launching new spiders.
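The JSON web service mentioned above can be exercised with plain `curl`. A minimal sketch, assuming scrapyd is running on its default port 6800 and that a project named `myproject` containing a spider `somespider` has already been deployed (both names are placeholders):

```shell
# List the projects known to this scrapyd instance
curl http://localhost:6800/listprojects.json

# Schedule a run of one spider; scrapyd replies with a job id
curl http://localhost:6800/schedule.json -d project=myproject -d spider=somespider

# Check the pending/running/finished jobs for the project
curl "http://localhost:6800/listjobs.json?project=myproject"
```

The endpoint names and port are scrapyd's documented defaults; the responses are JSON objects whose `status` field is `"ok"` on success.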

349 questions
0
votes
1 answer

Log for scrapyd installed with pip

I installed scrapyd with pip, and I don't have a '/var/log/scrapyd' dir. I'm trying to find out what's happening with my http call, since I get an 'OK' status when I initiate it, but no log is generated in 'logs/project/spider/' (and according to…
Jean Ventura
  • 29
  • 10
0
votes
0 answers

scrapyd: how to override spider name using cmd arguments

I am using scrapyd (project deployed on an ec2 instance of AWS) that accepts a seed url to start. I want to run the spider with a different name each time, so that I can manage items and logs easily on the ec2 instance. Locally I can do it like this: crawl…
Tasawer Nawaz
  • 845
  • 6
  • 16
0
votes
2 answers

Scrapy recursively scraping craigslist

I am using scrapy to scrape craigslist and get all links, go to each link, and store the description and reply email for each page. Now I have written a scrapy script which goes through craigslist/sof.com and gets all job titles and urls. I want…
Scooby
  • 2,809
  • 6
  • 33
  • 75
0
votes
1 answer

getting spider instance from scrapyd

Is there a way to get the instance of the spider that runs when you schedule a run using scrapyd? I need to access attributes in the spider to handle things outside the run, and can't use a json/csv file to do this.
Jean Ventura
  • 29
  • 10
0
votes
1 answer

How does scrapyd determine the 'latest' version of a project?

According to the documentation, when deploying a project to scrapyd, I can use the git commit hash as the version, by doing this: $ scrapyd-deploy default -p myproject --version GIT The documentation also says that scrapyd can keep multiple…
Kal
  • 1,552
  • 11
  • 27
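The deploy-with-a-git-hash workflow from the question above can be sketched as follows, assuming `scrapyd-deploy` is installed, a `default` target is configured in the project's `scrapy.cfg`, and the project is named `myproject` (a placeholder):

```shell
# Deploy, using the current git commit hash as the version string
scrapyd-deploy default -p myproject --version GIT

# Ask scrapyd which versions it now holds for this project;
# new runs are launched from whichever version scrapyd considers latest
curl "http://localhost:6800/listversions.json?project=myproject"
```

Note that, at least in the scrapyd releases contemporary with this question, "latest" was determined by comparing version strings rather than upload times, so raw commit hashes may not sort chronologically; a timestamp-based version string can be a safer choice when ordering matters.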
0
votes
1 answer

How do I call spiders from different projects with different pipelines from a python script?

I have three different spiders in different scrapy projects called REsale, REbuy and RErent, each with its own pipeline that directs its output to various MySQL tables on my server. They all run OK when called using scrapy crawl. Ultimately,…
Mark
  • 175
  • 1
  • 15
0
votes
1 answer

Scrapyd Post schedule.json from asp.net

I have scrapyd and a spider installed on a Unix machine, and everything works fine when I run curl http://localhost:6800/schedule.json -d project=myproject -d spider=somespider I can see the job status, logs and items on the web interface of scrapyd…
Syed Waqas
  • 862
  • 1
  • 9
  • 28
0
votes
1 answer

How to install the latest Scrapyd package?

I notice that the latest stable version of scrapy was released last week (2013-08-09). After updating scrapy to version 0.18, the previously installed scrapyd-0.17 was uninstalled by apt-get (Ubuntu 12.04) automatically. Is there a scrapyd-0.18? How to…
kev
  • 137,128
  • 36
  • 241
  • 259
0
votes
1 answer

How to install scrapyd on FreeBSD

I am trying to install scrapyd on FreeBSD, but I am getting this error: $ cd /usr/ports/www/py-scrapyd/ && sudo make install clean -bash: cd: /usr/ports/www/py-scrapyd/: No such file or directory I have installed scrapy using this command: $ cd…
Vaibhav Jain
  • 4,313
  • 8
  • 43
  • 103
0
votes
1 answer

Run Scrapy on IIS

I have an IIS server, and on it I have an ASP.NET MVC application. The MVC application will revolve around scraped data. Is there a way I can run Scrapy (a tool built in Python) on IIS? Similar to how we can run PHP and WordPress on IIS.
J86
  • 11,751
  • 29
  • 115
  • 194
0
votes
2 answers

scrapy deploy -L returns nothing

I'm trying to deploy my scrapy project, but I'm stuck. I definitely do have a working project and several spiders: deploy@susychoosy:~/susy_scraper$ scrapy Scrapy 0.17.0 - project: clothes_spider and when I do scrapy list it shows a list of all…
pisarzp
  • 587
  • 2
  • 7
  • 12
0
votes
4 answers

empty scraper output while individual hxs.select works?

mainfile from scrapy.contrib.spiders import CrawlSpider, Rule from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor from scrapy.selector import HtmlXPathSelector from bloggerx.items import BloggerxItem from scrapy.spider import…
Harshit
  • 1,177
  • 18
  • 38
0
votes
2 answers

Scrapy / Python and SQL Server

Is it possible to get data scraped from websites using Scrapy and save that data in a Microsoft SQL Server database? If yes, are there any examples of this being done? Is it mainly a Python issue? i.e. if I find some code of Python saving to…
J86
  • 11,751
  • 29
  • 115
  • 194
0
votes
1 answer

Deploy scrapy project

I am trying to deploy a scrapy project with scrapyd. I can run my project normally by using cd /var/www/api/scrapy/dirbot scrapy crawl dmoz This is what I did, step by step: 1/ I ran scrapy version -v >> Scrapy : 0.16.3 lxml : 3.0.2.0 libxml2 :…
hoangvu68
  • 753
  • 12
  • 28
0
votes
1 answer

scrapyd connects to its own database (mysql.db) instead of 127.0.0.1:3306

I have a scrapy project whose spider is as shown below. The spider works when I run it with this command: scrapy crawl myspider class MySpider(BaseSpider): name = "myspider" def parse(self, response): links =…
Alican
  • 141
  • 1
  • 3