Questions tagged [scrapy-pipeline]

193 questions
17 votes, 1 answer

Scrapy: how to use items in spider and how to send items to pipelines?

I am new to Scrapy and my task is simple: for a given e-commerce website, crawl all website pages looking for product pages; if a URL points to a product page, create an Item and process it to store it in a database. I created the spider but…
farhawa
  • 8,406
  • 16
  • 37
  • 80
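
Not the asker's code, but the usual pattern this question is after, sketched with hypothetical names (ProductItem, StoreProductPipeline): define an Item, yield it from the spider callback, and enable a pipeline in settings so each yielded item flows through process_item.

    import scrapy

    class ProductItem(scrapy.Item):
        name = scrapy.Field()
        price = scrapy.Field()
        url = scrapy.Field()

    class ProductSpider(scrapy.Spider):
        name = 'products'
        start_urls = ['https://example.com']   # placeholder

        def parse(self, response):
            # yielding the item hands it to every pipeline in ITEM_PIPELINES
            item = ProductItem()
            item['name'] = response.css('h1::text').get()
            item['price'] = response.css('.price::text').get()
            item['url'] = response.url
            yield item

    class StoreProductPipeline:
        def process_item(self, item, spider):
            # save to a database here, then return the item so any
            # later pipelines still receive it
            return item

    # settings.py
    # ITEM_PIPELINES = {'myproject.pipelines.StoreProductPipeline': 300}
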
5 votes, 2 answers

Scrapy file download: how to use a custom filename

For my Scrapy project I'm currently using the FilesPipeline. The downloaded files are stored with a SHA1 hash of their URLs as the file names. [(True, {'checksum': '2b00042f7481c7b056c4b410d28f33cf', 'path':…
Michael
  • 1,950
  • 1
  • 29
  • 46
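
The usual answer here is to subclass FilesPipeline and override file_path(), which decides the name used under FILES_STORE. A minimal sketch (keeping the URL's basename instead of the SHA1 hash):

    import os
    from urllib.parse import urlparse
    from scrapy.pipelines.files import FilesPipeline

    class CustomNameFilesPipeline(FilesPipeline):
        # the item keyword argument exists in Scrapy >= 2.4; drop it on older versions
        def file_path(self, request, response=None, info=None, *, item=None):
            # keep the original basename of the URL instead of the SHA1 hash
            return 'full/' + os.path.basename(urlparse(request.url).path)
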
5 votes, 1 answer

Django relations with Scrapy: how are items saved?

I just need to understand how I can detect whether Scrapy saved an item in the spider. I'm fetching items from a site and after that I'm fetching comments on that item. So first I have to save the item, and after that I'll save the comments. But when I'm…
Murat Kaya
  • 1,143
  • 1
  • 26
  • 49
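
One way to sidestep the "has the pipeline saved it yet?" problem is to save the parent in the spider callback and carry its primary key to the comments callback via Request.meta. This is only a sketch under assumed Django models (Article, Comment), not the asker's code, and the ORM calls here are blocking:

    import scrapy
    # Article and Comment are assumed Django models, not from the question
    from myapp.models import Article, Comment

    class ArticleSpider(scrapy.Spider):
        name = 'articles'

        def parse_article(self, response):
            # save the parent first, then request its comments,
            # passing the primary key along in Request.meta
            article = Article.objects.create(
                title=response.css('h1::text').get(),
                url=response.url,
            )
            yield scrapy.Request(
                response.urljoin('comments/'),
                callback=self.parse_comments,
                meta={'article_id': article.pk},
            )

        def parse_comments(self, response):
            article_id = response.meta['article_id']
            for text in response.css('.comment::text').getall():
                Comment.objects.create(article_id=article_id, text=text)
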
5 votes, 1 answer

Scrapy: make an HTTP request in a pipeline

Assume I have a scraped item that looks like this: { name: "Foo", country: "US", url: "http://..." }. In a pipeline I want to make a GET request to the url and check some headers like content_type and status. When the headers do not meet…
Upvote
  • 65,847
  • 122
  • 353
  • 577
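
A simple (blocking) way to sketch this is the requests library inside process_item, raising DropItem when the headers don't match; a non-blocking version would go through Scrapy's own downloader or treq instead. The field names are the ones from the excerpt above:

    import requests
    from scrapy.exceptions import DropItem

    class HeaderCheckPipeline:
        def process_item(self, item, spider):
            # NOTE: requests is blocking and will stall the Twisted reactor;
            # acceptable for low volumes, otherwise use a non-blocking client
            resp = requests.head(item['url'], allow_redirects=True, timeout=10)
            content_type = resp.headers.get('Content-Type', '')
            if resp.status_code != 200 or 'text/html' not in content_type:
                raise DropItem(f"Bad response for {item['url']}")
            return item
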
5 votes, 0 answers

Twisted (Scrapy) and Postgres

I'm using Scrapy (which is built on Twisted) and Postgres as a database. After a while my connections seem to fill up and then my script gets stuck. I checked this with the query SELECT * FROM pg_stat_activity; and read that it's caused because Postgres…
lony
  • 5,002
  • 6
  • 50
  • 71
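
The usual fix for piling-up connections is to share a single twisted.enterprise.adbapi.ConnectionPool per pipeline instead of opening a new psycopg2 connection per item. A sketch with placeholder DSN values:

    from twisted.enterprise import adbapi

    class PostgresPipeline:
        def open_spider(self, spider):
            # one pool for the whole crawl; adbapi runs queries in a thread pool
            self.dbpool = adbapi.ConnectionPool(
                'psycopg2', host='localhost', dbname='scrapy',
                user='scrapy', password='secret',
                cp_min=1, cp_max=5,
            )

        def close_spider(self, spider):
            self.dbpool.close()

        def process_item(self, item, spider):
            # runInteraction commits and returns the connection to the pool
            self.dbpool.runInteraction(self._insert, item)
            return item

        def _insert(self, cursor, item):
            cursor.execute(
                "INSERT INTO items (name) VALUES (%s)", (item.get('name'),)
            )
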
5 votes, 2 answers

Scrapy: handling multiple types of items - multiple, related Django models - and saving them to the database in pipelines

I have the following Django models. I am not sure what the best way is to save these inter-related objects, scraped in the spider, to the Django database using Scrapy pipelines. It seems like the Scrapy pipeline was built to handle only one 'kind' of…
dowjones123
  • 3,233
  • 5
  • 34
  • 72
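
A common approach, sketched here with made-up item classes rather than the asker's models, is a single pipeline that dispatches on the item type so related objects can be saved in order:

    import scrapy

    class AuthorItem(scrapy.Item):
        name = scrapy.Field()

    class BookItem(scrapy.Item):
        title = scrapy.Field()
        author_name = scrapy.Field()

    class RelatedModelsPipeline:
        """Dispatch on the item class so one pipeline can handle several models."""
        def process_item(self, item, spider):
            if isinstance(item, AuthorItem):
                self.save_author(item)
            elif isinstance(item, BookItem):
                self.save_book(item)
            return item

        def save_author(self, item):
            # e.g. Author.objects.get_or_create(name=item['name'])
            pass

        def save_book(self, item):
            # e.g. Book.objects.create(title=item['title'],
            #                          author=Author.objects.get(name=item['author_name']))
            pass
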
5 votes, 1 answer

When saving a scraped item and a file, Scrapy inserts empty lines in the output CSV file

I have a Scrapy (version 1.0.3) spider in which I both extract some data from the web page and download a file, like this (simplified): def extract_data(self, response): title = response.xpath('//html/head/title/text()').extract()[0].strip() …
zdenulo
  • 214
  • 2
  • 11
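
A frequent cause of blank rows in CSV output on Windows is the file being opened in text mode. Without seeing the asker's pipeline this is only a sketch of that generic fix: open the file with newline='' (Python 3) in a CSV-writing pipeline.

    import csv

    class CsvWriterPipeline:
        def open_spider(self, spider):
            # newline='' stops csv from writing an extra blank row on Windows
            self.file = open('items.csv', 'w', newline='', encoding='utf-8')
            self.writer = csv.writer(self.file)
            self.writer.writerow(['title', 'file_path'])   # assumed columns

        def close_spider(self, spider):
            self.file.close()

        def process_item(self, item, spider):
            self.writer.writerow([item.get('title'), item.get('file_path')])
            return item
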
5 votes, 1 answer

Closing database connection from pipeline and middleware in Scrapy

I have a Scrapy project that uses custom middleware and a custom pipeline to check and store entries in a Postgres DB. The middleware looks a bit like this: class ExistingLinkCheckMiddleware(object): def __init__(self): ... open…
Jamie Brown
  • 903
  • 9
  • 12
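
Both components can hook spider shutdown: a pipeline gets close_spider for free, while a middleware can connect to the spider_closed signal from from_crawler. A sketch with placeholder connection details:

    import psycopg2
    from scrapy import signals

    class PostgresPipeline:
        def open_spider(self, spider):
            self.conn = psycopg2.connect(host='localhost', dbname='scrapy')  # placeholder DSN

        def close_spider(self, spider):
            self.conn.close()

    class ExistingLinkCheckMiddleware:
        def __init__(self):
            self.conn = psycopg2.connect(host='localhost', dbname='scrapy')  # placeholder DSN

        @classmethod
        def from_crawler(cls, crawler):
            mw = cls()
            # middlewares have no close_spider hook, so listen for the signal
            crawler.signals.connect(mw.spider_closed, signal=signals.spider_closed)
            return mw

        def spider_closed(self, spider):
            self.conn.close()
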
4 votes, 1 answer

Custom FilesPipeline in Scrapy never downloads files even though logs show all functions being accessed

I have the following custom pipeline for downloading JSON files. It was functioning fine until I needed to add the __init__ function, in which I subclass the FilesPipeline class in order to add a few new properties. The pipeline takes URLs that are to…
CaffeinatedMike
  • 1,456
  • 2
  • 22
  • 43
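
A frequent gotcha when adding __init__ to a FilesPipeline subclass is swallowing the arguments the base class is constructed with (the store URI and settings); forwarding them keeps the download machinery intact. A sketch, not the asker's pipeline:

    from scrapy.pipelines.files import FilesPipeline

    class JsonFilesPipeline(FilesPipeline):
        def __init__(self, *args, **kwargs):
            # forward store_uri / settings etc. to the base class, otherwise
            # the pipeline has no file store and silently downloads nothing
            super().__init__(*args, **kwargs)
            self.seen_urls = set()   # example of an extra property
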
4 votes, 1 answer

Export Scrapy items to different files

I'm scraping reviews from MOOCs like this one. From there I'm getting all the course details (5 items) and another 6 items from each review itself. This is the code I have for the course details: def parse_reviews(self, response): l =…
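
One way to do this, sketched with assumed item kinds ('courses', 'reviews') rather than the asker's fields, is a pipeline that keeps one CsvItemExporter per kind and routes each item to the matching file:

    from scrapy.exporters import CsvItemExporter

    class MultiFileExportPipeline:
        def open_spider(self, spider):
            self.files = {}
            self.exporters = {}
            for name in ('courses', 'reviews'):       # assumed item kinds
                f = open(f'{name}.csv', 'wb')
                exporter = CsvItemExporter(f)
                exporter.start_exporting()
                self.files[name] = f
                self.exporters[name] = exporter

        def close_spider(self, spider):
            for name, exporter in self.exporters.items():
                exporter.finish_exporting()
                self.files[name].close()

        def process_item(self, item, spider):
            # route by a field the spider sets on each item, e.g. item['kind']
            kind = item.get('kind', 'courses')
            self.exporters[kind].export_item(item)
            return item
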
4 votes, 2 answers

Scrapy: store returned items in variables to use in the main script

I am quite new to Scrapy and want to try the following: extract some values from a webpage, store them in a variable and use them in my main script. Therefore I followed the tutorial and changed the code for my purposes: import scrapy from scrapy.crawler…
MaGi
  • 89
  • 1
  • 1
  • 7
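
A common pattern is to collect items via the item_scraped signal while running the spider from a script with CrawlerProcess; once process.start() returns, the list holds the scraped items. A sketch assuming a hypothetical QuotesSpider:

    from scrapy.crawler import CrawlerProcess
    from scrapy import signals

    from myproject.spiders.quotes import QuotesSpider   # assumed spider

    items = []

    def collect(item, response, spider):
        items.append(item)

    process = CrawlerProcess()
    crawler = process.create_crawler(QuotesSpider)
    crawler.signals.connect(collect, signal=signals.item_scraped)
    process.crawl(crawler)
    process.start()          # blocks until the crawl finishes

    print(len(items), "items collected")
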
4 votes, 1 answer

Scrapy pipelines in separate folders/files - abstraction

I am currently finalising a Scrapy project; however, I have quite a lengthy pipelines.py file. I noticed that in my settings.py the pipelines are shown as follows (trimmed down): ITEM_PIPELINES = { 'proj.pipelines.MutatorPipeline': 200, …
Matt The Ninja
  • 2,389
  • 2
  • 22
  • 48
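
ITEM_PIPELINES only needs dotted import paths, so pipelines.py can become a pipelines/ package with one module per pipeline and the settings updated to match. A sketch (module names beyond MutatorPipeline are assumptions):

    # proj/pipelines/__init__.py can stay empty; each pipeline gets its own module:
    #   proj/pipelines/mutator.py   -> class MutatorPipeline
    #   proj/pipelines/validator.py -> class ValidatorPipeline

    # settings.py
    ITEM_PIPELINES = {
        'proj.pipelines.mutator.MutatorPipeline': 200,
        'proj.pipelines.validator.ValidatorPipeline': 300,
    }
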
4 votes, 1 answer

How to download an image using Scrapy?

I am a newbie to Scrapy. I am trying to download an image from here. I was following the official doc and this article. My settings.py looks like: BOT_NAME = 'shopclues' SPIDER_MODULES = ['shopclues.spiders'] NEWSPIDER_MODULE =…
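
The stock setup for image downloads is the built-in ImagesPipeline plus an item with image_urls/images fields. A minimal sketch, not the asker's project:

    # settings.py
    # ITEM_PIPELINES = {'scrapy.pipelines.images.ImagesPipeline': 1}
    # IMAGES_STORE = 'images'            # Pillow must be installed

    import scrapy

    class ProductImageItem(scrapy.Item):
        image_urls = scrapy.Field()      # the pipeline downloads every URL listed here
        images = scrapy.Field()          # and records the download results here

    class ImageSpider(scrapy.Spider):
        name = 'images'
        start_urls = ['https://example.com']   # placeholder

        def parse(self, response):
            yield ProductImageItem(
                image_urls=[response.urljoin(src)
                            for src in response.css('img::attr(src)').getall()]
            )
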
4 votes, 2 answers

Understanding the Scrapy framework architecture

Recently, I've been trying to get to grips with Scrapy. I feel that if I had a better understanding of the architecture, I'd move a lot faster. The current, concrete problem I have is this: I want to store all of the links that Scrapy extracts in a…
user3185563
  • 1,103
  • 2
  • 10
  • 13
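
One way to collect every link Scrapy extracts is a CrawlSpider rule whose callback yields the links as items, which a pipeline or feed export can then store. A sketch with a placeholder start URL:

    from scrapy.spiders import CrawlSpider, Rule
    from scrapy.linkextractors import LinkExtractor

    class LinkSpider(CrawlSpider):
        name = 'links'
        start_urls = ['https://example.com']        # placeholder
        rules = (Rule(LinkExtractor(), callback='parse_page', follow=True),)

        def parse_page(self, response):
            # every followed page yields its outgoing links as plain dict items
            for link in LinkExtractor().extract_links(response):
                yield {'from': response.url, 'to': link.url}
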
3 votes, 1 answer

Pass a file_name argument to the pipeline for CSV export in Scrapy

I need Scrapy to take an argument (-a FILE_NAME="stuff") from the command line and apply it to the file created by my CSVWriterPipeLine in my pipelines.py file. (The reason I went with pipelines.py was that the built-in exporter was repeating data…
Josh Usre
  • 619
  • 1
  • 9
  • 30
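
Spider arguments passed with -a become attributes on the spider, so the pipeline can read them in open_spider. A sketch reusing the pipeline name from the question, with assumed CSV columns:

    # scrapy crawl myspider -a FILE_NAME=stuff.csv
    import csv

    class CSVWriterPipeLine:
        def open_spider(self, spider):
            # -a FILE_NAME=... ends up as spider.FILE_NAME
            path = getattr(spider, 'FILE_NAME', 'output.csv')
            self.file = open(path, 'w', newline='', encoding='utf-8')
            self.writer = csv.writer(self.file)

        def close_spider(self, spider):
            self.file.close()

        def process_item(self, item, spider):
            self.writer.writerow(item.values())
            return item
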