Questions tagged [scrapy-pipeline]

193 questions
2 votes, 1 answer

Right way to scrape this noisy price tag

Given a div containing a price with a lot of noise, Price 1\u00a0500\u00a0000 EUR, where you only want the pure amount (1500000), what is the best way to implement this in Scrapy? I tried to combine regex: il.add_css('price', 'div.price_tag::text',…
szeta
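A minimal sketch, assuming the price is always a whole number with no decimal part: drop every non-digit character (letters, the currency code, regular and non-breaking spaces) and convert what is left to an int. The function name `parse_price` is hypothetical.

```python
import re


def parse_price(text):
    """Extract an integer amount from a noisy price string.

    Assumes whole-number prices: every non-digit character (letters,
    currency codes, regular and non-breaking spaces) is discarded.
    Returns None when the string contains no digits at all.
    """
    digits = re.sub(r"\D", "", text)
    return int(digits) if digits else None


print(parse_price("Price 1\u00a0500\u00a0000 EUR"))  # 1500000
```

In an ItemLoader the function could be passed as an input processor, e.g. `il.add_css('price', 'div.price_tag::text', MapCompose(parse_price))`, with `MapCompose` imported from `itemloaders.processors` (older Scrapy versions expose it as `scrapy.loader.processors`). Prices with a decimal part such as `1 500,50` would need a different rule.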
2 votes, 3 answers

Scrapy: How to access the custom, CLI passed settings from the __init__() method of a spider class?

I need to access the custom settings passed from the CLI with -s SETTING_NAME="SETTING_VAL" from the __init__() method of the spider class. get_project_settings() only gives me access to the static settings. The docs explain how you can access…
Nikolay Shindarov
2 votes, 1 answer

Scrapinghub puts my results in the log and not in items

I have a functioning spider project that extracts URL contents (no CSS). I crawled several sets of data and stored them in a series of .csv files. Now I am trying to set it up on Scrapinghub in order to run a long scraping job. So far, I am able to…
2 votes, 0 answers

scrapy.pipeline ImagesPipeline file_path: returning a str from response.meta fails

The file_path function in ImagesPipeline takes a response parameter. When I return a str directly, the image downloads successfully. If I use response.meta.get('file_name'), the download fails, even though that is also a string. Why can't the variable be…
S.DZ
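A hedged explanation: Scrapy's media pipelines likely call `file_path` once before the download, with `response=None`, so `response.meta` fails there, while `request.meta` is available in every call. The sketch keeps the naming logic in a plain helper; `build_file_path`, the `file_name` meta key and the `full/` folder are assumptions for illustration.

```python
import hashlib
import os
import re
from urllib.parse import urlparse


def build_file_path(url, file_name=None, folder="full"):
    """Return a relative storage path for a downloaded file.

    Uses file_name when given (e.g. a value read from request.meta);
    otherwise falls back to a SHA1 hash of the URL. The extension is
    taken from the URL path unless file_name already has one.
    """
    url_ext = os.path.splitext(urlparse(url).path)[1]
    if file_name:
        # Replace runs of characters that are unsafe in file names with "_".
        stem = re.sub(r"[^A-Za-z0-9._-]+", "_", file_name).strip("_") or "file"
        ext = "" if os.path.splitext(stem)[1] else url_ext
    else:
        stem = hashlib.sha1(url.encode("utf-8")).hexdigest()
        ext = url_ext
    return f"{folder}/{stem}{ext}"
```

Inside an `ImagesPipeline` or `FilesPipeline` subclass, `file_path(self, request, response=None, info=None, *, item=None)` could then return `build_file_path(request.url, request.meta.get('file_name'))`, with the name attached to each request in `get_media_requests`. The keyword-only `item` argument exists in Scrapy 2.4 and later. Because `ImagesPipeline` re-encodes images as JPEG, a fixed `.jpg` extension may suit it better than the URL's extension.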
2 votes, 3 answers

.json export formatting in Scrapy

Just a quick question about JSON export formatting in Scrapy. My exported file looks like this: {"pages": {"title": "x", "text": "x", "tags": "x", "url": "x"}} {"pages": {"title": "x", "text": "x", "tags": "x", "url": "x"}} {"pages": {"title": "x",…
2 votes, 1 answer

Scrapy error: 'Pipeline' object has no attribute 'exporter'

I made a scraper and am using this tutorial to export using a pipeline. When I run scrapy crawl [myspider] I see the objects flashing by in my terminal, but after each it gives the error 'PostPipeline' object has no attribute 'exporter'. My…
Teresa
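One likely cause of this error is that the attribute is only created in a method Scrapy never calls, for example a `spider_opened` handler that was never connected to the signal. Scrapy does call `open_spider`, `process_item` and `close_spider` on every enabled pipeline that defines them, so creating the output in `open_spider` avoids the missing attribute. This sketch swaps Scrapy's exporter classes for the standard-library `json` module so it runs on its own; the output file name `posts.jl` is hypothetical.

```python
import json


class PostPipeline:
    """Write each item as one JSON line (hypothetical file: posts.jl).

    The file handle is created in open_spider, which Scrapy calls on every
    enabled pipeline, so process_item never sees a missing attribute.
    """

    def __init__(self, path="posts.jl"):
        self.path = path
        self.file = None

    def open_spider(self, spider):
        self.file = open(self.path, "w", encoding="utf-8")

    def process_item(self, item, spider):
        # dict(item) works for plain dicts and scrapy.Item instances alike.
        self.file.write(json.dumps(dict(item), ensure_ascii=False) + "\n")
        return item

    def close_spider(self, spider):
        self.file.close()
```

The same structure applies when `self.exporter` is a Scrapy exporter: create it (and call `start_exporting()`) in `open_spider`, and call `finish_exporting()` before closing the file in `close_spider`.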
2 votes, 1 answer

Return image contents by Scrapy-Splash

I'm using Scrapy-Splash requests to get a rendered screenshot of a page, but I also need the images on that page. I use the pipelines to download those images, but I was wondering: does this not make two requests for the same image? Once when Splash…
2 votes, 3 answers

Scrapy Pipeline doesn't insert into MySQL

I'm trying to build a small app for a university project with Scrapy. The spider is scraping the items, but my pipeline is not inserting data into the MySQL database. In order to test whether the pipeline is not working or the pymysql implementation is…
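With DB-API drivers such as pymysql, which does not autocommit by default, a frequent reason for rows not appearing is that the pipeline never calls `commit()` on the connection. The sketch uses the standard-library `sqlite3` module as a stand-in for pymysql so it is self-contained; the class, table and column names are hypothetical, and pymysql uses `%s` placeholders rather than `?`.

```python
import sqlite3


class SQLPipeline:
    """Insert each item into a table and commit immediately.

    sqlite3 stands in for pymysql: both follow the Python DB-API, where
    inserts are discarded unless connection.commit() is called.
    """

    def __init__(self, database=":memory:"):
        self.database = database
        self.conn = None

    def open_spider(self, spider):
        self.conn = sqlite3.connect(self.database)
        cursor = self.conn.cursor()
        # Hypothetical table layout for illustration.
        cursor.execute("CREATE TABLE IF NOT EXISTS items (title TEXT, url TEXT)")
        self.conn.commit()
        cursor.close()

    def process_item(self, item, spider):
        cursor = self.conn.cursor()
        # Parameterized query: sqlite3 uses ?, pymysql uses %s placeholders.
        cursor.execute(
            "INSERT INTO items (title, url) VALUES (?, ?)",
            (item.get("title"), item.get("url")),
        )
        self.conn.commit()  # without this the insert is lost when the connection closes
        cursor.close()
        return item

    def close_spider(self, spider):
        self.conn.close()
```

It is also worth checking that the pipeline class is listed in `ITEM_PIPELINES` in `settings.py`; a pipeline that is not listed is never called.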
2 votes, 1 answer

Scrapy not calling the assigned pipeline when run from a script

I have a piece of code to test scrapy. My goal is to use scrapy without having to call the scrapy command from the terminal, so I can embed this code somewhere else. The code is the following: from scrapy import Spider from scrapy.selector import…
Santi Peñate-Vera
2 votes, 1 answer

Check if id exists in MongoDB with pymongo and scrapy

I have set up a spider with scrapy that sends data to a MongoDB database. I want to check to see if the id exists so that if it does I can $addToSet on a specific key (otherwise Mongo will reject the insert because the _id already exists). This is…
Eitan
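Rather than checking whether the `_id` exists first, a single pymongo `update_one(filter, update, upsert=True)` call can insert the document when it is missing and `$addToSet` into it when it exists. The sketch only builds the filter and update documents, so it runs without a database; the field name `tags` is a hypothetical stand-in for the "specific key" in the question.

```python
def build_upsert(item, set_field="tags"):
    """Return (filter, update) arguments for update_one(..., upsert=True).

    - $addToSet with $each adds every value of item[set_field] that is not
      already in the stored array (the array is created on insert).
    - $setOnInsert writes the remaining fields only when a new document
      is created, so existing documents keep their other fields.
    """
    values = item.get(set_field) or []
    if not isinstance(values, (list, tuple)):
        values = [values]
    update = {"$addToSet": {set_field: {"$each": list(values)}}}
    other_fields = {k: v for k, v in item.items() if k not in ("_id", set_field)}
    if other_fields:
        update["$setOnInsert"] = other_fields
    return {"_id": item["_id"]}, update
```

In a pipeline this might be used as `filter_, update = build_upsert(dict(item))` followed by `self.collection.update_one(filter_, update, upsert=True)`. Using `$set` instead of `$setOnInsert` would overwrite the other fields on every crawl, which may or may not be wanted.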
1 vote, 0 answers

Save downloaded files with custom names in scrapy

I am new to Scrapy. I downloaded some files using the code below. I want to change the names of my downloaded files, but I don't know how. For example, I want to have a list containing names and use it to rename the files that I downloaded. Any help…
Pito
1 vote, 1 answer

How to run multiple spiders through individual pipelines?

Total noob just getting started with Scrapy. My directory structure looks like this: #FYI: running on Scrapy 2.4.1 WebScraper/ Webscraper/ spiders/ spider.py # (NOTE: contains spider1 and spider2 classes.) items.py …
yeqiuuu
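Scrapy runs every pipeline in `ITEM_PIPELINES` for every spider, so one option is a pipeline that only handles items from the spiders it is meant for and passes everything else through unchanged. The spider name `spider1` follows the question; the class name and the `handled_by` field are hypothetical.

```python
class Spider1Pipeline:
    """Process items only when they come from the spider named 'spider1'.

    Items from any other spider are returned untouched so that later
    pipelines still receive them.
    """

    target_spiders = {"spider1"}

    def process_item(self, item, spider):
        if getattr(spider, "name", None) not in self.target_spiders:
            return item
        # Spider-specific processing goes here; this marker is illustrative.
        item["handled_by"] = type(self).__name__
        return item
```

Alternatively, each spider can enable only its own pipelines through the `custom_settings` class attribute, e.g. `custom_settings = {'ITEM_PIPELINES': {'Webscraper.pipelines.Spider1Pipeline': 300}}`; the module path there is a guess based on the directory layout above.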
1 vote, 0 answers

Exception raised in file_path function of a Scrapy pipeline is not shown

So, I put a simple exception in an image pipeline like this: class MyImagesPipeline(ImagesPipeline): #Name download version def file_path(self, request, response=None, info=None): raise Exception() print("It get's into…
Aminah Nuraini
1 vote, 0 answers

CsvItemExporter for multiple files in custom item pipeline not exporting all items

I have created an item pipeline as an answer to this question. It is supposed to create a new file for every page according to the page_no value set in the item. This works mostly fine. The problem is with the last csv file generated by the…
Patrick Klein
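When one pipeline writes several files, one likely way to lose rows in the last file is that it is never flushed and closed; Scrapy calls `close_spider` once at the end of the crawl, which is a natural place to close every open file (with `CsvItemExporter`, call `finish_exporting()` on each exporter first). The sketch uses the standard-library `csv` module instead of `CsvItemExporter` so it runs on its own; `page_no` comes from the question, while the class name and file-name pattern are hypothetical.

```python
import csv


class PerPageCsvPipeline:
    """Write items to one CSV file per page_no value.

    Files stay open while the crawl runs (items can arrive out of order),
    and all of them are closed in close_spider so buffered rows are flushed.
    """

    def __init__(self, pattern="page_{}.csv"):
        self.pattern = pattern
        self.files = {}    # page_no -> open file object
        self.writers = {}  # page_no -> csv.DictWriter

    def process_item(self, item, spider):
        row = dict(item)
        page = row["page_no"]
        if page not in self.writers:
            f = open(self.pattern.format(page), "w", newline="", encoding="utf-8")
            # Columns come from the first item of each page; extra keys are ignored.
            writer = csv.DictWriter(f, fieldnames=list(row), extrasaction="ignore")
            writer.writeheader()
            self.files[page] = f
            self.writers[page] = writer
        self.writers[page].writerow(row)
        return item

    def close_spider(self, spider):
        for f in self.files.values():
            f.close()
        self.files.clear()
        self.writers.clear()
```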
1 vote, 2 answers

Organizing scraped data based on the URL the data came from

I am creating a Scrapy program to scrape profile pages for numerical data. Each profile has a section that lists the different tags that the user uses. Each of those tags link to a paginated set of pages that shows all the posts made under that tag,…
harada