I'm looking for a simple tutorial explaining how to write items to RethinkDB from Scrapy. The equivalent for MongoDB can be found here.
1 Answer
Here is a line-for-line translation of the "Write items to MongoDB" example to RethinkDB.
A couple of notes:
- I'm not sure where `crawler.settings` are set.
- The Scrapy docs say `process_item`'s second param `item` can be an object or `dict`, so the `.insert(dict(item))` cast/conversion is probably necessary.
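To illustrate the second note on its own: below is a small sketch, not part of the pipeline, of what the `dict(item)` conversion does. It assumes Scrapy items behave like mappings; `StandInItem` and `to_document` are hypothetical names made up for this example so that it runs without Scrapy installed.

```python
from collections.abc import MutableMapping


class StandInItem(MutableMapping):
    """Hypothetical stand-in for a Scrapy item: dict-like, but not a dict."""

    def __init__(self, **fields):
        self._values = dict(fields)

    def __getitem__(self, key):
        return self._values[key]

    def __setitem__(self, key, value):
        self._values[key] = value

    def __delitem__(self, key):
        del self._values[key]

    def __iter__(self):
        return iter(self._values)

    def __len__(self):
        return len(self._values)


def to_document(item):
    """Return a plain dict, the form passed to .insert() in the pipeline."""
    return dict(item)


# Both a mapping-like item and a plain dict end up as the same plain dict.
doc_from_item = to_document(StandInItem(title="Example", price=10))
doc_from_dict = to_document({"title": "Example", "price": 10})
```

Calling `dict()` on something that is already a dict just makes a shallow copy, so the conversion should be harmless in either case.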
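# Classic driver import; driver releases from 2.4 on use
# `from rethinkdb import RethinkDB; r = RethinkDB()` instead.
import rethinkdb as r


class RethinkDBPipeline(object):
    table_name = 'scrapy_items'

    def __init__(self, rethinkdb_uri, rethinkdb_port, rethinkdb_db):
        self.rethinkdb_uri = rethinkdb_uri
        self.rethinkdb_port = rethinkdb_port
        self.rethinkdb_db = rethinkdb_db

    @classmethod
    def from_crawler(cls, crawler):
        # Read connection details from the project's Scrapy settings.
        return cls(
            rethinkdb_uri=crawler.settings.get('RETHINKDB_URI', 'localhost'),
            # 28015 is RethinkDB's default client driver port.
            rethinkdb_port=crawler.settings.getint('RETHINKDB_PORT', 28015),
            rethinkdb_db=crawler.settings.get('RETHINKDB_DATABASE', 'items')
        )

    def open_spider(self, spider):
        # Open one connection per spider run.
        self.conn = r.connect(
            host=self.rethinkdb_uri,
            port=self.rethinkdb_port,
            db=self.rethinkdb_db)

    def close_spider(self, spider):
        self.conn.close()

    def process_item(self, item, spider):
        # Convert the item to a plain dict before inserting it.
        r.table(self.table_name).insert(dict(item)).run(self.conn)
        return item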
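For completeness, here is a sketch of what the matching entries in the project's `settings.py` might look like. The module path `myproject.pipelines` and the priority `300` are placeholders I made up, so adjust them to your project. `RETHINKDB_PORT` is the extra setting mentioned in the comments; it only takes effect if `from_crawler` reads it and passes it to the constructor. The database and the `scrapy_items` table are assumed to exist already.

```python
# settings.py (sketch; "myproject.pipelines" is a hypothetical module path)

# Enable the pipeline; the number only orders it relative to other pipelines.
ITEM_PIPELINES = {
    "myproject.pipelines.RethinkDBPipeline": 300,
}

# Connection details read via crawler.settings in from_crawler().
RETHINKDB_URI = "localhost"    # passed to r.connect() as the host
RETHINKDB_PORT = 28015         # RethinkDB's default client driver port
RETHINKDB_DATABASE = "items"   # must already exist on the server
```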
dalanmiller
- Thank you for your code. Unfortunately I was unable to implement it; I think I have to deepen my understanding of RethinkDB first... Crawler settings are set in settings.py – crocefisso Apr 22 '16 at 23:27
- @crocefisso, let me know if this works eventually. I'd love to post something showing how to set up RethinkDB with Scrapy based on this! – dalanmiller Apr 25 '16 at 17:30
- @dalanmiller, I'm currently studying data science and I'm learning Python, Scrapy and RethinkDB from scratch, since I have no background in computing (except as a hobby). For a class I started a project using RethinkDB and getting my data from Scrapy. As time was limited, I could not implement a Scrapy pipeline storing items to RethinkDB as I wanted, so I just ended up using the rethinkdb import function. For my next project (in a few weeks) I'm planning to try again and study the question more thoroughly. As soon as I'm into it again I'll share my findings with you. – crocefisso Apr 25 '16 at 20:05
- Great to hear @crocefisso! Check out http://slack.rethinkdb.com to join our Slack channel if you want some more immediate help. – dalanmiller Apr 26 '16 at 15:07
- In addition to RETHINKDB_URI and RETHINKDB_DATABASE, I added a RETHINKDB_PORT setting to my RethinkDBPipeline and it worked great. Also, a side note: if you are using the conda package manager, rethinkdb isn't available. I just copied rethinkdb out of the site-packages directory of a default Python (3.5) distribution into the Miniconda site-packages directory and it had no issues. – zulumojo Jul 06 '16 at 17:56