What's the best practice for using Kedro with MongoDB or other document databases? MongoDB, for example, doesn't have a query language analogous to SQL. Most Mongo "queries" in Python (using PyMongo) will look something like this:
from pymongo import MongoClient
client = MongoClient(...)  # Credentials go here
posts = client.test_database.posts
posts.find_one({"author": "Mike"})
And then you'll get something back like this:
{u'_id': ObjectId('...'),
u'author': u'Mike',
u'date': datetime.datetime(...),
u'tags': [u'mongodb', u'python', u'pymongo'],
u'text': u'My first blog post!'}
Now my question is: where should the logic go to find this post and then parse it into a dataframe? It doesn't seem appropriate to try to create a MongoQueryDataSet class, because you'll end up having to wrap the entire PyMongo API with clunky YAML arguments if you want to support things like inserts, aggregations, etc.
Should a MongoDataSet class just return a MongoClient object, with any further logic captured in a Kedro node?
In general, where should data loading logic live when you're working with databases that have these functional (non-SQL) APIs without simple query strings?