
Here is my sample mongodb database

database image for one object

The above is a database with an array of articles. I fetched only one object for simplicity purposes.

database image for multiple objects ( max 20 as it's the size limit )

I have about 18k such entries. I need to extract the title and description fields from the first element (index 0) of the articles array. My question is about the find() method. I have tried this:

for i in db.ncollec.find({'status':"ok"}, { 'articles.0.title' : 1 , 'articles.0.description' : 1}):
    for j in i:
        save.write(j)

After executing the code, the file save has this :

_id
articles
_id
articles

and it goes on and on..

Any help on how to print the title and description I described above?

My entire code for reference :

    import json
    import newsapi
    from newsapi import NewsApiClient
    import pymongo
    from pymongo import MongoClient

    client = MongoClient()
    db = client.dbasenews
    ncollec = db.ncollec


    newsapi = NewsApiClient(api_key='**********')
    source = open('TextsExtractedTemp.txt', 'r')
    destination = open('NewsExtracteddict.txt', "w")
    for word in source:
        if word == '\n':
            continue
        all_articles = newsapi.get_everything(q=word, language='en', page_size=1)
        print(all_articles)
        json.dump(all_articles, destination)
        destination.write("\n")
        try:
            ncollec.insert(all_articles)
        except:
            pass
  • Can you do : `for i in db.ncollect.find({'status' : 'ok'}): print i` ? and show me here the 2/3 first entries. – IMCoins Mar 01 '18 at 14:21
  • Yeah sure! I'll do it now. – IncyWincyRz Mar 01 '18 at 14:24
  • There's no output. It's blank .. Isn't it supposed to print all the records which have their status set to 'ok'? – IncyWincyRz Mar 01 '18 at 14:30
  • yeps. Hmm. I'm confused as I can't test your database. If you had a small example as to how to set this db, it would be easy for me to debug it. Try the following code in your first `for` loop ---> `for j in i['articles']: save.write(j[0]['title']) save.write(j[0]['description'])` – IMCoins Mar 01 '18 at 14:34
  • I set up the database using google api. I updated the answer to include my entire code. – IncyWincyRz Mar 01 '18 at 15:33
  • I executed the for loop you provided inside my first for loop. Output : Traceback (most recent call last): File "blablabla", line 12, in save.write(j[0]['title']) KeyError: 0 – IncyWincyRz Mar 01 '18 at 15:46
  • Your query might be poorly formulated, try : `{'status':"ok"}, { 'articles.0.title' : '$exists' , 'articles.0.description' : '$exists'}`. I shouldn't produce anything anyways since `{ 'status' : 'ok' }` didn't return anything. – IMCoins Mar 01 '18 at 15:48

1 Answer

Okay, so I checked a little to update my rusty memory of pymongo, and here is what I found.

The correct query should be :

db.ncollec.find({ 'status' : "ok",
                  'articles.title' : { '$exists' : True },
                  'articles.description' : { '$exists' : True } })

Now, if you do this :

query = { 'status' : "ok",
          'articles.title' : { '$exists' : True },
          'articles.description' : { '$exists' : True } }
for item in db.ncollec.find(query):
    print(item)

If that prints nothing, the query itself is fine, but you are not pointing at the right database or collection, or the documents do not have the structure you showed.

But with the database you showed me, if you do...

query = { 'status' : "ok",
          'articles.title' : { '$exists' : True },
          'articles.description' : { '$exists' : True } }
for item in db.ncollec.find(query):
    # each document holds an 'articles' array; take its first element
    save.write(item['articles'][0]['title'])
    save.write(item['articles'][0]['description'])

...it should do what you wanted in the first place.

Now, the index [0] might not be the right one, but I can't help further there, since I only have what you are showing in the screenshots. :)
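
For reference, here is a minimal sketch of pulling the title and description out of the first element of each document's articles array. I can't run it against your database, so it works on any iterable of dicts shaped like your screenshot. The function name `write_first_article` and the sample documents are made up for illustration.

```python
import io


def write_first_article(docs, out):
    """Write the title and description of articles[0] for each document.

    `docs` is any iterable of dicts (for example a pymongo cursor).
    Returns the number of documents that were written.
    """
    written = 0
    for doc in docs:
        articles = doc.get('articles') or []
        if not articles:
            continue  # no articles at all in this document
        first = articles[0]
        title = first.get('title')
        description = first.get('description')
        if title is None or description is None:
            continue  # skip entries with a missing or null field
        out.write(title + '\n')
        out.write(description + '\n')
        written += 1
    return written


# Hypothetical sample data, standing in for the real collection.
sample_docs = [
    {'_id': 1, 'status': 'ok',
     'articles': [{'title': 'T1', 'description': 'D1'}]},
    {'_id': 2, 'status': 'ok', 'articles': []},
]
buffer = io.StringIO()
count = write_first_article(sample_docs, buffer)
```

Assuming the collection really is `db.ncollec`, the same helper should accept the cursor directly, e.g. `write_first_article(db.ncollec.find(query), save)`.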


Okay, now. I have found something a bit more complicated, but cool. :) I'm not sure it will work for you, though. I suspect the tree you're showing us is not the real one, since .find( {'status' : 'ok'} ) returns nothing when it should return every document with 'status' : 'ok', and you have lots of them...

Anyway, here is the query. Use it with the .aggregate() method instead of .find():

elem = { '$match' : { 'status' : 'ok', 'articles.title' : { '$exists' : True }, 'articles.description' : { '$exists' : True } } }
pipeline = [ elem, { '$unwind' : '$articles' }, elem ]

If you want an explanation as to how this works, I invite you to read this page.

This query will return ONLY the elements in your array that have a title, and a description, with a status OK. If an element doesn't have a title, or a description, it will be ignored.
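
To show how the pipeline might be consumed, here is a rough sketch. It is untested against a real server, and the helper names `build_pipeline` and `iter_articles` are my own. It only assumes the object passed in has an `.aggregate()` method, as a pymongo collection does. After `$unwind`, each result carries a single article under the `articles` key rather than an array.

```python
def build_pipeline():
    """Return the match / unwind / match pipeline described above."""
    match = {'$match': {'status': 'ok',
                        'articles.title': {'$exists': True},
                        'articles.description': {'$exists': True}}}
    # First match narrows the documents, $unwind emits one document per
    # array element, second match filters the individual elements.
    return [match, {'$unwind': '$articles'}, match]


def iter_articles(collection):
    """Yield (title, description) for every article the pipeline returns."""
    for doc in collection.aggregate(build_pipeline()):
        article = doc['articles']  # a single dict after $unwind
        yield article['title'], article['description']
```

With your setup this would be used as `for title, description in iter_articles(db.ncollec): ...`, assuming `db.ncollec` is the collection you inserted into.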

IMCoins
  • It says syntax error in for loop.. Like are you sure the '{' and '}' stuff are proper? I'm using IDLE btw. And the True part needs to be enclosed in '' right? Like 'True' .. ? – IncyWincyRz Mar 01 '18 at 17:15
  • @IncyWincyRz I corrected the syntax, I obviously couldn't check my code since I can't reproduce the database, sorry. I'm writing all this code without the possibility to run it. – IMCoins Mar 01 '18 at 19:20
  • I just had a revelation kind of a thing. In the database, articles is an array of objects. So we're trying to access the title and description of object 0 of an article. While fetching the database, i set the page_size to 1 and thus i fetched only one object. With this in mind, is there any modification that can be done? The code returns an empty output. But the syntax is correct and there are no errors.! – IncyWincyRz Mar 02 '18 at 03:43
  • I edited the answer for a screenshot of the database that contains multiple objects also. @IMCoins – IncyWincyRz Mar 02 '18 at 03:49
  • @IncyWincyRz See second part of my answer. It is a complicated query I have found and adapted to your DataBase. I invite you to upvote the other thread as well as this one. :) – IMCoins Mar 02 '18 at 08:29