0

I would like to do two fields count data in rethinkdb python. Example: My tables are.. Channel table

{
"channel_name":  "channel01" ,
"id":  "58115c5b-af1f-4b1a-b572-20611ba34ee5" ,
"userid":  "b9936a2f-6cea-41ef-a7f8-1812c3192112" ,
"visibility":  "Public"
}

{
"channel_name":  "channel02" ,
"id":  "1852fac4-3056-46d9-9dfa-2cd969872daa" ,
"userid":  "b9936a2f-6cea-41ef-a7f8-1812c3192112" ,
"visibility":  "Public"
}

{
"channel_name":  "channel03" ,
"id":  "4304fa69-173b-4284-9bf8-d9078bb018f8" ,
"userid":  "ef5e924e-9cab-44a1-a381-16b35d0d7578" ,
"visibility":  "Public"
}

Events table

{
"content":  "Welcome to channel01" ,
"id":  "cc9bb13e-07e1-4ec9-b1ec-4546b8f8feda" ,
"parent":  "channel01" ,
"type":  "message" ,
"user": {
      "user_id":  "b9936a2f-6cea-41ef-a7f8-1812c3192112" ,
               "user_name":  "wymh" 
    } 
}

{
"content":  "hello" ,
"id":  "288847a9-ffb5-4531-bdc3-ddb3effbd993" ,
"parent":  "channel02" ,
"type":  "message" ,
"user": {
      "user_id":  "b9936a2f-6cea-41ef-a7f8-1812c3192112" ,
      "user_name":  "wymh" 
    } 
} 

{
"content":  "hi" ,
"id":  "3232f906-d9f1-43af-a594-5da82376f296" ,
"parent":  "channel02" ,
"type":  "message" ,
"user": {
      "user_id":  "ef5e924e-9cab-44a1-a381-16b35d0d7578" ,
      "user_name":  "waiyan" 
    } 
} 

{
"content":  "Nice to meet you" ,
"id":  "77b7afb3-af98-47a0-84bf-9ac8bd1fe14c" ,
"parent":  "channel02" ,
"type":  "message" ,
"user": {
      "user_id":  "b9936a2f-6cea-41ef-a7f8-1812c3192112" ,
      "user_name":  "wymh" 
    } 
} 

{
"content":  "hi" ,
"id":  "31068384-4734-4733-a7f7-677edd21e557" ,
"parent":  "channel03" ,
"type":  "message" ,
"user": {
      "user_id":  "ef5e924e-9cab-44a1-a381-16b35d0d7578" ,
      "user_name":  "waiyan" 
    } 
}

{
"content":  "hi" ,
"id":  "0b535878-3cac-4619-9c48-f0985a7ee587" ,
"parent":  "channel03" ,
"type":  "message" ,
"user": {
      "user_id":  "7e485c95-f0b6-45f7-8af2-d93bde0ea8a3" ,
      "user_name":  "user01" 
    } 
}

My Query is:

channel_count =(yield r.table("channel").filter(r.row['visibility'] != 'private').inner_join(
              r.table("events"),
              lambda channel, events:
              (channel["channel_name"] == events["parent"])
          ).zip().group('channel_name').count().run())

My query output data is:

  {u'channel01': 1, u'channel02': 3, u'channel03': 2}

I want to below result:

{'channel01': 1, 'userid': 1}
{'channel02': 3, 'userid': 2}
{'channel03': 2, 'userid': 2}

Please help me how to write the python Rethinkdb query?

1 Answers1

1

If you are looking for the number of unique user ids per channel with the number of events, I would do the following (I spaced it to make it more readable IMO)...

>>> events_per_channel = (
        r
        .table("channel")
        .filter(r.row['visibility'] != 'private')
        .inner_join(
            r.table("events"),
            lambda channel,
            events: (channel["channel_name"] == events["parent"])
        )
        .zip()
        .map(
            {
                'channel_name': r.row['channel_name'],
                'user_id': r.row['user']['user_id']
            }
        )
        .group('channel_name')
        .count()
        .ungroup()
        .map(
            lambda x:
            {
                'channel_name': x['group'],
                'events': x['reduction'],
                'uniq_user_ids': (
                    r
                    .table('events')
                    .filter(
                        {
                            'parent': x['group']
                        }
                    )
                    .coerce_to('array')
                    .map(lambda y: y['user']['user_id'])
                    .distinct()
                    .count()
               )
            }
    )
    .run()
)
>>> print json.dumps(events_per_channel, indent=4)
[
    {
        "channel_name": "channel01", 
        "events": 1, 
        "uniq_user_ids": 1
    }, 
    {
        "channel_name": "channel02", 
        "events": 3, 
        "uniq_user_ids": 2
    }, 
    {
        "channel_name": "channel03", 
        "events": 2, 
        "uniq_user_ids": 2
    }
]

Essentially what I did, was use a nested query for the number of unique users per channel as well as use ungroup, so I can get the original data from group using the reduction keyword. The output is not exactly how you asked for it, since what you asked for was to use the name of the channel as the key and the value as the number of events. What I did was created a key name for each value. Let me know if this is not what you really meant.

linuxdynasty
  • 366
  • 2
  • 5