Importing JSON file for Python analysis

Question

I am trying to import a JSON file into a Python editor so that I can perform analysis on the data. I am quite new to Python, so I am not sure how to achieve this. My JSON file is full of tweet data; an example is shown here:

{"id":441999105775382528,"score":0.0,"text":"blablabla","user_id":1441694053,"created":"Fri Mar 07 18:09:33 GMT 2014","retweet_id":0,"source":"<a href=\"http://twitterfeed.com\" rel=\"nofollow\">twitterfeed</a>","geo_long":null,"geo_lat":null,"location":"","screen_name":"SevenPS4","name":"Playstation News","lang":"en","timezone":"Amsterdam","user_created":"2013-05-19","followers":463,"hashtags":"","mentions":"","following":1062,"urls":"http://bit.ly/1lcbBW6","media_urls":"","favourites_count":4514,"reply_status_id":0,"reply_user_id":0,"is_truncated":false,"is_retweet":false,"original_text":null,"status_count":4514,"description":"Tweeting the latest Playstation news!","url":null,"utc_offset":3600}

My questions:

How do I import the JSON file so that I can perform analysis on it in a Python editor?

How do I perform analysis on only a set number of the records (i.e. 100 or 200 of them instead of all of them)?

Is there a way to get rid of some of the fields, such as score, user_id, created, etc., without having to go through all of my data manually?

Some of the tweets have invalid/unusable symbols in them. Is there any way to get rid of those without having to go through the data manually?

How did you get this JSON file full of tweets? If you used a Python Twitter client like `tweepy`, you could have limited the fields and the number of tweets you fetch from Twitter. – alecxe, Mar 31 '14 at 16:35
Consider asking (and searching for) separate questions. For example, you might remove the 'invalid/unusable symbols' in your tweets as described here: http://stackoverflow.com/a/15321222/1586229 – bgschiller, Mar 31 '14 at 16:35
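
As a rough illustration of the symbol clean-up mentioned in the comment above, here is a minimal sketch using only the standard library. It is not necessarily the approach of the linked answer, and it assumes that "invalid/unusable symbols" means non-ASCII characters and control characters; `clean_text` is a hypothetical helper name.

```python
import unicodedata

def clean_text(text):
    """Drop non-ASCII symbols and control characters from a tweet's text."""
    # Decompose accented letters into a base letter plus a combining mark
    normalized = unicodedata.normalize("NFKD", text)
    # Encoding to ASCII with "ignore" discards everything that has no ASCII form
    ascii_only = normalized.encode("ascii", "ignore").decode("ascii")
    # Remove control characters (category "Cc") but keep tabs and newlines
    return "".join(
        ch for ch in ascii_only
        if ch in "\t\n" or unicodedata.category(ch) != "Cc"
    )
```

This is lossy by design: tweets in non-Latin scripts would be emptied, so it only suits data where ASCII text is all you need.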

score 1 · Answer 1 · answered Mar 31 '14 at 17:20

I'd use Pandas for this job, as you will not only load the JSON but also perform some data analysis tasks on it. Depending on the size of your JSON file, this should do it:

import json
import pandas as pd

# read the JSON file (replace "yourfilename" with your file location);
# this assumes one tweet (JSON object) per line
with open("yourfilename") as f:
    tweets = [json.loads(line) for line in f if line.strip()]
# you might select the relevant keys before constructing the data frame
# (use field names that exist in your data)
df = pd.DataFrame([{k: v for k, v in t.items() if k in ["id", "retweet_count"]} for t in tweets])
# select a subset (the first five rows)
df.iloc[:5]
# do some analysis
df.retweet_count.sum()  # e.g. 200

dorvak

So with the second line of that code, can I just enter the fields such as score, user_id, etc. that I want included, and it will only use those? – user1745447 Mar 31 '14 at 17:39

Yes, exactly (as long as the fields aren't nested). – dorvak Mar 31 '14 at 19:09
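
To illustrate the caveat about nested fields: if a record did contain nested objects, one option would be to flatten it into dotted keys before selecting fields. The sketch below is only an illustration; `flatten` is a hypothetical helper and the nested record is made up, since the sample tweet in the question is already flat.

```python
def flatten(record, parent_key="", sep="."):
    """Return a copy of `record` with nested dicts collapsed into dotted keys."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the key prefix along
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat

# Hypothetical nested record, not taken from the question's data
nested = {"id": 1, "user": {"screen_name": "SevenPS4", "stats": {"followers": 463}}}
flat = flatten(nested)
```

After flattening, a key such as `user.stats.followers` can be selected like any top-level field.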
Is there a way to select a subset from the JSON data without using a DataFrame? – user1745447 Apr 02 '14 at 18:41
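
One possible way to do this with only the standard library is sketched below. It assumes the file holds one tweet (JSON object) per line, which the question does not confirm; `load_tweets` and its `limit` and `fields` parameters are hypothetical names chosen for the example.

```python
import json
from itertools import islice

def load_tweets(path, limit=None, fields=None):
    """Read up to `limit` tweets from a file with one JSON object per line."""
    tweets = []
    with open(path, encoding="utf-8") as f:
        # Skip blank lines so they do not count towards the limit
        lines = (line for line in f if line.strip())
        # islice stops after `limit` lines; limit=None reads the whole file
        for line in islice(lines, limit):
            tweet = json.loads(line)
            if fields is not None:
                # Keep only the requested fields (missing ones become None)
                tweet = {k: tweet.get(k) for k in fields}
            tweets.append(tweet)
    return tweets
```

For example, `load_tweets("yourfilename", limit=100, fields=["id", "text"])` would return the first 100 tweets as a list of dicts holding only those two fields. If the file were a single JSON array instead, `json.load(f)` followed by list slicing would be the simpler route.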
