I am trying to import a JSON file into a Python editor so that I can analyse the data. I am quite new to Python, so I am not sure how to achieve this. My JSON file is full of tweet data; an example record is shown here:

{"id":441999105775382528,"score":0.0,"text":"blablabla","user_id":1441694053,"created":"Fri Mar 07 18:09:33 GMT 2014","retweet_id":0,"source":"<a href=\"http://twitterfeed.com\" rel=\"nofollow\">twitterfeed</a>","geo_long":null,"geo_lat":null,"location":"","screen_name":"SevenPS4","name":"Playstation News","lang":"en","timezone":"Amsterdam","user_created":"2013-05-19","followers":463,"hashtags":"","mentions":"","following":1062,"urls":"http://bit.ly/1lcbBW6","media_urls":"","favourites_count":4514,"reply_status_id":0,"reply_user_id":0,"is_truncated":false,"is_retweet":false,"original_text":null,"status_count":4514,"description":"Tweeting the latest Playstation news!","url":null,"utc_offset":3600}

My questions:

How do I import the JSON file so that I can perform analysis on it in a Python editor?

How do I perform analysis on only a set number of records (i.e. 100 or 200 of them instead of all of them)?

Is there a way to get rid of some of the fields, such as score, user_id, created, etc., without having to go through all of my data manually?

Some of the tweets contain invalid/unusable symbols. Is there any way to get rid of those without going through the data manually?

P̲̳x͓L̳
user1745447
  • How did you get this JSON file full of tweets? If you used a Python Twitter client like `tweepy`, you could have limited the fields and the number of tweets you fetch from Twitter. – alecxe Mar 31 '14 at 16:35
  • Consider asking (and searching for) separate questions. For example, you might remove the 'invalid/unusable symbols' in your tweets as described here: http://stackoverflow.com/a/15321222/1586229 – bgschiller Mar 31 '14 at 16:35
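
As a rough illustration of the symbol clean-up mentioned in the comment above (not necessarily the approach from the linked answer), here is a minimal sketch. It assumes "invalid/unusable symbols" means either control/format characters or anything outside plain ASCII; the question does not say which. The function names `strip_unusable` and `ascii_only` are made up for this example.

```python
import unicodedata


def strip_unusable(text):
    """Drop characters in the Unicode "Other" categories (control, format,
    surrogate, private-use, unassigned), but keep newlines, tabs and spaces.
    Assumption: this is what "unusable symbols" means for your data."""
    return "".join(
        ch for ch in text
        if ch in "\n\t " or unicodedata.category(ch)[0] != "C"
    )


def ascii_only(text):
    """More aggressive variant: keep plain ASCII only.
    Note that this also removes accented letters and emoji."""
    return text.encode("ascii", errors="ignore").decode("ascii")
```

Either function can be applied to the "text" field of each tweet after loading; which one fits depends on whether non-English characters matter for the analysis.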

1 Answer

I'd use pandas for this job, since you will not only load the JSON but also perform some data analysis tasks on it. Depending on the size of your JSON file, this should do it:

import pandas as pd
import json

# read the JSON file (replace the name with your file location);
# json.load() takes an open file, json.loads() would parse the string itself.
# This assumes the file holds a single JSON array of tweet objects.
with open("yourfilename") as f:
    j = json.load(f)
# you might select the relevant keys before constructing the data-frame
keys = ["id", "retweet_count"]
df = pd.DataFrame([{k: tweet.get(k) for k in keys} for tweet in j])
# select a subset (the first five rows)
df.iloc[:5]
# do some analysis
df.retweet_count.sum()
>>> 200
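
If the file instead holds one JSON object per line (the sample record in the question looks like it might, but that is an assumption), the steps above can be sketched with the standard library alone. The helper name `load_tweets` and its parameters are hypothetical, chosen for this example.

```python
import json
from itertools import islice


def load_tweets(path, fields, limit=None):
    """Read up to `limit` tweets from a file with one JSON object per line,
    keeping only the keys listed in `fields`.
    Assumption: the file is line-delimited JSON, not a single JSON array."""
    records = []
    with open(path, encoding="utf-8") as f:
        # Ignore blank lines, then stop after `limit` records (None = all).
        lines = (line for line in f if line.strip())
        for line in islice(lines, limit):
            tweet = json.loads(line)
            # Missing keys become None instead of raising KeyError.
            records.append({k: tweet.get(k) for k in fields})
    return records
```

For instance, `load_tweets("yourfilename", ["id", "text", "followers"], limit=100)` would return the first 100 tweets with only those three fields. The resulting list of dicts can be passed straight to `pd.DataFrame(...)` for the analysis step.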
dorvak