Can't extract JSON from an http request

Question

I'm having problems getting data from an HTTP response. Unfortunately, the response comes back with '\n' attached to all the key/value pairs, and json.loads() complains that the JSON object must be a str, not bytes.

I have tried a number of fixes, so my list of imports might look weird or redundant. Any suggestions would be appreciated.

#!/usr/bin/env python3

import urllib.request
from urllib.request import urlopen
import json
import requests

url = "http://finance.google.com/finance/info?client=ig&q=NASDAQ,AAPL"
response = urlopen(url)
content = response.read()
print(content)

data = json.loads(content)
info = data[0]
print(info)
# Got this far; planning to extract "id": "22144"
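The live endpoint may no longer respond, so here is an offline sketch that reproduces the failure. The sample bytes are a hypothetical stand-in shaped like the response the answers below describe (a newline and `//` before the JSON array). Note that on Python 3.6+ json.loads() accepts bytes directly, so there the error comes from the `//` prefix rather than from the bytes type.

```python
import json

# Hypothetical stand-in for response.read(): bytes whose JSON array is
# preceded by a newline and "//" (the shape described in the answers).
content = b'\n// [ {"id": "22144", "t": "AAPL"} ]'
print(type(content).__name__)  # bytes

error = None
try:
    data = json.loads(content)
except (TypeError, ValueError) as exc:
    # Python < 3.6 rejects bytes with a TypeError; 3.6+ decodes the bytes
    # itself and then fails on the "//" with a JSONDecodeError (a ValueError).
    error = exc
print("could not parse:", error)
```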

You import requests, so use it! (http://stackoverflow.com/questions/16877422/parsing-json-responses) – Paul Rooney, Jan 06 '17 at 01:28

Muntaser Ahmed · Answer 1 · score 3 · 2017-01-06T01:44:38.130

When it comes to making requests in Python, I personally like to use the requests library. I find it easier to use.

import json
import requests

r = requests.get('http://finance.google.com/finance/info?client=ig&q=NASDAQ,AAPL')
# The body starts with a few non-JSON characters; skip them before parsing.
json_obj = json.loads(r.text[4:])

print(json_obj[0].get('id'))

The above solution prints: 22144

The response body starts with a few characters that are not JSON, which is why I load only the relevant (JSON) portion of the response: r.text[4:]. Those leading characters are the reason you couldn't parse it as JSON initially.
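The slice r.text[4:] relies on the prefix always having the same length. A slightly more defensive sketch, using a hypothetical helper name and a hypothetical sample string (with requests it would be r.text), strips leading whitespace and an optional `//` instead of a fixed character count:

```python
import json

def parse_guarded_json(text: str):
    """Parse JSON that may be preceded by whitespace and a '//' marker.

    Hypothetical helper: rather than slicing a fixed count (r.text[4:]),
    it drops leading whitespace and an optional '//' before parsing.
    """
    body = text.lstrip()
    if body.startswith("//"):
        body = body[2:]
    return json.loads(body)  # whitespace left after '//' is valid JSON

# Hypothetical sample shaped like the response described above.
sample = '\n// [ {"id": "22144", "t": "AAPL"} ]'
print(parse_guarded_json(sample)[0].get("id"))  # 22144

# Still works if the server ever stops sending the prefix.
print(parse_guarded_json('[{"id": "22144"}]')[0]["id"])  # 22144
```

The trade-off is that this only handles the `//` form; any other prefix would still need its own handling.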

edited Jan 06 '17 at 01:44

answered Jan 06 '17 at 01:28

Muntaser Ahmed


Yep, the forward slashes were messing up the JSON decoding. – deweyredman Jan 06 '17 at 01:29
Using r.json() will throw an error because the response has forward slashes at the beginning (invalid JSON), as @deweyredman mentioned. – Muntaser Ahmed Jan 06 '17 at 01:35

score 1 · Answer 2 · answered Jan 06 '17 at 01:38

A bytes object has a decode() method that converts bytes to a str. Checking the response in the browser, there seem to be some extra characters at the beginning of the string that need to be removed (a line feed character followed by two slashes: '\n//'). To skip the first three characters of the string returned by decode(), we add [3:] after the method call.

# content is the bytes object returned by response.read() in the question
data = json.loads(content.decode()[3:])
print(data[0]['id'])

The output is exactly what you expect: 22144
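Answer 1 slices off four characters and this answer slices off three; both work because JSON allows whitespace before a value. A quick offline check, using a hypothetical sample that assumes the prefix is a newline, two slashes and a space:

```python
import json

# Hypothetical sample: assumes the prefix is "\n// " (newline, two slashes, space).
content = b'\n// [ {"id": "22144", "t": "AAPL"} ]'
text = content.decode()  # bytes -> str (UTF-8 by default)

print(repr(text[:3]))         # '\n//'
three = json.loads(text[3:])  # ' [ ... ]': the leading space is valid JSON whitespace
four = json.loads(text[4:])   # '[ ... ]'
print(three == four, three[0]["id"])  # True 22144
```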

score -1 · Answer 3 · answered Jan 06 '17 at 01:30


JSON says it must be a str and not "bytes".

Your content is bytes, so decode it to a str first:

data = json.loads(content.decode())

answered Jan 06 '17 at 01:30

lxyscls


Did you try this? – calico_ Jan 06 '17 at 01:32
Sorry, Google is unreachable in China; I meant it as a general solution. – lxyscls Jan 06 '17 at 01:35
Oh, well, it's good that you explained the error from the question. But this will still throw an error because of the extra characters sent at the start of the response. – calico_ Jan 06 '17 at 01:39
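As the comments point out, decoding fixes the str/bytes complaint but not the leading `//`. A sketch of both steps, assuming the body comes from urlopen(), whose response.headers is an http.client.HTTPMessage (a subclass of email.message.Message, so get_content_charset() is available). The helper name, header value and payload below are hypothetical stand-ins so the example runs offline:

```python
import json
from email.message import Message

def body_to_text(raw: bytes, headers: Message) -> str:
    # Hypothetical helper: use the charset the server declared,
    # falling back to UTF-8 when none is given.
    charset = headers.get_content_charset() or "utf-8"
    return raw.decode(charset)

# Hypothetical stand-ins for response.headers and response.read().
headers = Message()
headers["Content-Type"] = "text/javascript; charset=utf-8"
raw = b'\n// [ {"id": "22144", "t": "AAPL"} ]'

text = body_to_text(raw, headers)
try:
    json.loads(text)  # decoding alone is not enough...
except json.JSONDecodeError as exc:
    print("still fails:", exc.msg)  # Expecting value
data = json.loads(text.lstrip().lstrip("/"))  # ...the "//" has to go too
print(data[0]["id"])  # 22144
```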
