1

I have multiple files to load as JSON. They are all formatted the same way, but one of them raises an exception when I load it. This is where you can find the file:

File

I wrote the following code:

import json

def from_seed_data_extract_summoners():
    summonerIds = set()
    for i in range(1, 11):
        file_name = 'data/matches%s.json' % i
        print(file_name)
        # json.load raises the UnicodeDecodeError for one of these files
        with open(file_name) as data_file:
            data = json.load(data_file)
        for match in data['matches']:
            for summoner in match['participantIdentities']:
                summonerIds.add(summoner['player']['summonerId'])
    return summonerIds

The error occurs at json.load(data_file). I suppose there is a special character in the file, but I can't find it and don't know how to replace it. The error raised is:

UnicodeDecodeError: 'utf8' codec can't decode byte 0xeb in position 6: invalid continuation byte

Do you know how I can get rid of it?

mel

4 Answers

2
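  1. Replace file_name = 'data/matches%s.json' % i with file_name = 'data/matches%i.json' % i (for an integer i both produce the same name, so this change is only cosmetic).
  2. Keep passing an open file object to json.load. It expects a file-like object, not a file name, so json.load(file_name) will not work:

    with open(file_name) as data_file: data = json.load(data_file)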

EDIT:

import json

def from_seed_data_extract_summoners():
    summonerIds = set()
    for i in range(1, 11):
        file_name = 'data/matches%i.json' % i
        with open(file_name) as f:
            # The encoding argument is honored in Python 2; it was removed in Python 3.9
            data = json.load(f, encoding='utf-8')
        for match in data['matches']:
            for summoner in match['participantIdentities']:
                summonerIds.add(summoner['player']['summonerId'])
    return summonerIds
Yonatan Kiron
2

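The JSON loader is trying to decode the data into unicode rather than leaving it as a plain byte string. The file contains some embedded character (probably a space or something else not very noticeable) whose bytes cannot be decoded; the error message points at byte 0xeb, which is not valid UTF-8 at that position.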

How to get string objects instead of Unicode ones from JSON in Python?

That is a great thread about making JSON objects more manageable in Python.
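To illustrate the kind of failure described above, here is a minimal sketch. The sample bytes and the helper name decode_or_none are made up for the illustration, and Latin-1 is only a guess at the encoding of the real file (0xeb is "ë" in Latin-1).

```python
# Hypothetical sample: a JSON document containing the single byte 0xeb,
# which is not a valid UTF-8 sequence on its own.
SAMPLE = b'{"name": "Zo\xeb"}'

def decode_or_none(raw, encoding):
    """Return the decoded text, or None if the bytes are invalid in that encoding."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None

as_utf8 = decode_or_none(SAMPLE, "utf-8")      # fails, like json.load in the question
as_latin1 = decode_or_none(SAMPLE, "latin-1")  # succeeds, assuming the file is Latin-1
```

If the second call returns readable text for the problem file, the file was probably saved in a single-byte encoding rather than UTF-8.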

rob
1

Try:

json.loads(data_file.read(), encoding='utf-8')
Yonatan Kiron
Loïc
  • I got the following: 'ascii' codec can't decode byte 0xc3 in position 16260798: ordinal not in range(128) – mel Jun 27 '16 at 13:38
1

Try:

json.loads(unicode(data_file.read(), errors='ignore'))

Or:

json.loads(unidecode.unidecode(unicode(data_file.read(), errors='ignore')))

(For the second, you need to install the unidecode package. Both snippets rely on the Python 2 unicode built-in.)
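A rough equivalent of the first snippet that should also run on Python 3 is sketched below. The helper name load_json and its defaults are assumptions for the illustration, not part of the original code; io.open is used because it accepts encoding and errors on both Python 2 and Python 3.

```python
import io
import json

def load_json(path, encoding="utf-8", errors="strict"):
    """Load a JSON file, decoding its bytes with an explicit encoding.

    errors="ignore" silently drops undecodable bytes (data is lost);
    passing the file's real encoding, if known, keeps every character.
    """
    with io.open(path, "r", encoding=encoding, errors=errors) as f:
        return json.load(f)
```

For example, load_json(file_name, errors="ignore") mirrors the idea of the answer above, while load_json(file_name, encoding="latin-1") keeps the characters if the file turns out to be Latin-1. Dropping bytes changes the data, so supplying the correct encoding is usually the safer choice when it can be determined.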

ntg