I am trying to unzip some .json.gz files, but gzip seems to add some characters to them, which makes them unreadable as JSON.
What do you think is the problem, and how can I solve it?
If I use unzipping software such as 7zip to unzip the file, this problem disappears.
This is my code:
import gzip
import json
with gzip.open('filename', 'rb') as f:
    json_content = json.loads(f.read())
This is the error I get:
Exception has occurred: json.decoder.JSONDecodeError
Extra data: line 2 column 1 (char 1585)
I used this code:
with gzip.open('filename', mode='rb') as f:
    print(f.read())
and realized that the file starts with b' (as shown below):
b'{"id":"tag:search.twitter.com,2005:5667817","objectType":"activity"
I think the b' is what makes the file unworkable for the next stage. Is there a way to remove the b'? There are millions of these zipped files, so I cannot do it manually.
I uploaded a sample of these files at the following link: just a few json.gz files
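
For what it is worth, the b' prefix is only how Python displays a bytes object when printing it; it is not stored in the file. The "Extra data: line 2 column 1" message usually means the decoder read one complete JSON value and then found more text after it, which is what happens when a file holds one JSON object per line. Below is a minimal sketch under that assumption (the actual layout of these files is not confirmed here, and read_json_lines is a hypothetical helper name, not part of gzip or json):

```python
import gzip
import json


def read_json_lines(path):
    """Parse a gzip file assumed to hold one JSON object per line."""
    records = []
    # Mode "rt" decodes the bytes to str while reading, so printed
    # lines would not show the b'...' bytes representation.
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines between records
                records.append(json.loads(line))
    return records
```

Calling read_json_lines('filename') would return a list of dicts, one per line. If the files turn out to contain a single JSON document each, this approach would not apply and the error has some other cause.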