I am using the 'json' package to read JSON files and convert them to CSV. I wrote a script some months ago in Python 2.7 that extracts a dictionary containing the names of the objects in the JSON file (it worked perfectly back then). When I run the script in Python 3.3, the order in which the objects are retrieved is different every time the script is executed.

Any idea why this happens, and how to fix it?

My script:

import json
import csv

input_file = open('my_path\\json_file', 'r')
myjson = json.load(input_file)
input_file.close()
new_json = myjson['markers']  # main object containing sub-objects

keys = {}  # empty dictionary to store list of sub-objects

for i in new_json:
    for k in i.keys():
        keys[k] = 1

Some output examples:

EXECUTION 1:

KEYS{'': 1, 'latitude': 1, 'Particles': 1, 'Wind Speed': 1, 'image': 1, 'Humidity': 1, 'C/ Daoiz y Velarde': 1, 'Noise': 1, 'Battery level': 1, 'id': 1, 'Soil Moisture': 1, ....}

EXECUTION 2:

KEYS{'': 1, 'Relative humidity': 1, 'N02': 1, 'Particles': 1, 'Rainfall': 1, 'image': 1, 'Odometer': 1, 'Co Index': 1, 'Wind Direction': 1, 'Atmospheric Pressure': 1, ....}
MattDMo
Manu
  • Python's `dict` objects are unordered. The top answer in the question I linked to above will solve this by loading the json directly into a [`collections.OrderedDict`](https://docs.python.org/3/library/collections.html#collections.OrderedDict). – dano Aug 15 '14 at 23:55
  • OT: Can I ask you why you are exporting data from JSON to CSV? In other words what is the end goal you want to achieve by this transformation? – Adrian Kalbarczyk Aug 16 '14 at 02:11
  • My end goal is to prepare data harvested from a website into PostgreSQL. Ordering and cleaning (remove some known inconsistencies) is required. – Manu Aug 16 '14 at 06:39

2 Answers

This occurs because Python dictionaries are not guaranteed to preserve any particular order. Use an OrderedDict to fix it:

import json
import csv
from collections import OrderedDict

input_file = open('my_path\\json_file', 'r')
myjson = OrderedDict(json.load(input_file))
input_file.close()
new_json = myjson['markers']  # main object containing sub-objects
keys = {}  # empty dictionary to store list of sub-objects

for i in new_json:
    for k in i.keys():
        keys[k] = 1
hd1
  • I don't think this will preserve the ordering that's in the file stored on disk, because it's still initially being loaded into a normal `dict`. – dano Aug 15 '14 at 23:54
  • OP wants it formatted on output. If they want it on the disk, they need only write it out that way, using json.dumps or whatever. – hd1 Aug 16 '14 at 00:04
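
The difference discussed in these comments can be checked with a small sketch. It uses a made-up inline JSON string rather than the question's file. Wrapping the result of `json.loads` converts only the top-level object, after parsing has already finished, whereas `object_pairs_hook` is applied to every object while it is being parsed.

```python
import json
from collections import OrderedDict

# Made-up sample data (an assumption), shaped like the question's file.
text = '{"markers": [{"id": 1, "latitude": 43.4}]}'

# Wrapping after the fact: only the outermost object becomes an OrderedDict.
wrapped = OrderedDict(json.loads(text))
inner_wrapped = wrapped['markers'][0]

# Using the hook: every JSON object is built as an OrderedDict during parsing.
hooked = json.loads(text, object_pairs_hook=OrderedDict)
inner_hooked = hooked['markers'][0]

print(type(inner_wrapped).__name__)
print(type(inner_hooked).__name__)
```

The sub-objects under 'markers' are the ones whose keys the question iterates over, so on Python 3.3 they would still be plain dicts with no guaranteed order in the first variant.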

This is how dictionaries work as of Python 3.3. Hash randomization, a security fix that was disabled by default in 2.x, is now enabled by default, so the iteration order of a dict with string keys can change from one run to the next. See this answer for more explanation.

You can get the behavior you desire by passing the collections.OrderedDict class as the object_pairs_hook keyword argument of json.load. You'll likely also want to store the results in an OrderedDict. This is documented here and here. For example:

import json
import csv
import collections

input_file = open('my_path\\json_file', 'r')
myjson = json.load(input_file, object_pairs_hook=collections.OrderedDict)
input_file.close()
new_json = myjson['markers'] #main object containing sub-objects

keys = collections.OrderedDict()

for i in new_json:
    for k in i.keys():
        keys[k] = 1
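
As a self-contained check of the approach above, the following sketch parses an inline JSON string instead of a file and writes the collected keys as a CSV header row. The sample data is made up for illustration and only borrows a few key names from the question's output; `collect_keys` is a hypothetical helper name, not part of any library.

```python
import collections
import csv
import io
import json

# Made-up sample data standing in for the real file (an assumption).
SAMPLE = (
    '{"markers": ['
    '{"id": 1, "latitude": 43.4, "Noise": 55}, '
    '{"id": 2, "Humidity": 80, "latitude": 43.5}'
    ']}'
)


def collect_keys(markers):
    """Return every key seen in the sub-objects, in first-seen order."""
    keys = collections.OrderedDict()
    for marker in markers:
        for key in marker:
            keys[key] = 1
    return list(keys)


# object_pairs_hook builds each JSON object as an OrderedDict,
# so key order follows the order in the JSON text.
data = json.loads(SAMPLE, object_pairs_hook=collections.OrderedDict)
header = collect_keys(data['markers'])

# Write the header row to an in-memory buffer instead of a real CSV file.
buffer = io.StringIO()
csv.writer(buffer).writerow(header)
print(buffer.getvalue())
```

Because both the parsed objects and the `keys` container are ordered, the header should come out the same on every run, regardless of hash randomization.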
Community
Jesse L