1342

I have a JSON file that is a mess that I want to prettyprint. What's the easiest way to do this in Python?

I know PrettyPrint takes an "object", which I think can be a file, but I don't know how to pass a file in. Just using the filename doesn't work.

codeforester
Colleen
  • 9
    Try to parse the JSON using `json.loads()` and pretty print that resulting dictionary. Or just skip to the **Pretty printing** section of the Python [documentation for `json`](http://docs.python.org/library/json.html). – Blender Oct 17 '12 at 21:40
  • 12
    http://stackoverflow.com/questions/352098/how-to-pretty-print-json-script – ed. Oct 17 '12 at 21:42
  • 1
    @Blender if you post an answer I'll give you credit... this might get closed as a duplicate, because the solution is the same, but the question is different, so perhaps not. – Colleen Oct 17 '12 at 21:50
  • 19
    why not ` – jfs Oct 17 '12 at 21:56
  • 13
    I don't think it's duplicate because pretty-printing from command line is not the same as pretty-printing programmatically from Python. Voting to reopen. – vitaut Sep 16 '15 at 15:31
  • Here is a blog for that. https://codeblogmoney.com/json-pretty-print-using-python/ – James Malvi Jul 04 '18 at 02:59
  • I use: `json.dump(args_data, argsfile, indent=4)` – Charlie Parker Oct 16 '20 at 20:09
  • what's wrong with `import pprint pprint.pprint(json)`? – Charlie Parker Feb 09 '21 at 22:52

14 Answers

2080

The json module already implements some basic pretty printing with the indent parameter that specifies how many spaces to indent by:

>>> import json
>>>
>>> your_json = '["foo", {"bar":["baz", null, 1.0, 2]}]'
>>> parsed = json.loads(your_json)
>>> print(json.dumps(parsed, indent=4, sort_keys=True))
[
    "foo", 
    {
        "bar": [
            "baz", 
            null, 
            1.0, 
            2
        ]
    }
]
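
For what it's worth, `indent` is not the only knob. Here is a small sketch, reusing the sample data above, of two other formatting options that `json.dumps` accepts: `separators` for compact output and a string value for `indent`. The variable names are just for illustration.

```python
import json

parsed = json.loads('["foo", {"bar": ["baz", null, 1.0, 2]}]')

# separators=(",", ":") removes all optional whitespace (compact output)
compact = json.dumps(parsed, separators=(",", ":"))

# indent can be an int (number of spaces) or a string such as a tab
two_spaces = json.dumps(parsed, indent=2)
tabs = json.dumps(parsed, indent="\t")

print(compact)
print(two_spaces)
```

Tabs keep the file smaller than four-space indentation, at the cost of the display width depending on the viewer.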

To parse a file, use json.load():

with open('filename.txt', 'r') as handle:
    parsed = json.load(handle)
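
Putting the two pieces together for the original question (a messy file on disk), a minimal sketch might look like the following. The helper name `pretty_print_file` and the temporary demo file are assumptions made for illustration, not part of the `json` module.

```python
import json
import os
import tempfile

def pretty_print_file(path, indent=4):
    """Parse the JSON file at `path` and return it as an indented string."""
    with open(path, "r", encoding="utf-8") as handle:
        parsed = json.load(handle)
    # ensure_ascii=False keeps non-ASCII characters readable instead of \uXXXX escapes
    return json.dumps(parsed, indent=indent, sort_keys=True, ensure_ascii=False)

# Demo: write a messy one-line JSON file to a temporary directory, then format it.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "messy.json")
    with open(demo_path, "w", encoding="utf-8") as handle:
        handle.write('{"b":[1,2,{"c":null}],"a":"caf\u00e9"}')
    pretty = pretty_print_file(demo_path)

print(pretty)
```

For a real file, call `print(pretty_print_file('filename.txt'))` and drop the demo part.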
codeforester
Blender
  • 180
    For simple pretty-printing this also works without explicit parsing: ``print json.dumps(your_json_string, indent=4)`` – Peterino Aug 04 '14 at 14:07
  • 13
    Without the indent, you just get a single line of ugly text, which is why I came here. – krs013 Mar 16 '16 at 18:46
  • 3
    This is similar to JavaScript `var str = JSON.stringify(obj, null, 4);` as discussed here http://stackoverflow.com/questions/4810841/how-can-i-pretty-print-json-using-javascript – Christophe Roussy May 31 '16 at 13:17
  • Test your result using [JSON Pretty Print](https://jsonformatter.org/json-pretty-print) – James Malvi Jan 21 '18 at 12:28
  • 2
    If you are serving this through a web server route, like Flask or Django, you need to wrap the packet with `<pre>` and `</pre>`, otherwise the whitespace will get stripped when rendered in the browser. – phyatt Jun 05 '18 at 21:24
  • 10
    @Peterino I had to parse json string first: `print(json.dumps(json.loads(your_json_string), indent=2))` otherwise it just showed me an escaped string – vladkras Feb 15 '19 at 14:36
  • can we include colors like `jq . ` does? – alper Jul 31 '20 at 21:25
  • what about `separators=(',', ': ')`? – Charlie Parker Oct 16 '20 at 20:05
  • @vladkras oddly enough, I only got it to work with my data, which is a collection of objects, *without* parsing it; otherwise, it would complain about the JSON object needing to be a str, bytes or bytearray, not a list. The example data given by Blender on his answer looks to me like a collection, but oh well – Pere Jan 09 '21 at 02:17
  • what's wrong with `import pprint pprint.pprint(json)`? – Charlie Parker Feb 09 '21 at 22:52
  • Blender's solution does not work if you have pytorch tensors see error: `TypeError: Object of type Tensor is not JSON serializable ` – Charlie Parker Feb 09 '21 at 22:59
  • @Peterino why doesn't this work: `print(json.dumps(str(dict), indent=indent, sort_keys=True))`? None of these worked for me: `def pprint_dict(dict, indent=2): import json # print(json.dumps(str(dict), indent=indent, sort_keys=True)) print(json.dumps(json.loads(dict), indent=indent, sort_keys=True)) print(json.dumps(json.loads(str(dict)), indent=indent, sort_keys=True))` – Charlie Parker Feb 09 '21 at 23:01
  • Just a note, if you put your data in an `OrderedDict` instead of a regular `dict`, you can control the order in which the entries show up in the json file, which helps readability. You have to remove the `sort_keys` argument of course. If file size is a concern, you may wish to consider using tabs instead of spaces by using `indent='\t'` which will save 75% of indentation bytes (compared to an indentation depth of 4 spaces) – Cerno May 04 '21 at 08:52
371

You can do this on the command line:

python3 -m json.tool some.json

(as already mentioned in the comments on the question; thanks to @Kai Petzke for the python3 suggestion).

Actually, Python is not my favourite tool for JSON processing on the command line. For simple pretty-printing it is fine, but manipulating the JSON quickly becomes overcomplicated: you soon need to write a separate script file, and in Python 2 you can end up with maps whose keys are u"some-key" (Python unicode strings), which makes selecting fields more difficult and doesn't really help with pretty-printing.

You can also use jq:

jq . some.json

and you get colors as a bonus (and it is much easier to extend).

Addendum: There is some confusion in the comments about using jq to process large JSON files on the one hand, and having a very large jq program on the other. For pretty-printing a file consisting of a single large JSON entity, the practical limitation is RAM. For pretty-printing a 2GB file consisting of a single array of real-world data, the "maximum resident set size" required for pretty-printing was 5GB (whether using jq 1.5 or 1.6). Note also that jq can be used from within python after pip install jq.

Eric O Lebigot
Gismo Ranas
  • 4
    JQ is great but there is a max limit so its useless for large files. (i.e. blows up handling a 1.15mb file) https://github.com/stedolan/jq/issues/1041 – Chris McKee May 17 '16 at 08:35
  • 4
    yeah, man, definitely, if you are writing jq filters with more than 10K lines of code I think you're trying something like going to mars with a bicycle. – Gismo Ranas May 17 '16 at 08:39
  • 2
    lol :D @gismo-ranas The json.tool version piped to a file works really really well on large files; and is stupidly fast. I like JQ but formatting anything beyond a small payload (which you could do in most text editors) is beyond its reach :) Random addition: http://www.json-generator.com/ is a neat tool to make test data – Chris McKee May 17 '16 at 08:46
  • 6
    or just: `jq '' < some.json` – fatal_error Dec 09 '16 at 19:21
  • I don't think that Python's json lib will output the `u"some-key"` with the `u` – kbuilds Aug 31 '17 at 21:56
  • Plus with `curl -s` option you can hide sometime useless speed statistic. – ipeacocks Oct 19 '17 at 20:27
  • 2
    Actually I strongly recommend using `python3 -m json.tool <IN >OUT`, as this keeps the original order of the fields in JSON dicts. The Python 2 interpreter sorts the fields in alphabetically ascending order, which is often not what you want. – Kai Petzke Jan 20 '19 at 17:00
  • 1
    It's worth noting that `python -m json.tool` _does_ work, though, and is better than nothing. I know it's 2019 already, but there are still plenty of systems around that don't have `python3` installed! – Todd Owen Feb 13 '19 at 06:38
  • 1
    Also, note that there is no need for the shell file redirections: `python3 -m json.tool in_file [out_file]` works directly (I updated the answer). – Eric O Lebigot Apr 25 '20 at 09:31
  • 1
    Unfortunately, python3 kills my non-ASCII characters, while jq handles them fine. Too bad python3 still cannot handle UTF-8 out of the box. – Holger Jakobs Nov 26 '20 at 15:50
79

You could use the built-in module pprint (https://docs.python.org/3.9/library/pprint.html).

Here is how you can read a file containing JSON data and print it out:

import json
import pprint

# Parse the file into Python objects (dicts, lists, strings, ...)
with open('file_name.txt', 'r') as f:
    json_data = json.load(f)

print(json_data)
{'firstName': 'John', 'lastName': 'Smith', 'isAlive': True, 'age': 27, 'address': {'streetAddress': '21 2nd Street', 'city': 'New York', 'state': 'NY', 'postalCode': '10021-3100'}, 'children': []}

pprint.pprint(json_data)
{'address': {'city': 'New York',
             'postalCode': '10021-3100',
             'state': 'NY',
             'streetAddress': '21 2nd Street'},
 'age': 27,
 'children': [],
 'firstName': 'John',
 'isAlive': True,
 'lastName': 'Smith'}
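
To make the difference between `pprint` and `json.dumps` concrete, here is a small sketch (the sample data is made up) that checks whether each output can be parsed back as JSON:

```python
import json
import pprint

data = json.loads('{"isAlive": true, "spouse": null, "name": "John"}')

as_repr = pprint.pformat(data)        # Python literal syntax: True, None, single quotes
as_json = json.dumps(data, indent=4)  # JSON syntax: true, null, double quotes

# The json.dumps output round-trips through the JSON parser.
json_round_trips = json.loads(as_json) == data

# The pprint output is rejected by the JSON parser.
try:
    json.loads(as_repr)
    pprint_is_valid_json = True
except json.JSONDecodeError:
    pprint_is_valid_json = False

print(as_repr)
print(as_json)
print("pprint output is valid JSON:", pprint_is_valid_json)
```

So `pprint` is fine for reading the data on screen, but use `json.dumps` if the output has to stay valid JSON.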
ikreb
  • 8
    Problem with this is that pprint will use single and double quotes interchangably, but json requires double quotes only, so your pprinted json may no longer parse as valid json. – drevicko Jun 29 '18 at 14:38
  • 6
    Yes, but it's only to output a json file. Not to take the output and write it again in a file. – ikreb Jul 09 '18 at 14:01
54

Pygmentize + Python json.tool = Pretty Print with Syntax Highlighting

Pygmentize is a killer tool. See this.

I combine python json.tool with pygmentize

echo '{"foo": "bar"}' | python -m json.tool | pygmentize -l json

See the link above for pygmentize installation instruction.

[Image: demo of the syntax-highlighted output]
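
If you would rather stay inside Python than pipe through the `pygmentize` CLI, something along these lines should work. This is a sketch that assumes the Pygments package is installed (it falls back to plain text otherwise); `highlight_json` is a hypothetical helper name.

```python
import json

def highlight_json(obj, indent=4):
    """Return obj as indented JSON, colorized for a terminal if Pygments is available."""
    text = json.dumps(obj, indent=indent, sort_keys=True)
    try:
        from pygments import highlight
        from pygments.formatters import TerminalFormatter
        from pygments.lexers import JsonLexer
    except ImportError:
        return text  # Pygments is not installed: fall back to plain text
    # highlight() wraps the tokens in ANSI color escape sequences
    return highlight(text, JsonLexer(), TerminalFormatter())

print(highlight_json({"foo": "bar"}))
```

The colors only show up in a terminal that understands ANSI escape sequences.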

Shubham Chaudhary
  • 3
    In your example `-g` is not actually working ;) Since input comes from stdin, pygmentize is not able to make a good guess. You need to specify lexer explicitly: `echo '{"foo": "bar"}' | python -m json.tool | pygmentize -l json` – Denis The Menace Jan 29 '18 at 13:00
  • 1
    @DenisTheMenace It used to work in 2015 when I created this example image. It doesn't seem to be working now on my system as well. – Shubham Chaudhary Jan 30 '18 at 09:19
38

Use this function and don't sweat having to remember if your JSON is a str or dict again - just look at the pretty print:

import json

def pp_json(json_thing, sort=True, indents=4):
    # Parse first if we were handed JSON text rather than a Python object
    if isinstance(json_thing, str):
        json_thing = json.loads(json_thing)
    print(json.dumps(json_thing, sort_keys=sort, indent=indents))

pp_json(your_json_string_or_dict)
zelusp
17

Use pprint: https://docs.python.org/3.6/library/pprint.html

import pprint
pprint.pprint(json)

print() compared to pprint.pprint()

print(json)
{'feed': {'title': 'W3Schools Home Page', 'title_detail': {'type': 'text/plain', 'language': None, 'base': '', 'value': 'W3Schools Home Page'}, 'links': [{'rel': 'alternate', 'type': 'text/html', 'href': 'https://www.w3schools.com'}], 'link': 'https://www.w3schools.com', 'subtitle': 'Free web building tutorials', 'subtitle_detail': {'type': 'text/html', 'language': None, 'base': '', 'value': 'Free web building tutorials'}}, 'entries': [], 'bozo': 0, 'encoding': 'utf-8', 'version': 'rss20', 'namespaces': {}}

pprint.pprint(json)
{'bozo': 0,
 'encoding': 'utf-8',
 'entries': [],
 'feed': {'link': 'https://www.w3schools.com',
          'links': [{'href': 'https://www.w3schools.com',
                     'rel': 'alternate',
                     'type': 'text/html'}],
          'subtitle': 'Free web building tutorials',
          'subtitle_detail': {'base': '',
                              'language': None,
                              'type': 'text/html',
                              'value': 'Free web building tutorials'},
          'title': 'W3Schools Home Page',
          'title_detail': {'base': '',
                           'language': None,
                           'type': 'text/plain',
                           'value': 'W3Schools Home Page'}},
 'namespaces': {},
 'version': 'rss20'}
Oliver
Nakamoto
  • 5
    `pprint` does not produce a valid JSON document. – selurvedu Nov 26 '19 at 11:46
  • @selurvedu what does that mean and why does that matter? – Charlie Parker Feb 09 '21 at 22:50
  • 1
    @CharlieParker I expect they meant that knowing you have a valid JSON document is pretty useful. Sure, you can use the `json` module to work with the data and dictionary keys work the same with double- or single-quoted strings, but some tools, e.g. [Postman](https://getpostman.com) and [JSON Editor Online](https://jsoneditoronline.org/), both expect keys and values to be double-quoted (as per the JSON spec). In any case, [json.org](https://www.json.org/) specifies the use of double quotes, which `pprint` doesn't produce. E.g. `pprint.pprint({"name": "Jane"})` produces `{'name': 'Jane'}`. – digitalformula Mar 07 '21 at 07:13
14

To be able to pretty print from the command line and be able to have control over the indentation etc. you can set up an alias similar to this:

alias jsonpp="python3 -c 'import sys, json; print(json.dumps(json.load(sys.stdin), sort_keys=True, indent=2))'"

And then use the alias in one of these ways:

cat myfile.json | jsonpp
jsonpp < myfile.json
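
If the alias gets unwieldy, the same idea can live in a small Python 3 script. This is a sketch, with `pretty_print_stream` as a hypothetical name; the in-memory streams are only there to make the demo self-contained.

```python
import io
import json
import sys

def pretty_print_stream(source=sys.stdin, target=sys.stdout, indent=2):
    """Read one JSON document from `source` and write it, indented, to `target`."""
    json.dump(json.load(source), target, sort_keys=True, indent=indent)
    target.write("\n")

# Demo with in-memory streams; in a real script, call pretty_print_stream()
# with no arguments so that it reads stdin and writes stdout.
demo_out = io.StringIO()
pretty_print_stream(io.StringIO('{"b": 1, "a": [true, null]}'), demo_out)
print(demo_out.getvalue(), end="")
```

Saved as a script that calls `pretty_print_stream()`, it can be used just like the alias, by piping or redirecting a file into it.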
V P
7

Here's a simple example of pretty printing JSON to the console in a nice way in Python, without requiring the JSON to be on your computer as a local file:

import pprint
import json
from urllib.request import urlopen  # (Only used to get this example)

# Getting a JSON example for this example
with urlopen("https://mdn.github.io/fetch-examples/fetch-json/products.json") as r:
    # Decode the bytes explicitly; json.loads only accepts bytes from Python 3.6 on
    text = r.read().decode("utf-8")

# To print it
pprint.pprint(json.loads(text))
David Liu
  • 1
    I get the following error message in Python 3: "TypeError: the JSON object must be str, not 'bytes'" – Mr. T Jan 23 '18 at 08:41
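6

import json

def saveJson(data, fileToSave):
    # Write `data` as indented JSON with sorted keys to the path `fileToSave`
    with open(fileToSave, 'w') as f:
        json.dump(data, f, ensure_ascii=True, indent=4, sort_keys=True)

This saves the pretty-printed JSON to a file. To display it instead, pass sys.stdout to json.dump in place of the file object.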

AJP
4

You could try pprintjson.


Installation

$ pip3 install pprintjson

Usage

Pretty print JSON from a file using the pprintjson CLI.

$ pprintjson "./path/to/file.json"

Pretty print JSON from stdin using the pprintjson CLI.

$ echo '{ "a": 1, "b": "string", "c": true }' | pprintjson

Pretty print JSON from a string using the pprintjson CLI.

$ pprintjson -c '{ "a": 1, "b": "string", "c": true }'

Pretty print JSON from a string with an indent of 1.

$ pprintjson -c '{ "a": 1, "b": "string", "c": true }' -i 1

Pretty print JSON from a string and save output to a file output.json.

$ pprintjson -c '{ "a": 1, "b": "string", "c": true }' -o ./output.json

Travis Clarke
2

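I think it's better to parse the JSON first, so that invalid input is returned unchanged instead of raising an error:

import json
from json import JSONDecodeError

def format_response(response):
    # `response` is expected to expose the body as `response.text`
    try:
        parsed = json.loads(response.text)
    except JSONDecodeError:
        # Not valid JSON: return the raw text unchanged
        return response.text
    return json.dumps(parsed, ensure_ascii=True, indent=4)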
p3quod
0

I had a similar requirement to dump the contents of a JSON file for logging, something quick and easy:

print(json.dumps(json.load(open(os.path.join('<myPath>', '<myjson>'), "r")), indent=4))

If you use it often, put it in a function:

import json
import os

def pp_json_file(path, file):
    # Open with a context manager so the file handle is closed after reading
    with open(os.path.join(path, file), "r") as f:
        print(json.dumps(json.load(f), indent=4))
user 923227
-1

Hopefully this helps someone else.

If you get an error saying that something is not JSON serializable, the answers above will not work. If you only want to save the data so that it is human-readable, you need to recursively call str() on all the non-dictionary elements of your dictionary. If you want to load it again later, save it as a pickle file instead (e.g. torch.save(obj, f) works fine).

This is what worked for me:

#%%

def _to_json_dict_with_strings(dictionary):
    """
    Convert a dict to a dict whose leaves are all strings: nested dicts are
    processed recursively and every other value is replaced by str(value).

    Use case:
        - saving a dictionary of tensors (converts the tensors to strings!)
        - saving arguments from a script (e.g. argparse) so they print prettily
    """
    if not isinstance(dictionary, dict):
        return str(dictionary)
    return {k: _to_json_dict_with_strings(v) for k, v in dictionary.items()}

def to_json(dic):
    # Accept either a dict or an object with a __dict__ (e.g. an argparse Namespace)
    if isinstance(dic, dict):
        dic = dict(dic)
    else:
        dic = dic.__dict__
    return _to_json_dict_with_strings(dic)

def save_to_json_pretty(dic, path, mode='w', indent=4, sort_keys=True):
    import json

    with open(path, mode) as f:
        json.dump(to_json(dic), f, indent=indent, sort_keys=sort_keys)

def my_pprint(dic):
    """
    Pretty print dic as indented JSON with sorted keys.

    Note: this is not the same as pprint.
    """
    import json

    # convert all leaf values to strings recursively with their native str function
    dic = to_json(dic)
    # pretty print
    pretty_dic = json.dumps(dic, indent=4, sort_keys=True)
    print(pretty_dic)

import torch
# import json  # results in non-serializable errors for torch.Tensors
from pprint import pprint

dic = {'x': torch.randn(1, 3), 'rec': {'y': torch.randn(1, 3)}}

my_pprint(dic)
pprint(dic)

output:

{
    "rec": {
        "y": "tensor([[-0.3137,  0.3138,  1.2894]])"
    },
    "x": "tensor([[-1.5909,  0.0516, -1.5445]])"
}
{'rec': {'y': tensor([[-0.3137,  0.3138,  1.2894]])},
 'x': tensor([[-1.5909,  0.0516, -1.5445]])}

I don't know why returning the string and then printing it doesn't work, but it seems you have to put the dumps call directly in the print statement. Note that pprint, as already suggested, works too. Also note that not all objects can be converted to a dict with dict(dic), which is why some of my code checks for this condition.

Context:

I wanted to save PyTorch tensors but I kept getting the error:

TypeError: tensor is not JSON serializable

so I coded the above. Note that yes, in PyTorch you use torch.save, but pickle files aren't human-readable. Check this related post: https://discuss.pytorch.org/t/typeerror-tensor-is-not-json-serializable/36065/3
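
As a possible shortcut, `json.dumps` takes a `default` callable that is invoked for any value it cannot serialize, so passing `default=str` gives a similar result without the recursive helper. Here is a minimal sketch, using `Decimal` and `complex` as stand-ins for tensors (an assumption, so that the example runs without PyTorch):

```python
import json
from decimal import Decimal

# Decimal and complex are not JSON serializable, much like torch.Tensor
stats = {"x": Decimal("1.5"), "rec": {"y": complex(1, 2)}, "n": None}

# `default` is called only for values the encoder cannot handle itself
pretty = json.dumps(stats, indent=4, sort_keys=True, default=str)
print(pretty)
```

Unlike the recursive version above, `None` stays `null` rather than becoming the string "None". Note that `default` is not applied to dictionary keys, so keys such as tuples still raise a TypeError.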


pprint also has an indent argument, but I didn't like how it looks:

    pprint(stats, indent=4, sort_dicts=True)

output:

{   'cca': {   'all': {'avg': tensor(0.5132), 'std': tensor(0.1532)},
               'avg': tensor([0.5993, 0.5571, 0.4910, 0.4053]),
               'rep': {'avg': tensor(0.5491), 'std': tensor(0.0743)},
               'std': tensor([0.0316, 0.0368, 0.0910, 0.2490])},
    'cka': {   'all': {'avg': tensor(0.7885), 'std': tensor(0.3449)},
               'avg': tensor([1.0000, 0.9840, 0.9442, 0.2260]),
               'rep': {'avg': tensor(0.9761), 'std': tensor(0.0468)},
               'std': tensor([5.9043e-07, 2.9688e-02, 6.3634e-02, 2.1686e-01])},
    'cosine': {   'all': {'avg': tensor(0.5931), 'std': tensor(0.7158)},
                  'avg': tensor([ 0.9825,  0.9001,  0.7909, -0.3012]),
                  'rep': {'avg': tensor(0.8912), 'std': tensor(0.1571)},
                  'std': tensor([0.0371, 0.1232, 0.1976, 0.9536])},
    'nes': {   'all': {'avg': tensor(0.6771), 'std': tensor(0.2891)},
               'avg': tensor([0.9326, 0.8038, 0.6852, 0.2867]),
               'rep': {'avg': tensor(0.8072), 'std': tensor(0.1596)},
               'std': tensor([0.0695, 0.1266, 0.1578, 0.2339])},
    'nes_output': {   'all': {'avg': None, 'std': None},
                      'avg': tensor(0.2975),
                      'rep': {'avg': None, 'std': None},
                      'std': tensor(0.0945)},
    'query_loss': {   'all': {'avg': None, 'std': None},
                      'avg': tensor(12.3746),
                      'rep': {'avg': None, 'std': None},
                      'std': tensor(13.7910)}}

compare to:

{
    "cca": {
        "all": {
            "avg": "tensor(0.5144)",
            "std": "tensor(0.1553)"
        },
        "avg": "tensor([0.6023, 0.5612, 0.4874, 0.4066])",
        "rep": {
            "avg": "tensor(0.5503)",
            "std": "tensor(0.0796)"
        },
        "std": "tensor([0.0285, 0.0367, 0.1004, 0.2493])"
    },
    "cka": {
        "all": {
            "avg": "tensor(0.7888)",
            "std": "tensor(0.3444)"
        },
        "avg": "tensor([1.0000, 0.9840, 0.9439, 0.2271])",
        "rep": {
            "avg": "tensor(0.9760)",
            "std": "tensor(0.0468)"
        },
        "std": "tensor([5.7627e-07, 2.9689e-02, 6.3541e-02, 2.1684e-01])"
    },
    "cosine": {
        "all": {
            "avg": "tensor(0.5945)",
            "std": "tensor(0.7146)"
        },
        "avg": "tensor([ 0.9825,  0.9001,  0.7907, -0.2953])",
        "rep": {
            "avg": "tensor(0.8911)",
            "std": "tensor(0.1571)"
        },
        "std": "tensor([0.0371, 0.1231, 0.1975, 0.9554])"
    },
    "nes": {
        "all": {
            "avg": "tensor(0.6773)",
            "std": "tensor(0.2886)"
        },
        "avg": "tensor([0.9326, 0.8037, 0.6849, 0.2881])",
        "rep": {
            "avg": "tensor(0.8070)",
            "std": "tensor(0.1595)"
        },
        "std": "tensor([0.0695, 0.1265, 0.1576, 0.2341])"
    },
    "nes_output": {
        "all": {
            "avg": "None",
            "std": "None"
        },
        "avg": "tensor(0.2976)",
        "rep": {
            "avg": "None",
            "std": "None"
        },
        "std": "tensor(0.0945)"
    },
    "query_loss": {
        "all": {
            "avg": "None",
            "std": "None"
        },
        "avg": "tensor(12.3616)",
        "rep": {
            "avg": "None",
            "std": "None"
        },
        "std": "tensor(13.7976)"
    }
}
Charlie Parker
-9

It's far from perfect, but it does the job.

data = data.replace(',"',',\n"')

You can improve it, add indenting and so on, but if you just want to be able to read a cleaner JSON, this is the way to go.
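
To see why going through the `json` module is safer than text replacement, here is a small sketch with made-up input in which a string value happens to end in a comma:

```python
import json

raw = '{"a":"x,","b":[1,2]}'

# Text replacement also fires on the comma inside the string value "x,",
# which puts a raw newline inside a JSON string.
naive = raw.replace(',"', ',\n"')
try:
    json.loads(naive)
    naive_still_valid = True
except json.JSONDecodeError:
    naive_still_valid = False

# Parsing and re-serializing only touches the structure, never the values.
robust = json.dumps(json.loads(raw), indent=4)

print("naive result is still valid JSON:", naive_still_valid)
print(robust)
```

The replaced text may still be readable by eye, but it can no longer be fed back to a JSON parser.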