13

So, I want to store a dictionary in a persistent file. Is there a way to use regular dictionary methods to add, print, or delete entries from the dictionary in that file?

It seems that I would be able to use cPickle to store the dictionary and load it, but I'm not sure where to take it from there.

  • When you read the pickle documentation, what questions did you have? Can you post some code to show what you have working and what you need help with? – S.Lott Aug 04 '09 at 18:13
  • Basically, I would want to use the dictionary as a database type thing. So I could write the dictionary to a file, and then load the file in my script when I wanted to add something to the dictionary, but using regular dictionary methods. Is there a way I can just load the file, and then modify the dictionary with the typical dict["key"] = "items" or del dict["key"]? I've tried to do this now, and python tells me that dict is undefined in this particular example. –  Aug 05 '09 at 12:53

8 Answers

18

If your keys (not necessarily the values) are strings, the shelve standard library module does what you want pretty seamlessly.
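A minimal sketch of how that looks in practice (the filename "mydata" is arbitrary); a shelf supports the usual dict-style assignment, lookup, and deletion, with values pickled to disk under string keys:

```python
import shelve

# shelve stores pickled values under string keys, backed by a dbm file on disk
with shelve.open("mydata") as db:
    db["answer"] = 42            # regular dict-style assignment
    db["items"] = [1, 2, 3]

# Reopening the shelf later sees the persisted entries
with shelve.open("mydata") as db:
    print(db["answer"])          # -> 42
    del db["items"]              # regular dict-style deletion
    print("items" in db)         # -> False
```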

Alex Martelli
  • The pleasure is all mine, Stefano! :-) – Alex Martelli Aug 05 '09 at 01:30
  • It's worth noting that this does not work for edits made to elements contained in the dictionary. So if you have a shelve called `data` with a list of items then `data['items'].append(123)` will fail. – AnnanFay Apr 24 '19 at 21:49
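The caveat in the last comment can be sketched like this (the filename "data_wb" is arbitrary): by default, mutating a value fetched from a shelf does not persist, but opening with `writeback=True` caches fetched entries and writes them back on close:

```python
import shelve

# With writeback=True, in-place mutations of stored values are persisted;
# without it, data["items"].append(...) would be silently lost.
with shelve.open("data_wb", writeback=True) as data:
    data["items"] = [1, 2]
    data["items"].append(123)    # cached, written back when the shelf closes

with shelve.open("data_wb") as data:
    print(data["items"])         # -> [1, 2, 123]
```

The alternative without `writeback` is to reassign explicitly: `tmp = data["items"]; tmp.append(123); data["items"] = tmp`.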
10

Use JSON

Similar to Pete's answer, I like using JSON because it maps very well to Python data structures and is very readable:

Persisting data is trivial:

>>> import json
>>> db = {'hello': 123, 'foo': [1, 2, 3, 4, 5, 6], 'bar': {'a': 0, 'b': 9}}
>>> fh = open("db.json", 'w')
>>> json.dump(db, fh)
>>> fh.close()

and loading it is about the same:

>>> import json
>>> fh = open("db.json", 'r')
>>> db = json.load(fh)
>>> fh.close()
>>> db
{'hello': 123, 'bar': {'a': 0, 'b': 9}, 'foo': [1, 2, 3, 4, 5, 6]}
>>> del db['foo'][3]
>>> db['foo']
[1, 2, 3, 5, 6]

In addition, JSON loading doesn't suffer from the same security issues that shelve and pickle do, although IIRC it is slower than pickle.

If you want to write on every operation:

If you want to save on every operation, you can subclass the Python dict object:

import os
import json

class DictPersistJSON(dict):
    def __init__(self, filename, *args, **kwargs):
        self.filename = filename
        self._load()
        self.update(*args, **kwargs)

    def _load(self):
        if (os.path.isfile(self.filename)
                and os.path.getsize(self.filename) > 0):
            with open(self.filename, 'r') as fh:
                self.update(json.load(fh))

    def _dump(self):
        with open(self.filename, 'w') as fh:
            json.dump(self, fh)

    def __getitem__(self, key):
        return dict.__getitem__(self, key)

    def __setitem__(self, key, val):
        dict.__setitem__(self, key, val)
        self._dump()

    def __repr__(self):
        dictrepr = dict.__repr__(self)
        return '%s(%s)' % (type(self).__name__, dictrepr)

    def update(self, *args, **kwargs):
        for k, v in dict(*args, **kwargs).items():
            self[k] = v
        self._dump()

Which you can use like this:

db = DictPersistJSON("db.json")
db["foo"] = "bar" # Will trigger a write

Which is woefully inefficient, but can get you off the ground quickly.

brice
  • I've read it is better to subclass `collections.abc.MutableMapping` instead of `dict`. The current implementation fails if someone calls `db.setdefault('baz', 123)`. This article explains why: http://www.kr41.net/2016/03-23-dont_inherit_python_builtin_dict_type.html – AnnanFay Apr 24 '19 at 22:09
  • Found what I was looking for. Have a usecase where performance not relevant. Thanks – Ranjith Ramachandra Jan 20 '21 at 18:57
6

Unpickle from the file when the program loads, modify it as a normal dictionary in memory while the program is running, and pickle it back to the file when the program exits. Not sure exactly what more you're asking for here.
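That workflow can be sketched in a few lines (the filename "db.pickle" is arbitrary):

```python
import os
import pickle

DB_FILE = "db.pickle"  # arbitrary filename for this sketch

# Load at startup (start with an empty dict if the file doesn't exist yet)
if os.path.exists(DB_FILE):
    with open(DB_FILE, "rb") as fh:
        db = pickle.load(fh)
else:
    db = {}

# Use ordinary dict operations while the program runs
db["key"] = "items"

# Save at exit
with open(DB_FILE, "wb") as fh:
    pickle.dump(db, fh)
```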

Amber
1

If using only strings as keys (as allowed by the shelve module) is not enough, the FileDict might be a good way to solve this problem.

Michael Mauderer
1

Assuming the keys and values have working implementations of repr, one solution is to save the string representation of the dictionary (repr(d)) to a file, then load it back with the eval function (eval(input_string)). There are two main disadvantages to this technique:

1) It will not work with types that have an unusable implementation of repr (or that may even seem to work but quietly fail). You'll need to pay at least some attention to what is going on.

2) Your file-load mechanism is basically straight-out executing Python code. Not great for security unless you fully control the input.

It has one advantage: it's absurdly easy to do.
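A minimal sketch of the round trip (the filename "db.txt" is arbitrary); for dicts built from plain literals, `ast.literal_eval` is a safer drop-in for `eval`:

```python
import ast

d = {"a": 1, "b": [1, 2, 3]}

# Save the repr of the dict as plain text
with open("db.txt", "w") as fh:
    fh.write(repr(d))

# Load it back; ast.literal_eval only accepts Python literals,
# so unlike eval() it cannot execute arbitrary code
with open("db.txt") as fh:
    loaded = ast.literal_eval(fh.read())

print(loaded == d)  # -> True
```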

Brian
1

My favorite method (which does not use standard Python dictionary functions): read/write YAML files using PyYAML. See this answer for details, summarized here:

Create a YAML file, "employment.yml":

new jersey:
  mercer county:
    plumbers: 3
    programmers: 81
  middlesex county:
    salesmen: 62
    programmers: 81
new york:
  queens county:
    plumbers: 9
    salesmen: 36

Then read it in Python:

import yaml
with open("employment.yml") as file_handle:
    my_dictionary = yaml.safe_load(file_handle)

and now my_dictionary has all the values. If you need to do this on the fly, create a string containing the YAML and parse it with yaml.safe_load.
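Writing a dictionary back out is symmetric with loading it; a sketch, assuming PyYAML is installed (it is a third-party package, `pip install pyyaml`) and reusing the answer's filename:

```python
import yaml  # PyYAML, a third-party package

employment = {
    "new jersey": {"mercer county": {"plumbers": 3, "programmers": 81}},
    "new york": {"queens county": {"plumbers": 9, "salesmen": 36}},
}

# safe_dump writes plain YAML; default_flow_style=False keeps the
# block (indented) layout shown in the answer above
with open("employment.yml", "w") as fh:
    yaml.safe_dump(employment, fh, default_flow_style=False)

# Round-trip check: loading it back gives the original dict
with open("employment.yml") as fh:
    round_tripped = yaml.safe_load(fh)
print(round_tripped == employment)  # -> True
```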

Pete
0

Pickling has one disadvantage: it can be expensive if your dictionary is large and has to be read and written frequently, because pickle dumps the whole structure at once and unpickle loads it back as a whole.

If you only have to handle small dicts, pickle is fine. If you are going to work with something more complex, go for Berkeley DB, which is basically made to store key:value pairs.

Stefano Borini
0

Have you considered using dbm?

import dbm
import pandas as pd
import numpy as np
db = dbm.open('mydbm.db', 'n')

#create some data
df1 = pd.DataFrame(np.random.randint(0, 100, size=(15, 4)), columns=list('ABCD'))
df2 = pd.DataFrame(np.random.randint(101,200, size=(10, 3)), columns=list('EFG'))

# serialize the data and put it in the db dictionary
db['df1']=df1.to_json()
db['df2']=df2.to_json()


# in some other process:
db=dbm.open('mydbm.db','r')
df1a = pd.read_json(db['df1'])
df2a = pd.read_json(db['df2'])

This tends to work even without a db.close(), though closing explicitly is safer.

Mohrez
  • Please [avoid using rhetorical questions](https://meta.stackoverflow.com/questions/300987/should-we-avoid-rhetorical-questions-in-answers) in answers. This can make it seem like you are asking a question in an answer. – GalaxyCat105 Dec 30 '20 at 21:58