
I am seeing this behavior using shelve:

import shelve

my_shelve = shelve.open('/tmp/shelve', writeback=True)
my_shelve['a'] = {'foo': 'bar'}
my_shelve['b'] = my_shelve['a']
id(my_shelve['a'])  # 140421814419392
id(my_shelve['b'])  # 140421814419392
my_shelve['a']['foo'] = 'Hello'
my_shelve['a']['foo']  # 'Hello'
my_shelve['b']['foo']  # 'Hello'
my_shelve.close()

my_shelve = shelve.open('/tmp/shelve', writeback=True)
id(my_shelve['a'])  # 140421774309128
id(my_shelve['b'])  # 140421774307832 -> This is weird.
my_shelve['a']['foo']  # 'Hello'
my_shelve['b']['foo']  # 'Hello'
my_shelve['a']['foo'] = 'foo'
my_shelve['a']['foo']  # 'foo'
my_shelve['b']['foo']  # 'Hello'
my_shelve.close()

As you can see, when the shelf is reopened, the two objects that were previously the same object are now two different objects.

  1. Does anybody know what is happening here?
  2. Does anybody know how to avoid this behavior?

I am using Python 3.7.0

3 Answers


shelve stores pickled representations of objects in the shelf file. When you store the same object as my_shelve['a'] and my_shelve['b'], shelve writes one pickle of the object for the 'a' key and another pickle of the object for the 'b' key. The key thing to note is that it pickles every value separately.

When you reopen the shelf, shelve uses the pickled representations to reconstruct the objects. It uses the pickle for 'a' to reconstruct the dict you stored, and it uses the pickle for 'b' to reconstruct that dict a second time, as a new, independent object.

The pickles do not interact with each other, and there is no way for them to return the same object as each other when unpickled. There is no indication in the on-disk representation that my_shelve['a'] and my_shelve['b'] were ever the same object; a shelf produced using separate objects for my_shelve['a'] and my_shelve['b'] could look identical.
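The same effect can be reproduced with the pickle module directly, since that is what shelve uses to serialize each value. A minimal sketch (the variable names are mine): two independent pickles of one object load as two distinct objects, while a single pickle that contains both references loads them as one object, because pickle's memo tracks shared references within a single pickle.

```python
import pickle

original = {'foo': 'bar'}

# Two independent pickles, like shelve writes for keys 'a' and 'b'
blob_a = pickle.dumps(original)
blob_b = pickle.dumps(original)
a = pickle.loads(blob_a)
b = pickle.loads(blob_b)
print(a == b, a is b)  # True False: equal contents, different objects

# One pickle containing both references: the memo records that the
# second reference points at an object that was already written.
both = pickle.loads(pickle.dumps({'a': original, 'b': original}))
print(both['a'] is both['b'])  # True
```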


If you want to preserve the fact that those objects were identical, you shouldn't store them in separate keys of a shelf. Consider pickling and unpickling a single dict with 'a' and 'b' keys instead of using shelve.
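A sketch of that suggestion, assuming a temporary directory stands in for the question's /tmp path (the file name data.pickle is hypothetical): both keys live in one dict, so they go into one pickle and come back as one shared object.

```python
import os
import pickle
import tempfile

shared = {'foo': 'bar'}
data = {'a': shared, 'b': shared}

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'data.pickle')  # hypothetical file name

    with open(path, 'wb') as f:
        pickle.dump(data, f)  # one pickle, so the shared dict is written once

    with open(path, 'rb') as f:
        loaded = pickle.load(f)

print(loaded['a'] is loaded['b'])  # True
loaded['a']['foo'] = 'Hello'
print(loaded['b']['foo'])  # Hello
```

The same holds if the whole dict is stored under a single shelve key, since that key's value is one pickle.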

user2357112 supports Monica

Does anybody know what is happening here?

Python variables are references to objects. When you type

a = 123

behind the scenes, Python creates a new object, int(123), and then makes the name a point to it. If you then write

a = 456

then Python creates a different object, int(456), and updates a to be a reference to the new object. It doesn't overwrite what's stored in a box named a the way a variable assignment in C would. Since id() returns the object's memory address (well, in the CPython reference implementation, anyway), id(a) will generally give a different value every time you point a at a different object.
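A short illustration of these reference semantics, using a dict as in the question (the names are mine). `is` checks whether two names refer to the same object, which for two live objects is the same as comparing their id() values.

```python
a = {'foo': 'bar'}
b = a                 # b now refers to the same object as a
print(a is b, id(a) == id(b))  # True True

a['foo'] = 'Hello'    # mutating the object is visible through both names
print(b['foo'])       # Hello

a = {'foo': 'bar'}    # rebinding a points it at a new object; b is unaffected
print(a is b)         # False
print(b['foo'])       # Hello
```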

Does anybody know how to avoid this behavior?

You can't, because it's a property of how assignment works.

Kirk Strauser
  • This is only partially correct, and has very little to do with the question, which is about the mechanics of `shelve`. A different persistence mechanism could easily have preserved the fact that the same object was stored to the `'a'` and `'b'` keys. – user2357112 supports Monica Nov 06 '18 at 00:30
  • Ah, I see what you're saying now. I wouldn't say "easily", though, unless part of the process is building a map of objects ids so multiple keys referencing the same object before do the same after. That sounds dreadfully expensive if the persisted objects implement the descriptor protocol. – Kirk Strauser Nov 06 '18 at 00:50

There is a way to do this, but it requires writing your own class, or getting clever. You can record the original ids while pickling, and set an unpickling function that looks up the created object if it has already been unpickled, or creates it if it hasn't.

Below is a quick example using __reduce__. You should know, though, that this probably isn't a good idea in the first place.

It may be easier to use the copyreg module, but be aware that anything you register with it applies globally, to every pickling of that type in the process. The __reduce__ method is cleaner and safer, because you explicitly tell pickle which classes should have this behavior instead of applying it implicitly everywhere.

There are worse caveats to this system. An object's id changes between Python processes, so you need to store an identifier during __init__ (or __new__, however you do it) and make sure that stored value is preserved when the object is pulled out of the shelf later. Uniqueness of id() isn't even guaranteed within one Python session, because an id can be reused once the original object has been garbage-collected. I'm sure other reasons not to do this will come up. (I'll try to address them with my class, but I make no promises.)

import uuid

class UniquelyPickledDictionary(dict):
    _created_instances = {}

    def __init__(self, *args, _uid=None, **kwargs):
        super().__init__(*args, **kwargs)
        self.uid = _uid
        if _uid is None:
            self.uid = uuid.uuid4()
        UniquelyPickledDictionary._created_instances[self.uid] = self

    def __reduce__(self):
        # pickle requires the fifth element (dictitems) to be an iterator, not a list
        return UniquelyPickledDictionary.create, (self.uid,), None, None, iter(self.items())

    @staticmethod
    def create(uid):
        if uid in UniquelyPickledDictionary._created_instances:
            return UniquelyPickledDictionary._created_instances[uid]
        return UniquelyPickledDictionary(_uid=uid)
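A variant of the class, with a round trip that imitates two shelve keys. The registry here is a weakref.WeakValueDictionary, which is my own change rather than part of the answer: it lets instances be garbage-collected once nothing else refers to them, at the cost that an unpickled object is only reused while something still holds it. The fifth element of the __reduce__ tuple is an iterator, which is what pickle expects there. Clearing the registry is a stand-in for starting a fresh interpreter.

```python
import pickle
import uuid
import weakref


class UniquelyPickledDictionary(dict):
    # Assumption: weak references, so the registry doesn't keep instances alive
    _created_instances = weakref.WeakValueDictionary()

    def __init__(self, *args, _uid=None, **kwargs):
        super().__init__(*args, **kwargs)
        self.uid = uuid.uuid4() if _uid is None else _uid
        UniquelyPickledDictionary._created_instances[self.uid] = self

    def __reduce__(self):
        # (callable, args, state, listitems, dictitems); dictitems must be an iterator
        return (UniquelyPickledDictionary.create, (self.uid,), None, None,
                iter(self.items()))

    @staticmethod
    def create(uid):
        # .get() avoids a race between a membership test and the lookup
        existing = UniquelyPickledDictionary._created_instances.get(uid)
        if existing is not None:
            return existing
        return UniquelyPickledDictionary(_uid=uid)


original = UniquelyPickledDictionary(foo='bar')
blob_a = pickle.dumps(original)  # two independent pickles, like two shelve keys
blob_b = pickle.dumps(original)

# Simulate a fresh interpreter by forgetting every registered instance
UniquelyPickledDictionary._created_instances.clear()

a = pickle.loads(blob_a)
b = pickle.loads(blob_b)
print(a is b, a is original)  # True False
print(a['foo'])               # bar
```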

The uuid module should be more unique than object ids in the long run (uuid4() generates random identifiers, so collisions are extremely unlikely). I believe this is not multiprocessing safe, though.

An equivalent version using copyreg can be made to pickle any class, but it requires special handling on unpickling to guarantee that repickling points to the same object. To make it as general as possible, every unpickled instance would have to be checked against the "already created" dictionary. To make it usable, a new attribute has to be added to each instance, which may not be possible if the object uses __slots__ (or in a few other cases).
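A minimal sketch of the copyreg approach described above, for a class that has no pickling logic of its own. Point, _by_uid, _rebuild, _reduce_point and the _pickle_uid attribute are all hypothetical names. The reducer tags each instance with a uuid the first time it is pickled (the extra attribute that fails for __slots__ classes, since this reads obj.__dict__), and copyreg.pickle registers it process-wide.

```python
import copyreg
import pickle
import uuid
import weakref


class Point:
    """Hypothetical plain class with no pickling logic of its own."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


# Objects already unpickled in this process, keyed by their stored uid
_by_uid = weakref.WeakValueDictionary()


def _rebuild(uid):
    # Reuse the existing object for this uid, or create an empty one;
    # pickle then applies the saved state (__dict__) to whichever is returned.
    obj = _by_uid.get(uid)
    if obj is None:
        obj = Point.__new__(Point)
        _by_uid[uid] = obj
    return obj


def _reduce_point(obj):
    # Tag the instance on first pickling; uses __dict__, so __slots__ classes fail
    if '_pickle_uid' not in obj.__dict__:
        obj._pickle_uid = uuid.uuid4()
    return _rebuild, (obj._pickle_uid,), obj.__dict__


# Global registration: every pickling of a Point in this process uses it
copyreg.pickle(Point, _reduce_point)

p = Point(1, 2)
blob_a = pickle.dumps(p)
blob_b = pickle.dumps(p)
a = pickle.loads(blob_a)
b = pickle.loads(blob_b)
print(a is b, (a.x, a.y))  # True (1, 2)
```

If the global effect is a concern, pickle.Pickler objects can be given their own dispatch_table attribute instead of registering through copyreg.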

I'm using 3.6, but I think it should work on any still-supported version of Python. In my testing it preserved object identity, including with recursive structures (which pickle already handles) and across multiple unpicklings.

Poik