
I'm writing a little module that must be able to persist anything to disk.

I don't know ahead of time what kind of data will be in a variable, so I need default functionality that can serialize ANYTHING to disk.

I suppose pickle is the best thing to use in this case, because it can serialize Python objects to disk and everything is an object in Python.

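def to_binary(self, folder: str):
    import os
    import pickle
    # Make sure the target folder exists, then pickle self.data
    # into a file named after self.name.
    os.makedirs(folder, mode=0o777, exist_ok=True)
    with open(os.path.join(folder, self.name), mode='wb') as f:
        pickle.dump(self.data, f)

def from_binary(self, folder: str):
    # These imports were missing here; they were local to to_binary only.
    import os
    import pickle
    os.makedirs(folder, mode=0o777, exist_ok=True)
    # Read the pickled bytes back from <folder>/<self.name>.
    with open(os.path.join(folder, self.name), mode='rb') as f:
        self.data = pickle.load(f)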

Frankly, though, I'm not sure what its limitations are. It seems like I could use this to write a dictionary to disk, a string, anything in vanilla Python. But what about other objects, like a pandas DataFrame? (Of course that has built-in ways to save to disk, but just as an example, would my code choke on a DataFrame object?)

Do you know of anything that would break this? If so, is there an even more general solution for saving data to disk?
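To make the question concrete, here is a quick check I put together. It is only a sketch: `can_pickle` is a helper name I made up, and the three failing candidates (a lambda, a generator, a lock) are just the ones I happened to try on CPython, not a complete list.

```python
import pickle
import threading


def can_pickle(obj) -> bool:
    """Return True if pickle.dumps accepts obj, False if it raises."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, TypeError, AttributeError):
        # Different unpicklable objects raise different exception types.
        return False
    return True


candidates = {
    "dict": {"a": 1, "b": [1, 2, 3]},
    "lambda": lambda x: x + 1,             # no importable name to refer to
    "generator": (i for i in range(3)),    # holds live execution state
    "lock": threading.Lock(),              # tied to the running process
}

for label, obj in candidates.items():
    print(label, can_pickle(obj))
```

So plain containers go through, but anything that wraps live process state seems to be rejected, which is the kind of limitation I'm asking about.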


Side note: another reason I might want to use this is for 'hashing' purposes. If I get an object that isn't hashable, theoretically I could use pickle to obtain a byte stream that is hashable. As long as its serialization process is deterministic, that should be sufficient for my purposes. If you see any problem with this ancillary concern, please let me know.
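Here is roughly what I have in mind for the hashing idea, along with one case that already worries me. This is a sketch only: `pickle_digest` is a hypothetical helper name, and the choice of SHA-256 and pickle protocol 4 is an assumption on my part.

```python
import hashlib
import pickle


def pickle_digest(obj) -> str:
    """Hash the pickled bytes of obj (hypothetical helper, not a library API)."""
    payload = pickle.dumps(obj, protocol=4)  # pin the protocol so it can't drift
    return hashlib.sha256(payload).hexdigest()


# Two dicts that compare equal but were built in a different key order.
a = {"x": 1, "y": 2}
b = {"y": 2, "x": 1}

print(a == b)                                # the dicts are equal...
print(pickle_digest(a) == pickle_digest(b))  # ...but the digests need not be
```

Dicts are pickled in insertion order, so equal objects can produce different bytes. That suggests the pickle output depends on more than the value of the object, which may defeat the purpose.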

These kinds of things get into much deeper computer science concepts than I truly understand, so I really appreciate your help!

Legit Stack
  • Sure, serializing *anything* is what pickles are supposed to be able to do, but this sounds like a recipe for bad design and the hashing thing sounds even worse (pickling isn’t required to be deterministic). What is it that requires serializing absolutely anything to disk? What if what’s serialized is process- or process-state-specific? – Ry- Oct 04 '20 at 21:59
  • "Serialize ANYTHING" doesn't actually make sense - not every object represents something that makes sense to serialize. There is no 100% general serialize-absolutely-any-object mechanism. – user2357112 supports Monica Oct 04 '20 at 22:03
  • @Ry- I want as generalized a solution as possible for my base class. I think it would be used as a last resort, as many things have better ways to persist their data to disk. So serializing to disk is almost always going to be very contextual in this system, but I wanted a stock solution to fall back on if a specific solution isn't available yet. Thanks for the comment on pickle-hashing as well. Right now I have a timestamp solution, but hashing would be ideal, I think. I just can't find a way to hash anything, and I guess pickle isn't it either. Thanks! – Legit Stack Oct 04 '20 at 22:09

0 Answers