I have data in which each entry needs to become an instance of a class, and I expect to encounter many duplicate entries. I essentially want to end up with a set of all the unique entries (i.e. discard any duplicates). However, instantiating the whole lot and putting them into a set after the fact is not optimal because...
- I have many entries,
- the proportion of duplicated entries is expected to be rather high,
- my `__init__()` method is doing quite a lot of costly computation for each unique entry, so I want to avoid redoing these computations unnecessarily.
I recognize that this is basically the same question asked here but...
- the accepted answer doesn't actually solve the problem. If you make `__new__()` return an existing instance, it doesn't technically make a new instance, but it still calls `__init__()`, which then redoes all the work you've already done, which makes overriding `__new__()` completely pointless. (This is easily demonstrated by inserting `print` statements inside `__new__()` and `__init__()` so you can see when they run.)
- the other answer requires calling a class method instead of calling the class itself when you want a new instance (e.g. `x = MyClass.make_new()` instead of `x = MyClass()`). This works, but it isn't ideal IMHO since it is not the normal way one would think to make a new instance.
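
To make the first point concrete, here is a minimal sketch of what I mean. The class name `Cached`, the `key` argument, and the `init_calls` counter are made up for illustration; the counter and `print` call stand in for the costly computation.

```python
class Cached:
    """Hypothetical class whose __new__ returns a cached instance per key."""

    _instances = {}  # key -> already-created instance
    init_calls = 0   # how many times __init__ has run (illustration only)

    def __new__(cls, key):
        # Reuse the existing instance for this key if there is one.
        if key not in cls._instances:
            cls._instances[key] = super().__new__(cls)
        return cls._instances[key]

    def __init__(self, key):
        # Stand-in for the costly computation.
        type(self).init_calls += 1
        print(f"__init__ running for {key!r}")
        self.key = key


a = Cached("spam")
b = Cached("spam")

print(a is b)             # the same object is returned both times
print(Cached.init_calls)  # yet __init__ ran once per call, not once per unique key
```

Both calls return the same object, but `__init__()` runs on every call, because Python invokes it automatically whenever `__new__()` returns an instance of the class.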
Can `__new__()` be overridden so that it will return an existing entity without running `__init__()` on it again? If this isn't possible, is there maybe another way to go about this?
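
For reference, one direction that might fit these constraints is a metaclass whose `__call__` consults a cache before `__new__()` and `__init__()` are ever invoked, since `type.__call__` is what normally runs both. The following is only a sketch under assumptions: the names `UniqueMeta`, `Entry`, and `_cache` are hypothetical, and it assumes the constructor arguments are hashable and fully identify an entry.

```python
class UniqueMeta(type):
    """Hypothetical metaclass: one instance per distinct constructor arguments."""

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        cls._cache = {}  # separate cache for each class using this metaclass

    def __call__(cls, *args, **kwargs):
        # Assumption: all arguments are hashable and identify the entry.
        key = (args, tuple(sorted(kwargs.items())))
        try:
            return cls._cache[key]
        except KeyError:
            # type.__call__ runs __new__ and __init__, so both are
            # reached only for keys that have not been seen before.
            instance = super().__call__(*args, **kwargs)
            cls._cache[key] = instance
            return instance


class Entry(metaclass=UniqueMeta):
    init_calls = 0  # counts __init__ runs (illustration only)

    def __init__(self, value):
        # Stand-in for the costly computation.
        type(self).init_calls += 1
        self.value = value


a = Entry("spam")
b = Entry("spam")
c = Entry("eggs")

print(a is b)            # duplicates map to the same object
print(Entry.init_calls)  # __init__ ran only for the two unique entries
```

With this shape, instances are still created with the ordinary `Entry(...)` call syntax. Note that the cache keeps every instance alive for the lifetime of the class, which may or may not be acceptable for a large data set.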