Methods to return structures from Python functions have been discussed at length in various posts. Two good ones here and here.

However, unless I have missed it, none of the proposed solutions define the structure in the same place where its members are set, and instead either repeat the list of members on assignment (not DRY) or rely on position (error prone).

I am looking for a DRY way to do this both for writing speed and to avoid argument misalignment errors common when you repeat yourself.

The below code snippet shows three attempts to do this. For brevity's sake, the example's structure contains only one element, but the intention is obviously that the structures contain multiple elements.

The three methods are DRY, embedding the structure definition with the initialization of the returned instance.

Method 1 highlights the need for a better way but illustrates the DRY syntax I am after, where the structure and how it should be populated (decided at run time) are in the same place, namely the dict() call.

Method 2 uses typing.NamedTuple and seems to work. However, it relies on mutable defaults to do so.

Method 3 follows method 2's approach, using dataclasses.dataclass rather than typing.NamedTuple. It fails because dataclasses explicitly prohibits mutable defaults, raising ValueError: mutable default is not allowed.

from collections import namedtuple
from dataclasses import dataclass
from typing import NamedTuple, List, Tuple

# Method 1
def ret_dict(foo_: float, bar_: float) -> Tuple:
    return_ = dict(foo_bar=[foo_, bar_])
    _ = namedtuple('_', return_.keys())
    return _(*return_.values())


# Method 2
def ret_nt(foo_: float, bar_: float) -> 'ReturnType':
    class ReturnType(NamedTuple):
        foo_bar: List[float] = [foo_, bar_]     # Mutable default value allowed
    return ReturnType()


# Method 3
def ret_dc(foo_: float, bar_: float) -> 'ReturnType':
    @dataclass
    class ReturnType:
        foo_bar: List[float] = [foo_, bar_]   # raises ValueError: mutable default is not allowed
    return ReturnType()


def main():
    rt1 = ret_dict(1, 0)
    rt1.foo_bar.append(3)
    rt2 = ret_dict(2, 0)
    print(rt1)
    print(rt2)

    rt1 = ret_nt(1, 0)
    rt1.foo_bar.append(3)   # amending the mutable default does not affect subsequent calls
    rt2 = ret_nt(2, 0)
    print(rt1)
    print(rt2)

    rt1 = ret_dc(1, 0)
    rt1.foo_bar.append(3)  # amending the default does not affect subsequent calls
    rt2 = ret_dc(2, 0)
    print(rt1)
    print(rt2)


if __name__ == "__main__":
    main()

The following questions arise:

Is method 2 a sensible pythonic approach?

One concern is that mutable defaults are somewhat of a taboo, certainly for function arguments. I wonder if their use here is OK however, given that the attached code suggests that these NamedTuple defaults (and perhaps the entire ReturnType definition) are evaluated on every function call, contrary to function argument defaults which it seems to me are evaluated only once and persist forever (hence the problem).

A further concern is that the dataclasses module seems to have gone out of its way to explicitly prohibit this usage. Was that decision overly dogmatic in this instance, or is guarding against method 2 warranted?

Is this inefficient?

I would be happy if the syntax of Method 2 meant:

1 - Define ReturnType once on the first pass only

2 - call __init__() with the given (dynamically set) initialization on every pass

However, I am afraid that it may instead mean the following:

1 - Define ReturnType and its defaults on every pass

2 - call __init__() with the given (dynamically set) initialization on every pass

Should one be concerned about the inefficiency of re-defining chunky ReturnTypes on every pass when the call is in a "tight" loop? Isn't this inefficiency present whenever a class is defined inside a function? Should classes be defined inside functions?

Is there a (hopefully good) way to achieve DRY definition-instantiation using the new dataclasses module (Python 3.7)?

Finally, is there a better DRY definition-instantiation syntax?

OldSchool
  • Yes, it's inefficient. You are creating an entirely new type every time you call one of the functions. – chepner Jul 08 '20 at 19:50
  • @chepner Thanks. Do you have a sense of the additional overhead vs the commonly seen creation of a dict or SimpleNamespace with all the key value pairs? – OldSchool Jul 09 '20 at 11:24
  • @OldSchool: A quick test says creating a new namedtuple type every time is about 220 times slower and takes about 45 times more space than using a single namedtuple type. See https://ideone.com/TVGklL and https://ideone.com/zqjqXk. (The output order is a bit weird due to buffering.) – user2357112 supports Monica Jul 09 '20 at 16:24

1 Answer

However, I am afraid that it may instead mean the following:

1 - Define ReturnType and its defaults on every pass

2 - call __init__() with the given (dynamically set) initialization on every pass

That's what it means, and it costs a lot of time and space. Also, it makes your annotations invalid - the -> 'ReturnType' requires a ReturnType definition at module level. It also breaks pickling.
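
To make the cost concrete, here is a minimal sketch (not from the original post) that reuses the question's ret_nt and checks two things: that each call yields a distinct type object, and that an instance of the function-local class cannot be pickled. The names types_differ and pickled_ok are only illustrative.

```python
import pickle
from typing import List, NamedTuple


def ret_nt(foo_: float, bar_: float):
    # The class statement runs on every call, building a brand-new type.
    class ReturnType(NamedTuple):
        foo_bar: List[float] = [foo_, bar_]
    return ReturnType()


a = ret_nt(1, 0)
b = ret_nt(2, 0)

# Same class name, but two unrelated type objects.
types_differ = type(a) is not type(b)

# pickle stores a class by reference (module plus name) and has to find it
# again by that path; a class created inside a function is not reachable
# that way, so dumping an instance fails.
try:
    pickle.dumps(a)
    pickled_ok = True
except (pickle.PicklingError, AttributeError):
    pickled_ok = False

print(types_differ, pickled_ok)
```

Because the two types are distinct, checks such as isinstance(b, type(a)) also fail, which is one more reason a per-call type is hard to treat as a meaningful return type.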

Stick with a module-level ReturnType and don't use mutable defaults. Or, if all you want is member access by dot notation and you don't really care about making a meaningful type, just use types.SimpleNamespace:

return types.SimpleNamespace(thing=whatever, other_thing=stuff)
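
As a rough sketch of the module-level alternative (my own illustration, assuming the single foo_bar field from the question; ReturnTypeDC is a hypothetical name), the type is defined once and populated by keyword. The field name does appear twice, but keyword arguments guard against the positional misalignment the question worries about.

```python
from dataclasses import dataclass
from typing import List, NamedTuple


class ReturnType(NamedTuple):
    # Defined once, at import time; no defaults, so nothing mutable is shared.
    foo_bar: List[float]


@dataclass
class ReturnTypeDC:
    # Same idea with a dataclass: a required field instead of a mutable default.
    foo_bar: List[float]


def ret_nt(foo_: float, bar_: float) -> ReturnType:
    # A fresh list is built on every call and passed by keyword.
    return ReturnType(foo_bar=[foo_, bar_])


def ret_dc(foo_: float, bar_: float) -> ReturnTypeDC:
    return ReturnTypeDC(foo_bar=[foo_, bar_])


rt1 = ret_nt(1, 0)
rt1.foo_bar.append(3)   # only rt1's own list changes
rt2 = ret_nt(2, 0)
print(rt1, rt2, ret_dc(1, 0))
```

With the class at module level, the return annotation resolves normally and every call returns an instance of the same type.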
user2357112 supports Monica
  • Thanks for the SimpleNamespace suggestion. I take it from your answer then that you don't believe there is a good DRY way of doing this if you want ReturnType to benefit from NamedTuple or dataclass functionality? On another note, I was under the impression that a function def inside another function def was only processed on the first pass. Is that correct? If so, why are class definitions processed on every call? Is there a purpose for that or is it just an implementation quirk? – OldSchool Jul 09 '20 at 11:19
  • @OldSchool: A nested function definition is also reexecuted every time the outer function is called. Function definitions and class statements are both imperative in Python. – user2357112 supports Monica Jul 09 '20 at 11:34
  • Forgive my ignorance. Because of the %autoreload magic used interactively to ensure re-parsing of object definitions when debugging, I have been making the uneducated assumption that the interpreter parses and stores object definitions the first time it encounters them, then has them up its sleeve for re-use the next time. That's how I rationalize the use of %autoreload to change that behavior. I guess you are saying that the interpreter does not do that and instead recreates everything as it finds it? It recreates the nested classes above. Have I understood correctly? What's %autoreload for? – OldSchool Jul 09 '20 at 16:14
  • @OldSchool: Most code doesn't redefine every class it uses every time it wants to create an instance. Your code reexecutes the class statements because the class statement is inside the function. Most code doesn't put class statements inside function definitions. – user2357112 supports Monica Jul 09 '20 at 16:30
  • Interesting. I do see a lot of code with function definitions inside encapsulating functions though (as opposed to class definitions inside an encapsulating function). Are these nested function definitions parsed every time the encapsulating function is called? – OldSchool Jul 09 '20 at 21:08
  • @OldSchool: They are reexecuted, producing a new function object. They are not re-parsed. – user2357112 supports Monica Jul 09 '20 at 21:10
  • (This is much less expensive than reexecuting a class definition, but not free.) – user2357112 supports Monica Jul 09 '20 at 21:13
  • I have marked this as the answer as it seems rather plausible. Buyer beware: I have not verified its accuracy. – OldSchool Jul 28 '20 at 23:46
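
The point made in the comments above, that a nested def is re-executed on every call but not re-parsed, can be checked with a small sketch (my own illustration, assuming CPython; outer and inner are hypothetical names):

```python
def outer():
    # This def statement executes each time outer() is called.
    def inner():
        return 42
    return inner


f1 = outer()
f2 = outer()

# Each call produces a new function object...
distinct_functions = f1 is not f2

# ...but both wrap the same code object, which was compiled only once
# as part of compiling outer().
shared_code = f1.__code__ is f2.__code__

print(distinct_functions, shared_code)
```

This is why re-executing a nested def is comparatively cheap: only a small function object is allocated, whereas a class statement runs its whole body and builds a new type each time.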