I would like to build a random pandas DataFrame. To do that, I need a Python function that takes as arguments:

  • numpy distributions
  • their arguments

For example:

distribution 1: normal | arguments: mean = 0, standard deviation = 1, size = 100

distribution 2: uniform | arguments: low = 0, high = 1, size = 100

etc.

I do not know in advance which distributions will be used or what their arguments will be.

The main function will then generate a random sample from each distribution using its corresponding arguments.

I have tried something like:

import numpy as np

def myfun( **kwargs ) :
    for k , v in kwargs.items() :
        print( k )
        print( v )

When I call that function with these arguments:

myfun( fun_1 = 'np.random.normal' , arg_1 = { 'loc' : 0 , 'scale' : 1 , 'size' : 7 } ,
       fun_2 = 'np.random.uniform' , arg_2 = { 'low' : 0 , 'high' : 1 , 'size' : 7 } )

The output is:

fun_1
np.random.normal
arg_1
{'loc': 0, 'scale': 1, 'size': 7}
fun_2
np.random.uniform
arg_2
{'low': 0, 'high': 1, 'size': 7}

But my goal is not to print the distributions and their parameters; it is to generate a sample from each distribution.

  • Is `myfun` supposed to return, say, `np.random.normal` called with the args defined in `arg_1`? – C.Nivs Jun 17 '19 at 19:41
  • Do the functions need to be strings, or would passing them as functions (without quotes) be sufficient? Like instead of `'np.random.normal'` you'd pass `np.random.normal` – C.Nivs Jun 17 '19 at 19:52
  • Sorry for my example. Functions should be passed as functions, not strings. – Fabrice BOUCHAREL Jun 18 '19 at 03:59

2 Answers

Note: the functions must be passed as function objects, not strings, for this implementation to work.

If you want to return the function called with a set of kwargs, you're pretty close. I would use a positional argument for func, then you can pass kwargs into func, which is a bit more explicit:

def myfunc(func, **kwargs):
    return func(**kwargs)

Then you could pair each func with its kwargs dict in a tuple and loop over the pairs:

import numpy as np

# each entry pairs a function with the kwargs dict it will be called with
somelist = [(np.random.normal, { 'loc' : 0 , 'scale' : 1 , 'size' : 7 }),
            (np.random.uniform , { 'low' : 0 , 'high' : 1 , 'size' : 7 })]

results = []

# append results to a list
for func, kwargs in somelist:
    results.append(myfunc(func, **kwargs))

By doing it this way, you don't have to worry about what you name any of your variables, and it's a bit more readable. You know that the loop will be dealing with pairs of items, in this case (func, kwargs) pairs, and your function can handle those explicitly.
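
To get from these results to the DataFrame the question asks for, one option is to key each (func, kwargs) pair by a column name. This is only a minimal sketch: the function name build_random_frame and the column names are hypothetical, and it assumes every sampler returns a 1-D array of the same length.

```python
import numpy as np
import pandas as pd

def build_random_frame(specs):
    """Return a DataFrame with one column per (func, kwargs) pair.

    specs maps a column name to a (callable, kwargs dict) tuple.
    Assumes all samplers return 1-D arrays of equal length.
    """
    columns = {name: func(**kwargs) for name, (func, kwargs) in specs.items()}
    return pd.DataFrame(columns)

specs = {
    "normal": (np.random.normal, {"loc": 0, "scale": 1, "size": 7}),
    "uniform": (np.random.uniform, {"low": 0, "high": 1, "size": 7}),
}
df = build_random_frame(specs)
print(df.shape)
```

Using a dict rather than a list keeps the column names next to the distributions that produce them.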

Handling the string calls

There are a few ways to handle string names. They are a bit trickier, but overall shouldn't be horrible. You'll need to modify myfunc to resolve the function name:

# func is now a string, unlike above

def myfunc(func, **kwargs):
    # func looks like 'module.submodule.function', so split on '.'
    # to get each component. The first is the name the module is
    # bound to in global scope, and everything else is collected
    # into a list
    mod, *f = func.split('.')  # f is a list of attribute names like ['random', 'uniform']
    # func for now is just the module np (None if that name isn't a global)
    func = globals().get(mod)
    for name in f:
        # get each subsequent level down, which overwrites func to
        # first be np.random, then np.random.uniform
        func = getattr(func, name)
    return func(**kwargs)

The reason I'm using globals().get(mod) is that a) I'm assuming you might not always be using the same module, and b) looking up an aliased import by its alias in sys.modules raises a KeyError, which isn't what you want:

import sys
import numpy as np

sys.modules['np'] # KeyError

sys.modules['numpy']
# <module 'numpy' from '/Users/mm92400/anaconda3/envs/new36/lib/python3.6/site-packages/numpy/__init__.py'>

# globals avoids the naming conflict
globals()['np']
# <module 'numpy' from '/Users/mm92400/anaconda3/envs/new36/lib/python3.6/site-packages/numpy/__init__.py'>

Then getattr(obj, attr) will return each subsequent attribute (a submodule, then the function):

import numpy as np

getattr(np, 'random')
# <module 'numpy.random' from '/Users/mm92400/anaconda3/envs/new36/lib/python3.6/site-packages/numpy/random/__init__.py'>

# the dotted access won't work directly
getattr(np, 'random.uniform')
# AttributeError
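
As an aside, the standard library's operator.attrgetter does accept dotted names, so it can do the same level-by-level lookup as the loop in myfunc. A small sketch, assuming numpy is installed; the variable names are only illustrative:

```python
import operator
import numpy as np

# attrgetter resolves a dotted name one attribute at a time
get_uniform = operator.attrgetter("random.uniform")
uniform = get_uniform(np)  # the same function as np.random.uniform

sample = uniform(low=0, high=1, size=7)
print(sample.shape)
```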

So, in total:

import numpy as np

func, kwargs = ('np.random.normal', { 'loc' : 0 , 'scale' : 1 , 'size' : 7 })

myfunc(func, **kwargs)

array([ 0.83276777,  2.4836389 , -1.07492873, -1.20056678, -0.36409906,
       -0.76543554,  0.90191746])

And you can just extend that to the code in the first section.
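
If the strings can use the real module name (numpy.random.normal rather than the np alias), another option is importlib.import_module, which avoids depending on what happens to be in globals(). This is a sketch under that assumption; resolve and call_by_name are hypothetical helper names, not part of any library.

```python
import importlib

def resolve(dotted_name):
    """Turn 'package.module.attr' into the attribute object.

    Assumes everything before the last dot is an importable module path.
    """
    module_name, _, attr = dotted_name.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, attr)

def call_by_name(dotted_name, **kwargs):
    # look the function up by name, then call it with the keyword arguments
    return resolve(dotted_name)(**kwargs)

sample = call_by_name("numpy.random.normal", loc=0, scale=1, size=7)
print(sample.shape)
```

A name without any dot leaves module_name empty, so import_module raises an error rather than guessing.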

C.Nivs
  • The numpy functions can't be strings for this to work - you'd get `TypeError: 'str' object is not callable`. Otherwise this is a good answer – Green Cloak Guy Jun 17 '19 at 19:48
  • @GreenCloakGuy ah, right, I'd have to `eval` or do something weird, I'll see if I can edit it to make sense – C.Nivs Jun 17 '19 at 19:49
  • You don't have to `eval()` it, and you shouldn't because `eval()` is dangerous - but just referring to them without the quotes should work, since that invokes the reference to the function in the first place. `np.random.normal` instead of `'np.random.normal'` – Green Cloak Guy Jun 17 '19 at 19:50
  • Ideally, yes, you would avoid eval at all costs, but OP specifically asked for it as string inputs. I'll ask for a clarification – C.Nivs Jun 17 '19 at 19:50
  • That problem is probably out-of-scope for this question, but for reference the better way of handling it would probably be to parse the module/method names and [try to call the function by name](https://stackoverflow.com/questions/3061/calling-a-function-of-a-module-by-using-its-name-a-string) based on that – Green Cloak Guy Jun 17 '19 at 19:52
  • Correct, you'd have to call `np` from `sys.modules` or something similar and traverse it using `getattr` calls – C.Nivs Jun 17 '19 at 19:53
  • @GreenCloakGuy added handling the string calls in an edit, not sure if it's the best practice way, but it works for the moment – C.Nivs Jun 17 '19 at 20:09

You can design a function that takes other functions as inputs and executes them. The ** operator makes this possible by unpacking a dict into keyword arguments:

def myfun(**kwargs):
    # call the function kwargs['fun_1'] with the keyword args given in kwargs['arg_1']
    sample_1 = kwargs['fun_1'](**kwargs['arg_1'])
    sample_2 = kwargs['fun_2'](**kwargs['arg_2'])
    # return the samples so the caller can use them
    return sample_1, sample_2

You would then specify your kwargs like this:

myfun(fun_1=np.random.normal, 
      arg_1={'loc': 0, 'scale': 1, 'size': 7},
      fun_2=np.random.uniform,
      arg_2={'low': 0, 'high': 1, 'size': 7},
     )

Note how np.random.normal isn't in quotes - we refer to the actual function, by reference, but without calling it yet (because we want to do that in myfun(), not now).

Python's documentation calls this argument unpacking (* for sequences and ** for dicts), because it unpacks a data structure into function arguments.


In this situation it's usually safer to use explicitly named parameters. Otherwise you'll need to come up with a naming pattern so that people using your function know how they're supposed to name their keywords.
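
One such naming pattern could look like the sketch below. It assumes every function is passed as fun_<suffix> and its arguments as arg_<suffix>; the name run_paired is hypothetical, and a missing arg_<suffix> is treated as "no arguments".

```python
import numpy as np

def run_paired(**kwargs):
    """Call each fun_<suffix> with the dict given as arg_<suffix>.

    Returns a dict mapping each suffix to the result of its call.
    """
    results = {}
    for key, func in kwargs.items():
        if not key.startswith("fun_"):
            continue  # arg_* entries are picked up alongside their fun_* partner
        suffix = key[len("fun_"):]
        args = kwargs.get("arg_" + suffix, {})
        results[suffix] = func(**args)
    return results

samples = run_paired(
    fun_1=np.random.normal, arg_1={"loc": 0, "scale": 1, "size": 7},
    fun_2=np.random.uniform, arg_2={"low": 0, "high": 1, "size": 7},
)
print(sorted(samples))
```

This handles any number of function/argument pairs, at the cost of relying on callers to follow the naming convention.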

Green Cloak Guy
  • @GreenCloakGuy In myfun you handle 2 sub-function calls, but you do not know in advance the number of sub-functions to call. – Fabrice BOUCHAREL Jun 18 '19 at 04:43
  • That's why I said you'd need to come up with a pattern for keywords - then you could loop through `kwargs` and take functions and arguments that correspond to each other. @C.Nivs's answer above is better, though – Green Cloak Guy Jun 18 '19 at 14:31