27

Consider the following list comprehension

[ (x,f(x)) for x in iterable if f(x) ]

This filters the iterable based on a condition f and returns the pairs (x, f(x)). The problem with this approach is that f(x) is calculated twice. It would be great if we could write something like

[ (x,fx) for x in iterable if fx where fx = f(x) ]
or
[ (x,fx) for x in iterable if fx with f(x) as fx ]

But in Python we have to use a nested comprehension to avoid the duplicate call to f(x), which makes the comprehension less clear:

[ (x,fx) for x,fx in ( (y,f(y)) for y in iterable ) if fx ]

Is there any other way to make it more pythonic and readable?


Update

Coming soon in Python 3.8! See PEP 572:

# Share a subexpression between a comprehension filter clause and its output
filtered_data = [y for x in data if (y := f(x)) is not None]
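Applied to the original question, the same assignment expression can keep both x and f(x). The following is a minimal sketch, assuming Python 3.8 or later; `f`, `data`, and `call_log` are placeholder names introduced here, and the log exists only to show that `f` runs once per element.

```python
call_log = []  # hypothetical helper: records every call to f

def f(x):
    """Placeholder for an expensive function."""
    call_log.append(x)
    return x % 3

data = range(6)

# fx is bound in the filter clause and reused in the output expression
pairs = [(x, fx) for x in data if (fx := f(x))]

print(pairs)          # pairs whose f(x) is truthy
print(len(call_log))  # number of times f was actually called
```

The parentheses around `fx := f(x)` are required in this position.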
balki
  • 1
    Are you *sure* it is calculated twice when you have compiled it? – Viktor Mellgren Jul 23 '12 at 07:42
  • 2
    Not sure how to compile. But on the python prompt, it is executed twice. I checked by adding a print statement. – balki Jul 23 '12 at 07:47
  • If you don't want to compute `f(x)` twice, try to add a cache in `f()`. – Cédric Julien Jul 23 '12 at 07:51
  • @Vixen: yes, python will call `f(x)` twice for every `x in iterable` in the first statement. – Martijn Pieters Jul 23 '12 at 07:51
  • @CédricJulien Yes, it is also one of the possible ways. But I am trying to find a cleaner way. Adding a cache will have a dictionary look-up and might be overkill for simple lambdas. – balki Jul 23 '12 at 07:55
  • 1
    this is not so much a "`[... where ...]` clause", so much as wanting to optimize the `[... if ...]` clause and/or to introduce `let`-style anonymous bindings. – ninjagecko Jul 23 '12 at 08:07
  • @ninjagecko yes, `let` is exactly the term. – Karl Knechtel Jul 23 '12 at 14:24
  • @balki Please move your "update" section from inside the question to a new answer. See [Should you put an answer to your own question and mark it as the accepted answer or update your question?](https://meta.stackoverflow.com/a/271399) and [What is the appropriate action when the answer to a question is added to the question itself?](https://meta.stackoverflow.com/q/267434) for an explanation. – Zero Piraeus Mar 17 '19 at 11:25

4 Answers

12

There is no where statement but you can "emulate" it using for:

a=[0]
def f(x):
    a[0] += 1
    return 2*x

print([ (x, y) for x in range(5) for y in [f(x)] if y != 2 ])
print("The function was executed %s times" % a[0])

Execution:

$ python 2.py 
[(0, 0), (2, 4), (3, 6), (4, 8)]
The function was executed 5 times

As you can see, the function is executed 5 times, not 10 or 9.

This for construction:

for y in [f(x)]

imitates a where clause.
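For the comprehension in the question, the same trick might look like the sketch below. `f` and `iterable` are stand-ins chosen for illustration; the one-element list `[f(x)]` binds `fx` exactly once per `x`.

```python
def f(x):
    """Placeholder function; returns a falsy value for multiples of 3."""
    return x % 3

iterable = range(6)

# "for fx in [f(x)]" loops over a single-element list, acting as "where fx = f(x)"
pairs = [(x, fx) for x in iterable for fx in [f(x)] if fx]

print(pairs)
```

As far as I know, CPython 3.9 and later optimize this `for y in [expr]` idiom inside comprehensions so that no temporary list is built.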

Igor Chubin
11

You seek to have let-statement semantics in python list comprehensions, whose scope is available to both the ___ for..in(map) and the if ___(filter) part of the comprehension, and whose scope depends on the ..for ___ in....


Your solution, modified: Your solution (unreadable, as you admit) of [ (x,fx) for x,fx in ( (y,f(y)) for y in iterable ) if fx ] is the most straightforward way to write the optimization.

Main idea: lift x into the tuple (x,f(x)).

Some would argue the most "pythonic" way to do things would be the original [(x,f(x)) for x in iterable if f(x)] and accept the inefficiencies.

You can, however, factor out the ((y,f(y)) for y in iterable) into a function if you plan to do this a lot. The drawback is that if you ever need access to more variables than x,fx (e.g. x,fx,ffx), you will have to rewrite all your list comprehensions. Therefore this isn't a great solution unless you know for sure you only need x,fx and plan to reuse this pattern.


Generator function:

Main idea: use a more complicated alternative to generator expressions: one where Python will let you write multiple lines.

You could just use a generator function, which Python plays nicely with:

def xfx(iterable):
    for x in iterable:
        fx = f(x)
        if fx:
            yield (x,fx)

xfx(exampleIterable)

This is how I would personally do it.


Memoization/caching:

Main idea: You could also use (abuse?) side effects and make f have a global memoization cache, so you don't repeat operations.

This can have a bit of overhead, and requires a policy of how large the cache should be and when it should be garbage-collected. Thus this should only be used if you'd have other uses for memoizing f, or if f is very expensive. But it would let you write...

[ (x,f(x)) for x in iterable if f(x) ]

...like you originally wanted without the performance hit of doing the expensive operations in f twice, even if you technically call it twice. You can add a @memoized decorator to f: example (without maximum cache size). This will work as long as x is hashable (e.g. a number, a tuple, a frozenset, etc.).
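One way to get such a decorator without writing it yourself is the standard library's `functools.lru_cache`. This is a sketch rather than the `@memoized` recipe referred to above; `f`, `iterable`, and `call_log` are illustrative names, and `maxsize=None` is an assumption that makes the cache unbounded.

```python
from functools import lru_cache

call_log = []  # hypothetical helper: records real (uncached) executions of f

@lru_cache(maxsize=None)  # unbounded cache; set a maxsize for long-running programs
def f(x):
    call_log.append(x)
    return x % 3

iterable = range(6)

# f(x) appears twice, but the second call per x is served from the cache
pairs = [(x, f(x)) for x in iterable if f(x)]

print(pairs)
print(len(call_log))   # how many times the body of f actually ran
print(f.cache_info())  # hit/miss statistics kept by lru_cache
```

Each lookup still costs a hash and a dictionary access, so this pays off mainly when `f` is expensive.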


Dummy values:

Main idea: capture fx=f(x) in a closure and modify the behavior of the list comprehension.

filterTrue(
    (lambda fx=f(x): (x,fx) if fx else None)() for x in iterable
)

where filterTrue(iterable) is filter(None, iterable). You would have to modify this if the elements of your list (here 2-tuples) could themselves be None.
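A runnable version of the idea could look like this. It is only a sketch: `filter_true`, `f`, and `iterable` are names made up for the example, and the `list(...)` call is there because `filter` returns a lazy iterator in Python 3.

```python
def f(x):
    """Placeholder function; returns a falsy value for multiples of 3."""
    return x % 3

def filter_true(items):
    # filter(None, ...) keeps only truthy items, so the None dummies are dropped
    return list(filter(None, items))

iterable = range(6)

# The default argument fx=f(x) is evaluated once, when each lambda is created
pairs = filter_true(
    (lambda fx=f(x): (x, fx) if fx else None)() for x in iterable
)

print(pairs)
```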

batbrat
ninjagecko
5

Nothing says you must use comprehensions. In fact, most style guides I've seen request that you limit them to simple constructs anyway.

You could use a generator function instead.

def fun(iterable):
    for x in iterable:
        y = f(x)
        if y:
            yield x, y


print(list(fun(iterable)))
jamylak
Keith
3

Map and Zip ?

fnRes = map(f, iterable)
[(x,fx) for x,fx in zip(iterable, fnRes) if fx]
Vinayak Kolagi
  • 1
    `y==1` is extremely poor form; `y` is already a boolean, you could just say `if y` (rather than comparing True==1 / False==1; it's sort of like saying `return myBoolean==bool(1)`, which is worse than `return myBoolean==True`, rather than the usual `return myBoolean`). `y` could also be named something semantically meaningful, such as `fx`. Other than that, this is a reasonable answer. [edit: +1 =)] – ninjagecko Jul 23 '12 at 08:09
  • 1
    Will not work if the iterable is a generator. Will have to use `itertools.tee` to get two iterators. – balki Jul 23 '12 at 08:26
  • @balki Good point, also this could be inefficient in some cases. – jamylak Jul 23 '12 at 08:37
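The `itertools.tee` variant mentioned in the comments might be sketched as follows, assuming Python 3 where `map` and `zip` are lazy. `numbers` and `f` are hypothetical stand-ins for a one-shot generator and the function being applied.

```python
from itertools import tee

def f(x):
    """Placeholder function; returns a falsy value for multiples of 3."""
    return x % 3

def numbers():
    # A one-shot generator: it cannot be iterated twice
    yield from range(6)

# tee splits the single generator into two independent iterators
it_x, it_fx = tee(numbers())

# zip advances both iterators in lockstep, so tee only buffers one item at a time
pairs = [(x, fx) for x, fx in zip(it_x, map(f, it_fx)) if fx]

print(pairs)
```

If `map` were eager, as in Python 2, `tee` would have to buffer the whole input, which is the inefficiency the last comment hints at.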