284

In Python you can have multiple iterators in a list comprehension, like

[(x,y) for x in a for y in b]

for some suitable sequences a and b. I'm aware of the nested loop semantics of Python's list comprehensions.

My question is: Can one iterator in the comprehension refer to the other? In other words: Could I have something like this:

[x for x in a for a in b]

where the current value of the outer loop is the iterator of the inner?

As an example, if I have a nested list:

a=[[1,2],[3,4]]

what would the list comprehension expression be to achieve this result:

[1,2,3,4]

? (Please answer with list comprehensions only, since that is what I want to find out.)

ThomasH

11 Answers

262

I hope this helps someone else since a,b,x,y don't have much meaning to me! Suppose you have a text full of sentences and you want an array of words.

# Without list comprehension
list_of_words = []
for sentence in text:
    for word in sentence:
        list_of_words.append(word)
# list_of_words now holds every word, in order

I like to think of list comprehension as stretching code horizontally.

Try breaking it up into:

# List Comprehension 
[word for sentence in text for word in sentence]

Example:

>>> text = (("Hi", "Steve!"), ("What's", "up?"))
>>> [word for sentence in text for word in sentence]
['Hi', 'Steve!', "What's", 'up?']

This also works for generator expressions:

>>> text = (("Hi", "Steve!"), ("What's", "up?"))
>>> gen = (word for sentence in text for word in sentence)
>>> for word in gen: print(word)
Hi
Steve!
What's
up?
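
As a quick check that the loop form and the comprehension form really are the same thing, here is a minimal sketch that wraps each in a function and compares the results on the example text. The two function names are made up for this illustration.

```python
# Hypothetical helper names, used only for this illustration.
def words_with_loops(text):
    list_of_words = []
    for sentence in text:          # outer loop
        for word in sentence:      # inner loop
            list_of_words.append(word)
    return list_of_words


def words_with_comprehension(text):
    # Same clauses, in the same order, written on one line.
    return [word for sentence in text for word in sentence]


text = (("Hi", "Steve!"), ("What's", "up?"))
loop_result = words_with_loops(text)
comp_result = words_with_comprehension(text)

print(loop_result == comp_result)  # True
print(comp_result)                 # ['Hi', 'Steve!', "What's", 'up?']
```

The for clauses appear in the comprehension in exactly the order they appear in the nested loops; only the appended expression moves to the front.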
Skam
  • 19
    "There are only two hard problems in Computer Science: cache invalidation and naming things." -- Phil Karlton – cezar Aug 08 '18 at 08:42
  • 1
    This is a great answer as it makes the whole problem less abstract! Thank you! – A. Blesius Mar 16 '20 at 13:13
  • I was wondering, can you do the same with three abstraction levels in a list comprehension? Like chapters in text, sentences in chapters and words in sentences? – Alma Alma Mar 24 '20 at 06:55
  • Not only did this answer help me understand what was happening, but it literally applied to my exact use case: counting the number of words in a list of sentences. – Justin S Jul 05 '20 at 22:37
  • Alternatively, you can: [[word for word in sentence] for sentence in text] – Saskia Jul 29 '20 at 10:02
  • 1
    @Saskia Not quite. This will just give you same input back. Do you see why? – Skam Jul 29 '20 at 16:59
  • I was just suggesting a syntax to loop over the two comprehensions without this backward logic. Not related to solving the original question. – Saskia Jul 29 '20 at 17:36
188

To answer your question with your own suggestion:

>>> [x for b in a for x in b] # Works fine

While you asked for list comprehension answers, let me also point out the excellent itertools.chain():

>>> from itertools import chain
>>> list(chain.from_iterable(a))
>>> list(chain(*a)) # If you're using python < 2.6
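
For comparison, a small runnable sketch using the nested list from the question, showing that the comprehension and chain.from_iterable produce the same flat list. The variable names flat_comprehension and flat_chain are only illustrative.

```python
from itertools import chain

a = [[1, 2], [3, 4]]

# Outer loop (for b in a) first, inner loop (for x in b) second.
flat_comprehension = [x for b in a for x in b]

# chain.from_iterable lazily walks each inner iterable in turn.
flat_chain = list(chain.from_iterable(a))

print(flat_comprehension)  # [1, 2, 3, 4]
print(flat_chain)          # [1, 2, 3, 4]
```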
Cide
  • 28
    `[x for b in a for x in b]` This has always bugged me about Python. This syntax is so backwards. The general form of `x for x in y` always has the variable directly after the for, feeding the expression to the left of the for. As soon as you do a double comprehension, your most recently iterated variable is suddenly so "far". It's awkward, and doesn't read naturally at all – Cruncher Feb 18 '20 at 19:41
141

Gee, I guess I found the answer: I was not paying enough attention to which loop is inner and which is outer. The list comprehension should be like:

[x for b in a for x in b]

to get the desired result, and yes, the current value of one loop can be the iterable for the next loop.

Aran-Fey
ThomasH
55

Order of iterators may seem counter-intuitive.

Take for example: [str(x) for i in range(3) for x in foo(i)]

Let's decompose it:

def foo(i):
    return i, i + 0.5

[str(x)
    for i in range(3)
        for x in foo(i)
]

# is the same as
result = []
for i in range(3):
    for x in foo(i):
        result.append(str(x))
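
To check the decomposition, here is a minimal sketch that puts the nested loops inside a generator function and compares its output with the comprehension. The name as_generator is made up for this illustration; foo is the function defined above.

```python
def foo(i):
    return i, i + 0.5


def as_generator():
    # Same two loops as in the comprehension, outer loop first.
    for i in range(3):
        for x in foo(i):
            yield str(x)


comprehension = [str(x) for i in range(3) for x in foo(i)]
from_generator = list(as_generator())

print(comprehension)                    # ['0', '0.5', '1', '1.5', '2', '2.5']
print(comprehension == from_generator)  # True
```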
Dima Tisnek
  • 4
    What an eye-opener !! – nehem Jun 28 '17 at 03:44
  • My understanding is that the reason for this is that "the first iteration listed is the topmost iteration that would be typed if the comprehension were written as nested for loops". The reason this is counterintuitive is that the OUTER loop (topmost if written as nested for-loops) appears at the INSIDE of the bracketed list/dict (comprehension'ed object). Conversely, the INNER loop (innermost when written as nested for-loops) is precisely the rightmost loop in a comprehension, and in that way appears at the OUTSIDE of the comprehension. – Zach Siegel Aug 06 '17 at 06:31
  • Abstractly written we have `[(output in loop 2) (loop 1) (loop 2)]` with `(loop 1) = for i in range(3)` and `(loop 2) = for x in foo(i):` and `(output in loop 2) = str(x)`. – Qaswed Jul 10 '19 at 09:58
24

ThomasH has already added a good answer, but I want to show what happens:

>>> a = [[1, 2], [3, 4]]
>>> [x for x in b for b in a]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'b' is not defined

>>> [x for b in a for x in b]
[1, 2, 3, 4]
>>> [x for x in b for b in a]
[3, 3, 4, 4]

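Python evaluates the for clauses of a list comprehension from left to right, so the first for clause is the outermost loop. That is why the first attempt fails: b is read before anything has bound it.

The second "problem" is that b gets "leaked" out of the list comprehension. After the first successful list comprehension, b == [3, 4], which is why the last attempt runs and returns [3, 3, 4, 4]. This leak is Python 2 behaviour; in Python 3 the loop variables are local to the comprehension, so the last attempt raises NameError again.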
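
For readers on Python 3, here is a small sketch of how the same experiment behaves there. It assumes Python 3 scoping rules; the function names flatten and wrong_order and the variable names inner and item are made up for this illustration.

```python
nested = [[1, 2], [3, 4]]


def flatten(nested):
    # Outer loop first: 'inner' is bound before the second clause reads it.
    return [item for inner in nested for item in inner]


def wrong_order(nested):
    # The first clause reads 'inner' before any clause has bound it.
    return [item for item in inner for inner in nested]


flat = flatten(nested)

try:
    wrong_order(nested)
    raised_name_error = False
except NameError:
    raised_name_error = True

print(flat)               # [1, 2, 3, 4]
print(raised_name_error)  # True on Python 3
```

Running the correct comprehension first does not rescue the wrong one here, because its loop variables are not visible outside the comprehension.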

Martin Thoma
  • 4
    Interesting point. I was surprised at this: `x = 'hello';` `[x for x in xrange(1,5)];` `print x # x is now 4` – grinch Nov 18 '14 at 17:11
  • 5
    This leakage was fixed in Python 3: https://stackoverflow.com/questions/4198906/python-list-comprehension-rebind-names-even-after-scope-of-comprehension-is-thi – Denilson Sá Maia Oct 06 '15 at 13:34
19

This memory technique helps me a lot:

[ <RETURNED_VALUE> <OUTER_LOOP1> <INNER_LOOP2> <INNER_LOOP3> ... <OPTIONAL_IF> ]

Now you can think of "returned value, then outer loop" as the only right order.

Knowing that, the order in a list comprehension is easy even for 3 loops:


c=[111, 222, 333]
b=[11, 22, 33]
a=[1, 2, 3]

print(
  [
    (i, j, k)                            # <RETURNED_VALUE> 
    for i in a for j in b for k in c     # in order: loop1, loop2, loop3
    if i < 2 and j < 20 and k < 200      # <OPTIONAL_IF>
  ]
)
[(1, 11, 111)]

because the above is just:

for i in a:                         # outer loop1 GOES SECOND
  for j in b:                       # inner loop2 GOES THIRD
    for k in c:                     # inner loop3 GOES FOURTH
      if i < 2 and j < 20 and k < 200:
        print((i, j, k))            # returned value GOES FIRST

For iterating over one nested list/structure the technique is the same. For a from the question:

a = [[1,2],[3,4]]
[i2    for i1 in a      for i2 in i1]
which returns [1, 2, 3, 4]

For one more nested level:

a = [[[1, 2], [3, 4]], [[5, 6], [7, 8, 9]], [[10]]]
[i3    for i1 in a      for i2 in i1     for i3 in i2]
which returns [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

and so on

Sławomir Lenart
  • 1
    Thanks, but what you describe is actually the simple case where the involved iterators are independent. In fact, in your example you could use the iterators _in any order_ and would get the same result list (modulo ordering). The case I was more interested in was with nested lists where one iterator becomes the iterable of the next. – ThomasH Jan 05 '20 at 15:15
  • @ThomasH: the order of loop defined in bold is exactly for your need. On the bottom added an example to cover your data and one more example with extra nested level. – Sławomir Lenart Jan 05 '20 at 18:01
12

If you want to keep the multidimensional array, nest the array brackets. See the example below, where one is added to every element.

>>> a = [[1, 2], [3, 4]]

>>> [[col +1 for col in row] for row in a]
[[2, 3], [4, 5]]

>>> [col +1 for row in a for col in row]
[2, 3, 4, 5]
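
A short sketch contrasting where the outer loop is written in the two forms, using a ragged list so the preserved shape is easy to see. The variable names kept_shape and flattened and the "times ten" operation are only illustrative.

```python
a = [[1, 2], [3, 4, 5]]

# Nested brackets: the outer loop (for row in a) is written LAST,
# and each row becomes its own inner list.
kept_shape = [[col * 10 for col in row] for row in a]

# Single brackets: the outer loop (for row in a) is written FIRST,
# and everything lands in one flat list.
flattened = [col * 10 for row in a for col in row]

print(kept_shape)  # [[10, 20], [30, 40, 50]]
print(flattened)   # [10, 20, 30, 40, 50]
```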
steven
5

I feel this is easier to understand (index each row by position; note that the range must use the length of the row, not of a):

[row[i] for row in a for i in range(len(row))]

result: [1, 2, 3, 4]
Opal
Miao Li
3

Additionally, you could use the same variable name for the member of the input list that is currently being accessed and for the element inside that member. However, this might make it even more (list) incomprehensible.

input = [[1, 2], [3, 4]]
[x for x in input for x in x]

First, for x in input is evaluated, which yields one member list of the input. Then Python walks through the second part, for x in x, during which x is overwritten by the element currently being accessed. Finally, the leading x defines what we want to return.

simP
2

I could never write a double list comprehension on my first attempt. Reading PEP 202, it turns out the reason is that it was implemented in the opposite order from how you would read it in English. The good news is that it is a logically sound implementation, so once you understand the structure, it's very easy to get right.

Let a, b, c, d be successively nested objects. For me, the intuitive way to extend list comprehension would mimic English:

# works
[f(b) for b in a]
# does not work
[f(c) for c in b for b in a]
[f(c) for c in g(b) for b in a]
[f(d) for d in c for c in b for b in a]

In other words, you'd be reading from the bottom up, i.e.

# wrong logic
(((d for d in c) for c in b) for b in a)

However, this is not how Python implements nested list comprehensions. Instead, the implementation treats the first chunk as completely separate, and then chains the fors and ins in a single chunk from the top down (instead of bottom up), i.e.

# right logic
d: (for b in a, for c in b, for d in c)

Note that the deepest nested level (for d in c) is farthest from the final object in the list (d). The reason for this comes from Guido himself:

The form [... for x... for y...] nests, with the last index varying fastest, just like nested for loops.

Using Skam's text example, this becomes even more clear:

# word: for sentence in text, for word in sentence
[word for sentence in text for word in sentence]

# letter: for sentence in text, for word in sentence, for letter in word
[letter for sentence in text for word in sentence for letter in word]

# letter:
#     for sentence in text if len(sentence) > 2, 
#     for word in sentence[0], 
#     for letter in word if letter is a vowel
# (str has no isvowel() method, so test membership in "aeiou" instead)
[letter for sentence in text if len(sentence) > 2 for word in sentence[0] for letter in word if letter.lower() in "aeiou"]
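
Here is a runnable sketch of the same idea on Skam's example text, with three nested for clauses and filters attached at two levels. The filter conditions (words longer than two characters, vowels only) and the names VOWELS, letters and vowels are assumptions chosen for this illustration, not part of the original example.

```python
text = (("Hi", "Steve!"), ("What's", "up?"))
VOWELS = "aeiou"  # illustrative; str has no built-in vowel test

# Three levels, no filters: every character of every word.
letters = [letter for sentence in text for word in sentence for letter in word]

# Each 'if' filters the for clause written directly before it.
vowels = [
    letter
    for sentence in text
    for word in sentence
    if len(word) > 2                # skips "Hi"
    for letter in word
    if letter.lower() in VOWELS     # keeps only vowels
]

print(len(letters))  # 17
print(vowels)        # ['e', 'e', 'a', 'u']
```

Splitting the comprehension over several lines, one clause per line, makes the top-down loop order visible.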
Martim
1

This flatten_nlevel function calls itself recursively on each nested list in list1 to flatten it to one level. Try it out:

def flatten_nlevel(list1, flat_list):
    for sublist in list1:
        # isinstance needs the class itself (list), not type(list)
        if isinstance(sublist, list):
            flatten_nlevel(sublist, flat_list)
        else:
            flat_list.append(sublist)

list1 = [1,[1,[2,3,[4,6]],4],5]

items = []
flatten_nlevel(list1,items)
print(items)

output:

[1, 1, 2, 3, 4, 6, 4, 5]
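
Since the thread is about list comprehensions, the same recursion can also be sketched as a nested comprehension that returns a new list instead of filling an accumulator. This is only one possible variant; the function name flatten is made up here, and it assumes that only list instances count as nested levels.

```python
def flatten(items):
    # Outer clause walks the items; inner clause walks either the
    # recursively flattened sublist or a one-element list for a leaf.
    return [
        leaf
        for item in items
        for leaf in (flatten(item) if isinstance(item, list) else [item])
    ]


list1 = [1, [1, [2, 3, [4, 6]], 4], 5]
flat = flatten(list1)
print(flat)  # [1, 1, 2, 3, 4, 6, 4, 5]
```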
ravibeli
  • 1
    Ok, the question was particularly about list comprehension, and list flattening was only an example. But I assume, your generalized list flattener would need to call itself recursively. So it's probably more like `flatten_nlevel(sublist, flat_list)`, right?! – ThomasH May 07 '20 at 07:57