
I found this on Stack Overflow; it does exactly what I was looking for:

>>> k = [[1, 2], [4], [5, 6, 2], [1, 2], [3], [4]]
>>> import itertools
>>> k.sort()
>>> list(k for k,_ in itertools.groupby(k))
[[1, 2], [3], [4], [5, 6, 2]]

I don't understand the list comprehension here, and I can't manage to translate it into a for loop. I always use the following syntax:

[k for k in smthiterable (if condition)]   

I tried changing the underscore _ to something else, and it still works. But if I remove it, it doesn't. What is its use?

pwnsauce

2 Answers


itertools.groupby returns an iterable of pairs; in each pair, the grouping key comes first and an iterator over the items belonging to that group comes second. The construct for k, _ in iterable unpacks those pairs, in direct analogy to how an assignment statement k, _ = (0, 1) unpacks a tuple into two names. The choice of _ as a variable name here is immaterial; it is a commonly used convention in Python to indicate that the value goes unused.
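To make the unpacking concrete, here is a rough sketch of the same expression written as an ordinary for loop. The names unique, pair and key are only illustrative; they are not in the original snippet.

```python
import itertools

k = [[1, 2], [4], [5, 6, 2], [1, 2], [3], [4]]
k.sort()  # groupby only merges adjacent equal items, so sort first

unique = []
for pair in itertools.groupby(k):
    # each pair is a 2-tuple: (grouping key, iterator over the group)
    key, _ = pair  # same unpacking as "for key, _ in ..."
    unique.append(key)

print(unique)  # [[1, 2], [3], [4], [5, 6, 2]]
```

Dropping the _ would bind the whole 2-tuple to a single name instead of just the key, which is why the result changes when you remove it.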

The code you've presented is not a particularly convincing use of groupby, since the group object is discarded and only the unique keys are used.

The list comprehension below is more Pythonic; it avoids creating the "useless variable":

>>> [list(x) for x in sorted(set(map(tuple, k)))]
[[1, 2], [3], [4], [5, 6, 2]]
wim
  • I use it to discard duplicates in my list of lists. Your list comprehension is nice; I tried something like it but couldn't manage to make it work. Thanks – pwnsauce May 16 '17 at 16:28

The underscore (_) is really like any other variable; it's not special syntax. Pretend that it's an x if it's confusing you. The underscore is typically used to denote an "unused" variable. itertools.groupby returns an iterable, and each iteration yields another iterable (each of these latter iterables always has exactly two elements: the key and the group). So the k, _ syntax is simply doing tuple unpacking (see here: https://chrisalbon.com/python/unpacking_a_tuple.html).

k will be the first element of each iterable (and the second is assigned to _).

Here's the condensed example from the link for convenience (and in case the link dies):

soldiers = [('Steve', 'Miller'), ('Stacy', 'Markov'), ('Sonya', 'Matthews')]
for _, last_name in soldiers:
    # print the second element
    print(last_name)

outputs:

Miller
Markov
Matthews
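Applied to groupby itself, the same unpacking looks roughly like the sketch below. The names data, pairs, key and group are only illustrative, and the input is assumed to be sorted already.

```python
import itertools

data = [[1, 2], [1, 2], [3]]  # assumed to be sorted already

# Give the second element a real name instead of _ to see what is discarded.
pairs = [(key, list(group)) for key, group in itertools.groupby(data)]

for key, group in pairs:
    print(key, group)
# [1, 2] [[1, 2], [1, 2]]
# [3] [[3]]
```

In the question's snippet only key is kept, so the group goes to _ and is never read.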
Matt Messersmith