4

I have a group of labeled items, e.g. item_labels = [('a', 3), ('b', 2), ('c', 1), ('d', 3), ('e', 2), ('f', 3)]

I want to sort them by the size of the group each item belongs to. In the example above, label 3 has size 3 and label 2 has size 2.

I tried a combination of groupby and sorted, but it didn't work.

In [162]: sil = sorted(item_labels, key=op.itemgetter(1))

In [163]: sil
Out[163]: [('c', 1), ('b', 2), ('e', 2), ('a', 3), ('d', 3), ('f', 3)]

In [164]: g = itt.groupby(sil, key=op.itemgetter(1))

In [165]: for k, v in g:
   .....:     print k, list(v)
   .....:
   .....:
1 [('c', 1)]
2 [('b', 2), ('e', 2)]
3 [('a', 3), ('d', 3), ('f', 3)]

In [166]: sg = sorted(g, key=lambda x: len(list(x[1])))

In [167]: sg
Out[167]: [] # not sure why I got an empty list here

I can always write a tedious for-loop to do this, but I would rather find something more elegant. Any suggestions? If there are libraries that would help (e.g., pandas, scipy), I would be happy to use them.

jfs
clwen

5 Answers

3

In Python 2.7 and above, use Counter:

from collections import Counter
c = Counter(y for _, y in item_labels)
item_labels.sort(key=lambda t : c[t[1]])
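For readers on Python 3, the same idea can be sketched as below. This is only an illustration, not part of the original answer; the names label_sizes, by_group_size, tied and grouped are made up for the example. It also shows one caveat: if two labels happen to have the same size, sorting on the size alone leaves their items interleaved, and adding the label as a tie-breaker is one possible way to keep each group contiguous.

```python
from collections import Counter

# Sample data from the question: (item, label) pairs.
item_labels = [('a', 3), ('b', 2), ('c', 1), ('d', 3), ('e', 2), ('f', 3)]

# Count how many items carry each label.
label_sizes = Counter(label for _, label in item_labels)

# sorted() is stable, so items inside a group keep their original order.
by_group_size = sorted(item_labels, key=lambda pair: label_sizes[pair[1]])
print(by_group_size)
# [('c', 1), ('b', 2), ('e', 2), ('a', 3), ('d', 3), ('f', 3)]

# Hypothetical data where labels 1 and 2 both have size 2.
tied = [('a', 1), ('b', 2), ('c', 1), ('d', 2)]
tied_sizes = Counter(label for _, label in tied)

# Using (size, label) as the key keeps same-sized groups from interleaving.
grouped = sorted(tied, key=lambda pair: (tied_sizes[pair[1]], pair[1]))
print(grouped)
# [('a', 1), ('c', 1), ('b', 2), ('d', 2)]
```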

In Python 2.6, the Counter constructor can be emulated for our purposes with defaultdict (as suggested by @perreal):

from collections import defaultdict
def Counter(x):
    d = defaultdict(int)
    for v in x: d[v]+=1
    return d

Since we are working with numbers only, and assuming the numbers are as small as those in your example, we can actually use a list (which is compatible with even older versions of Python):

def Counter(x):
    lst = list(x)
    d = [0] * (max(lst)+1)
    for v in lst: d[v]+=1
    return d

Without Counter, you can simply do this:

item_labels.sort(key=lambda t : len([x[1] for x in item_labels if x[1]==t[1] ]))

It is slower (quadratic in the length of the list, because every key computation rescans the whole list), but reasonable for short lists.


The reason you got an empty list is that g is an iterator, which can only be consumed once. The for loop in In [165] had already exhausted it.
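The exhaustion can be reproduced in a few lines. The sketch below uses Python 3 syntax and reuses the question's sil data; the names leftover and groups are illustrative only. Note also that, according to the itertools documentation, each group shares its underlying iterator with groupby and is no longer visible once groupby advances, so it is safer to turn every group into a list while iterating and sort afterwards.

```python
from itertools import groupby
from operator import itemgetter

sil = [('c', 1), ('b', 2), ('e', 2), ('a', 3), ('d', 3), ('f', 3)]

# First pass: iterating over g consumes it, just like the printing loop did.
g = groupby(sil, key=itemgetter(1))
for _label, _members in g:
    pass

# Second pass: nothing is left, so sorted() receives an empty iterator.
leftover = sorted(g, key=lambda kv: len(list(kv[1])))
print(leftover)  # []

# Safer: materialize each group as a list on the way through, then sort.
groups = [(label, list(members))
          for label, members in groupby(sil, key=itemgetter(1))]
groups.sort(key=lambda kv: len(kv[1]))
print(groups)
```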

Elazar
  • Unfortunately I'm using python 2.6 so can't really use `Counter`. – clwen Jun 24 '13 at 21:45
  • Thanks. This line `item_labels.sort(key=lambda t : c[t[0]])` should be `item_labels.sort(key=lambda t : c[t[1]])`? – clwen Jun 25 '13 at 06:43
3
from collections import defaultdict
import operator
l = [('c', 1), ('b', 2), ('e', 2), ('a', 3), ('d', 3), ('f', 3)]
d = defaultdict(int)
for p in l: d[p[1]] += 1   # d maps label -> group size
# i is a (label, size) pair, so items must be matched against the label i[0]
print [ p for i in sorted(d.iteritems(), key=operator.itemgetter(1))
        for p in l if p[1] == i[0] ]
perreal
  • You are effectively implementing a `Counter` using `defaultdict` – Elazar Jun 24 '13 at 21:59
  • @Elazar: Which is actually faster than default dict in many cases. Try it. +1 – dawg Jun 24 '13 at 22:09
  • @drewk: [`defaultdict` can be faster than `Counter`](http://stackoverflow.com/a/2525617/4279) though it doesn't matter in this case – jfs Jun 24 '13 at 22:19
2

itertools.groupby returns an iterator, so the for loop (for k, v in g:) had already consumed it by the time sorted was called. The same effect with a plain iterator:

>>> it = iter([1,2,3])
>>> for x in it:pass
>>> list(it)          #iterator already consumed by the for-loop
[]

Instead, turn each group into a list before sorting:

>>> lis = [('a', 3), ('b', 2), ('c', 1), ('d', 3), ('e', 2), ('f', 3)]
>>> from operator import itemgetter
>>> from itertools import groupby
>>> lis.sort(key = itemgetter(1) )
>>> new_lis = [list(v) for k,v in groupby(lis, key = itemgetter(1) )]
>>> new_lis.sort(key = len)
>>> new_lis
[[('c', 1)], [('b', 2), ('e', 2)], [('a', 3), ('d', 3), ('f', 3)]]

To get a flattened list use itertools.chain:

>>> from itertools import chain
>>> list( chain.from_iterable(new_lis))
[('c', 1), ('b', 2), ('e', 2), ('a', 3), ('d', 3), ('f', 3)]
Ashwini Chaudhary
2

Same as @perreal's and @Elazar's answers, but with better names:

from collections import defaultdict

size = defaultdict(int)
for _, group_id in item_labels:
   size[group_id] += 1

item_labels.sort(key=lambda (_, group_id): size[group_id])
print item_labels
# -> [('c', 1), ('b', 2), ('e', 2), ('a', 3), ('d', 3), ('f', 3)]
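Note that the tuple-unpacking lambda above is Python 2 syntax; tuple parameters were removed in Python 3 (PEP 3113). A minimal Python 3 sketch of the same approach, indexing the pair instead of unpacking it, might look like this:

```python
from collections import defaultdict

item_labels = [('a', 3), ('b', 2), ('c', 1), ('d', 3), ('e', 2), ('f', 3)]

# Map each group id to the number of items in that group.
size = defaultdict(int)
for _, group_id in item_labels:
    size[group_id] += 1

# lambda (_, group_id): ... is a syntax error in Python 3, so index the pair.
item_labels.sort(key=lambda pair: size[pair[1]])
print(item_labels)
# [('c', 1), ('b', 2), ('e', 2), ('a', 3), ('d', 3), ('f', 3)]
```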
jfs
1

Here is another way:

example=[('a', 3), ('b', 2), ('c', 1), ('d', 3), ('e', 2), ('f', 3)]

out={}
for t in example:
    out.setdefault(t[1],[]).append(t)

print sorted(out.values(),key=len)

Prints:

[[('c', 1)], [('b', 2), ('e', 2)], [('a', 3), ('d', 3), ('f', 3)]]

If you want a flat list:

print [l for s in sorted(out.values(),key=len) for l in s]
[('c', 1), ('b', 2), ('e', 2), ('a', 3), ('d', 3), ('f', 3)]
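The same grouping idea can be wrapped in a small reusable function. This is a sketch in Python 3 syntax; the function name sort_by_group_size and the colour-labelled sample data are hypothetical, chosen so that the labels are not numbers that happen to equal the group sizes.

```python
def sort_by_group_size(pairs):
    """Return the (item, label) pairs ordered by the size of each label's group."""
    groups = {}
    for pair in pairs:
        # Collect the pairs that share a label, preserving input order.
        groups.setdefault(pair[1], []).append(pair)
    # Order the groups by length, then flatten them back into one list.
    return [pair for group in sorted(groups.values(), key=len) for pair in group]


sample = [('x', 'red'), ('y', 'blue'), ('z', 'red'),
          ('w', 'green'), ('v', 'red'), ('u', 'blue')]
print(sort_by_group_size(sample))
# [('w', 'green'), ('y', 'blue'), ('u', 'blue'), ('x', 'red'), ('z', 'red'), ('v', 'red')]
```

Because whole groups are sorted and then flattened, items with the same label always stay next to each other, even when two groups have the same size.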
dawg