942

I've got a list of Python objects that I'd like to sort by an attribute of the objects themselves. The list looks like:

>>> ut
[<Tag: 128>, <Tag: 2008>, <Tag: <>, <Tag: actionscript>, <Tag: addresses>,
 <Tag: aes>, <Tag: ajax> ...]

Each object has a count:

>>> ut[1].count
1L

I need to sort the list by count, in descending order.

I've seen several methods for this, but I'm looking for best practice in Python.

jpp
Nick Sergeant
  • 2
    Dupe: http://stackoverflow.com/questions/157424/python-2-5-dictionary-2-key-sort, http://stackoverflow.com/questions/222752/sorting-a-tuple-that-contains-tuples, http://stackoverflow.com/questions/327191/in-python-is-there-a-one-line-pythonic-way-to-get-a-list-of-keys-from-a-dictiona – S.Lott Dec 31 '08 at 18:23
  • 2
    [Sorting HOW TO](https://docs.python.org/3/howto/sorting.html) for those who are looking for more info about sorting in Python. – Jeyekomon May 30 '18 at 08:48
  • 1
    apart from operator.attrgetter('attribute_name') you can also use functors as key like object_list.sort(key=my_sorting_functor('my_key')), leaving the implementation out intentionally. – vijay shanker Apr 08 '19 at 20:07

8 Answers

1515
# To sort the list in place...
ut.sort(key=lambda x: x.count, reverse=True)

# To return a new list, use the sorted() built-in function...
newlist = sorted(ut, key=lambda x: x.count, reverse=True)

More on sorting by keys.
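As a minimal, self-contained sketch of both calls: the `Tag` class below is a hypothetical stand-in for the objects in the question (only the `count` attribute comes from the original), and the data is made up for illustration.

```python
class Tag:
    """Hypothetical stand-in for the question's Tag objects."""

    def __init__(self, name, count):
        self.name = name
        self.count = count

    def __repr__(self):
        return "<Tag: %s>" % self.name


ut = [Tag("aes", 3), Tag("ajax", 10), Tag("2008", 1)]

# sorted() builds a new list and leaves the original untouched.
newlist = sorted(ut, key=lambda x: x.count, reverse=True)
print(newlist)  # [<Tag: ajax>, <Tag: aes>, <Tag: 2008>]

# list.sort() reorders the existing list in place and returns None.
result = ut.sort(key=lambda x: x.count, reverse=True)
print(result)  # None
```

Because `sort()` returns `None`, writing `ut = ut.sort(...)` would discard the list; use `sorted()` when a return value is needed.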

Dorian Turba
Triptych
  • 2
    No problem. btw, if muhuk is right and it's a list of Django objects, you should consider his solution. However, for the general case of sorting objects, my solution is probably best practice. – Triptych Dec 31 '08 at 17:12
  • 57
    On large lists you will get better performance using operator.attrgetter('count') as your key. This is just an optimized (lower level) form of the lambda function in this answer. – David Eyk Dec 31 '08 at 19:35
  • 6
    Thanks for the great answer. If it is a list of dictionaries and 'count' is one of the keys, it needs to be changed as follows: ut.sort(key=lambda x: x['count'], reverse=True) – dganesh2002 Dec 08 '16 at 21:20
  • 1
    I suppose it deserves the following update: if there is a need to sort by multiple fields, it can be achieved by consecutive calls to sort(), because Python uses a stable sort algorithm. – zzz777 Feb 23 '20 at 14:41
  • I am receiving this error, can someone add in answer how to resolve it? ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all() – mattsmith5 Apr 01 '21 at 07:38
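The comments above mention two variations: lists of dictionaries and sorting by several fields. A rough sketch of both, using made-up dictionaries with hypothetical `name` and `count` keys. The two-pass version relies on Python's sort being stable; the tuple-key version is an alternative that does not come from the comments.

```python
tags = [
    {"name": "ajax", "count": 2},
    {"name": "aes", "count": 5},
    {"name": "actionscript", "count": 2},
]

# Two passes: sort by the secondary key first, then by the primary key.
# Stability keeps equal counts in the name order set by the first pass.
two_pass = sorted(tags, key=lambda d: d["name"])
two_pass.sort(key=lambda d: d["count"], reverse=True)

# One pass: a tuple key; negating the count gives descending order
# for that numeric field while names stay ascending.
one_pass = sorted(tags, key=lambda d: (-d["count"], d["name"]))

print([d["name"] for d in two_pass])  # ['aes', 'actionscript', 'ajax']
```

The negation trick only works for numeric fields; for mixed directions on non-numeric fields, the consecutive-sort approach is the more general one.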
98

The fastest approach, especially if your list has a lot of records, is usually to use operator.attrgetter("count"). However, the code might run on a version of Python that predates the operator module, so it would be nice to have a fallback mechanism. You might want to do the following, then:

try:
    import operator
except ImportError:
    keyfun = lambda x: x.count  # use a lambda if no operator module
else:
    keyfun = operator.attrgetter("count")  # use operator since it's faster than lambda

ut.sort(key=keyfun, reverse=True)  # sort in-place
tzot
  • 7
    Here I would use the variable name "keyfun" instead of "cmpfun" to avoid confusion. The sort() method does accept a comparison function through the cmp= argument as well. – akaihola Jan 02 '09 at 12:16
  • This doesn't seem to work if the object has dynamically added attributes (if you've done `self.__dict__ = {'some':'dict'}` after the `__init__` method). I don't know why it should be different, though. – tutuca Jan 07 '13 at 20:40
  • @tutuca: I've never replaced the instance `__dict__`. Note that "an object having dynamically added attributes" and "setting an object's `__dict__` attribute" are almost orthogonal concepts. I'm saying that because your comment seems to imply that setting the `__dict__` attribute is a requirement for dynamically adding attributes. – tzot Jan 09 '13 at 23:14
  • @tzot: I'm looking right at this: https://github.com/stochastic-technologies/goatfish/blob/master/goatfish/models.py#L168 and using that iterator here: https://github.com/TallerTechnologies/dishey/blob/master/app.py#L28 raises attribute error. Maybe because of python3, but still... – tutuca Jan 10 '13 at 04:06
  • @tutuca: I would do `self.__dict__.update(kwargs)` instead of `self.__dict__= kwargs`. In any case, perhaps it's a Python 3 issue, since 2.7.3 seems to [run it ok](http://pastebin.com/iWRfz2Cu). I will investigate with Python 3 some time later. – tzot Jan 10 '13 at 21:26
  • And then there's [this](http://stackoverflow.com/questions/2137772/#comment-2078476), which could suggest it's the class Model's metaclass that's at fault here. – tzot Jan 10 '13 at 21:33
  • @tzot, it's not django related, the goatfish Meta attribute is just a raw object with no magic whatsoever... I've tested it in a python 2.7 project and seems to work as expected. I'll need to read further on the issue... – tutuca Jan 14 '13 at 15:21
  • 1
    @tzot: if I understand the use of `operator.attrgetter`, I could supply a function with any property name and return a sorted collection. – IAbstract Feb 24 '16 at 18:16
  • For those looking for more info: https://wiki.python.org/moin/HowTo/Sorting#Operator_Module_Functions – alxs Jan 20 '17 at 15:01
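To illustrate the point raised in the comments about passing any attribute name to `operator.attrgetter`, here is a small sketch. The `Tag` class and the `sort_by_attribute` helper are hypothetical names introduced only for this example.

```python
from operator import attrgetter


class Tag:
    """Hypothetical class used only for illustration."""

    def __init__(self, name, count):
        self.name = name
        self.count = count


def sort_by_attribute(items, attribute_name, reverse=False):
    """Return a new list sorted by the named attribute (hypothetical helper)."""
    return sorted(items, key=attrgetter(attribute_name), reverse=reverse)


tags = [Tag("ajax", 2), Tag("aes", 5), Tag("actionscript", 2)]

by_count = sort_by_attribute(tags, "count", reverse=True)
by_name = sort_by_attribute(tags, "name")

# attrgetter also accepts several names and then returns a tuple,
# which sorts by count first and by name among equal counts.
by_count_then_name = sorted(tags, key=attrgetter("count", "name"))
```

If an object lacks the named attribute, `attrgetter` raises `AttributeError` during the sort, just as the equivalent lambda would.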
76

Readers should notice that the key= method:

ut.sort(key=lambda x: x.count, reverse=True)

is many times faster than adding rich comparison operators to the objects. I was surprised to read this (page 485 of "Python in a Nutshell"). You can confirm this by running tests on this little program:

#!/usr/bin/env python
import random

class C:
    def __init__(self,count):
        self.count = count

    def __cmp__(self,other):
        return cmp(self.count,other.count)

longList = [C(random.random()) for i in xrange(1000000)] #about 6.1 secs
longList2 = longList[:]

longList.sort() #about 52 - 6.1 = 46 secs
longList2.sort(key = lambda c: c.count) #about 9 - 6.1 = 3 secs

My very minimal tests show the first sort is more than 10 times slower, but the book says it is only about 5 times slower in general. The reason they give is the highly optimized sort algorithm used in Python (timsort).

Still, it's very odd that .sort(lambda) is faster than plain old .sort(). I hope they fix that.

Steven Rumbalski
Jose M Vidal
  • 2
    Defining `__cmp__` is equivalent to calling `.sort(cmp=lambda)`, not `.sort(key=lambda)`, so it isn't odd at all. – tzot Sep 09 '19 at 07:46
  • @tzot is exactly right. The first sort has to compare objects against each other again and again. The second sort accesses each object only once to extract its count value, and then it performs a simple numerical sort which is highly optimized. A more fair comparison would be `longList2.sort(cmp = cmp)`. I tried this out and it performed nearly the same as `.sort()`. (Also: note that the "cmp" sort parameter was removed in Python 3.) – Bryan Roach Oct 29 '19 at 04:56
  • __cmp__ is no longer supported in Python 3: https://docs.python.org/3/howto/sorting.html#the-old-way-using-the-cmp-parameter – neves Feb 02 '21 at 22:40
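The benchmark in this answer relies on Python 2-only features (`xrange`, `cmp`, `__cmp__`). A rough Python 3 sketch of the same comparison follows, assuming `__lt__` stands in for `__cmp__`. The list size and repeat count are arbitrary choices, and no timings are claimed here; run it to measure on your own machine.

```python
import random
import timeit


class C:
    def __init__(self, count):
        self.count = count

    def __lt__(self, other):
        # Called for every comparison when no key function is given.
        return self.count < other.count


def sort_with_lt(items):
    """Sort by calling C.__lt__ for each pairwise comparison."""
    return sorted(items)


def sort_with_key(items):
    """Sort by extracting count once per item, then comparing floats."""
    return sorted(items, key=lambda c: c.count)


items = [C(random.random()) for _ in range(20_000)]

print("__lt__:", timeit.timeit(lambda: sort_with_lt(items), number=3))
print("key=:  ", timeit.timeit(lambda: sort_with_key(items), number=3))
```

As the comments explain, the key version calls the Python-level function once per element, while the comparison version calls it once per comparison.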
63

Object-oriented approach

It's good practice to make object sorting logic, if applicable, a property of the class rather than repeating it at each place where the ordering is required.

This ensures consistency and removes the need for boilerplate code.

At a minimum, you should specify __eq__ and __lt__ operations for this to work. Then just use sorted(list_of_objects).

class Card(object):

    def __init__(self, rank, suit):
        self.rank = rank
        self.suit = suit

    def __eq__(self, other):
        return self.rank == other.rank and self.suit == other.suit

    def __lt__(self, other):
        return self.rank < other.rank

hand = [Card(10, 'h'), Card(2, 'h'), Card(12, 'h'), Card(13, 'h'), Card(14, 'h')]
hand_order = [c.rank for c in hand]  # [10, 2, 12, 13, 14]

hand_sorted = sorted(hand)
hand_sorted_order = [c.rank for c in hand_sorted]  # [2, 10, 12, 13, 14]
jpp
  • 3
    That's what I was looking for! Could you point us to some documentation that elaborates on why `__eq__` and `__lt__` are the minimum implementation requirements? – FriendFX Aug 07 '19 at 00:23
  • 4
    @FriendFX, I believe it's implied by [this](https://docs.python.org/3/howto/sorting.html#odd-and-ends): `•The sort routines are guaranteed to use __lt__() when making comparisons between two objects...` – jpp Aug 07 '19 at 08:04
  • 2
    @FriendFX: See https://portingguide.readthedocs.io/en/latest/comparisons.html for Comparison and Sorting – Cornel Masson Feb 19 '20 at 10:30
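If the other comparison operators are wanted as well, one option (not part of the original answer) is `functools.total_ordering`, which derives them from `__eq__` and `__lt__`. The sketch below is a variation on the `Card` class; note that, as an assumption for this example, `__eq__` compares rank only so that equality stays consistent with the ordering.

```python
from functools import total_ordering


@total_ordering
class Card:
    def __init__(self, rank, suit):
        self.rank = rank
        self.suit = suit

    def __eq__(self, other):
        if not isinstance(other, Card):
            return NotImplemented
        # Assumption for this sketch: equality by rank only.
        return self.rank == other.rank

    def __lt__(self, other):
        if not isinstance(other, Card):
            return NotImplemented
        return self.rank < other.rank


hand = [Card(10, "h"), Card(2, "h"), Card(14, "h")]

print([c.rank for c in sorted(hand)])  # [2, 10, 14]
print(Card(10, "h") >= Card(2, "h"))   # True; __ge__ is supplied by total_ordering
print(max(hand).rank)                  # 14
```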
38
from operator import attrgetter
ut.sort(key = attrgetter('count'), reverse = True)
16

It looks much like a list of Django ORM model instances.

Why not sort them on query like this:

ut = Tag.objects.order_by('-count')
muhuk
  • It is, but using django-tagging, so I was using a built-in for grabbing a Tag set by usage for a particular query set, like so: Tag.objects.usage_for_queryset(QuerySet, counts=True) – Nick Sergeant Dec 31 '08 at 17:39
11

Add rich comparison operators to the object class, then use the sort() method of the list.
See rich comparison in Python.

Update: Although this method would work, I think the solution from Triptych is better suited to your case because it is much simpler.

rob
6

If the attribute you want to sort by is a property, then you can avoid importing operator.attrgetter and use the property's fget method instead.

For example, for a class Circle with a property radius we could sort a list of circles by radii as follows:

result = sorted(circles, key=Circle.radius.fget)

This is not the most well-known feature, but it often saves me the line for the import.
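A self-contained sketch of this approach: the `Circle` class below is hypothetical (the original only names the class and its `radius` property), and deriving the radius from a stored `area` is an assumption made for illustration.

```python
import math


class Circle:
    """Hypothetical class with a read-only property."""

    def __init__(self, area):
        self.area = area

    @property
    def radius(self):
        return math.sqrt(self.area / math.pi)


circles = [Circle(9.0), Circle(1.0), Circle(4.0)]

# Circle.radius (accessed on the class) is the property object itself;
# its fget attribute is the plain getter function taking an instance.
result = sorted(circles, key=Circle.radius.fget)
print([c.area for c in result])  # [1.0, 4.0, 9.0]
```

This only works for attributes defined with `property`; for plain instance attributes, `Circle.radius` would not exist on the class.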

Georgy