
I'm teaching myself Python from a book and I'm stuck on a programming exercise. The aim of the exercise is to make a list of objects, and then sort those objects based on some object attribute.

The author of my textbook says that using a key to call a class method for every comparison during sorting can slow down the sort for large datasets, and that building a 'decorated' list, calling the class method just once for each object as you build the list, can make subsequent sorting more efficient. The programming problem stipulates that, instead of a key, I should make a list of tuples in which element 0 is the value of an object attribute and element 1 is the object itself. Once I have my decorated list, I can use a built-in sort (e.g. "sorted()" or ".sort()") to put it all in order.

I get an error when two objects have an identical sort attribute value. This example code will reproduce the error:

class Shrubber:
    def __init__(self, age):
        self.name = 'Roger'
        self.age = age

    def getAge(self):
        return self.age

def main():
    rogerAges = [30, 21, 21, 25]
    rogers = []
    for rAge in rogerAges:
        newShr = Shrubber(rAge)
        rogers.append((newShr.getAge(), newShr))
    rogers.sort()
    print(rogers)

main()

I would like the program to print something like this:

[(21, <__main__.Shrubber object at XxXEX>), (21, <__main__.Shrubber object at YxYEY>), (25, <__main__.Shrubber object at ZxZEZ>), (30, <__main__.Shrubber object at QxQEQ>)]

...but instead, it gives me a TypeError:

TypeError: unorderable types: Shrubber() < Shrubber()

I'm sure I'm getting the error because, after Python encounters two identical values for two tuple elements 0, it looks to elements 1 and finds an unorderable data type (an object). However, the limitation that two Shrubbers can't be the same age makes it seem like I'm missing something.

My question: Can I stably sort my tuple list by tuple element 0 and ignore my unorderable tuple element 1?

mRotten
  • Instead of a *key*? Are you sure it doesn't talk about a *comparator*? One of the reasons key functions were introduced was to optimize exactly what you're talking about. – user2357112 supports Monica Mar 11 '14 at 05:54
  • Works for me on python 2.7 – skamsie Mar 11 '14 at 06:40
  • Yeah, he said not to use a key, but to create tuples with the literal value of the orderable attribute as element 0 for each 2-element tuple. He doesn't cover comparators in the book, insofar as they haven't been introduced yet and there's no entry in the textbook index. Using Python 3.3. – mRotten Mar 11 '14 at 06:49
  • That... doesn't make sense. The key function is only evaluated once for every list item. It's just like if you had used decorate-sort-undecorate, but faster and cleaner. The only case I can think of where it might be faster to use a decorated list is if you're going to be modifying and re-sorting the list repeatedly, but even then, if the key function just fetches an attribute, the cost should be trivial. – user2357112 supports Monica Mar 11 '14 at 07:04
  • John Zelle (the author) says: "One disadvantage of passing a function to the list sort method is that it makes the sorting slower, since this function is called repeatedly as Python needs to compare various items. An alternative to creating a special key function is to create a "decorated" list that will sort in the desired order using the standard Python ordering." JZ then goes on to describe the tuple scheme I mentioned. – mRotten Mar 11 '14 at 07:14
  • Is this your original code? It works perfectly for me (Python 2.7.5) – Matthias Mar 11 '14 at 07:56
  • @Matthias - Yeah, it's my original code. Interesting, the code above gives an error in Python 3.3 just as it is written above. Herr Actress also said it works for her/him on 2.7. – mRotten Mar 11 '14 at 08:11
  • @user3103237 Yes, but he is talking about the `cmp` parameter. You are reinventing the wheel because what you just wrote is already done better using the `key` parameter: `rogers = sorted([Shrubber(age) for age in rogerAges], key=lambda x: x.getAge())`. Note that this calls `getAge` only *once* per item. – Bakuriu Mar 11 '14 at 09:07
  • @user3103237 The code doesn't raise an error in python2 because in python2 there is a default comparison for every object. It sorts by class name and eventually by id. For example `1 < "hello"` because `int` comes before `str`. In python3 this is not the case anymore and if you really want to do what you are trying to do (without using `key`) you **must** implement `__lt__`. – Bakuriu Mar 11 '14 at 09:07
  • Thanks everyone for your answers. – mRotten Mar 11 '14 at 16:35
  • @Bakuriu - I'm working out of an academic textbook, so I'm supposed to be reinventing the wheel, to some extent. The built-in comparisons being newly absent makes a lot of sense though, explains why my code works in Py2.7 and not 3.3. – mRotten Mar 11 '14 at 16:37
  • @Bakuriu: You are probably right that using "__lt__" might be my best option. Parenthetically, is it really true that using getAge as a key only calls getAge once per item? – mRotten Mar 11 '14 at 16:39
  • It seems like it would be called for both items in each pairwise comparison. In the simplest case, the ages of three Shrubbers A, B and C would require at least two pairwise comparisons (A with B or C, and B with A or C). getAge would need to be called once for each Shrubbers A and B (or C), and then once for each Shrubbers A and C (or B). At minimum, getAge would have to be called twice for at least one Shrubber. Is that correct? – mRotten Mar 11 '14 at 16:40
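
The once-per-item question raised in the comments above can be checked empirically. The following is a minimal sketch, assuming CPython 3; the `calls` counter is a hypothetical addition for illustration and is not part of the original `Shrubber` class.

```python
class Shrubber:
    calls = 0  # hypothetical class-level counter, added for illustration only

    def __init__(self, age):
        self.name = 'Roger'
        self.age = age

    def getAge(self):
        Shrubber.calls += 1  # record every call made while sorting
        return self.age


rogers = [Shrubber(age) for age in [30, 21, 21, 25]]

# The key is computed for each item up front; the comparisons that
# follow work on the cached key values, not on fresh getAge() calls.
ordered = sorted(rogers, key=Shrubber.getAge)

print(Shrubber.calls)            # 4: one call per item
print([s.age for s in ordered])  # [21, 21, 25, 30]
```

The counter ends at the number of items rather than the number of comparisons, which matches the `list.sort()` documentation: the key for each item is calculated once and then reused for the whole sort.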

2 Answers


I don't know of a way to make it ignore the second element in these cases. An alternative is to add a method to the object's class that will return the needed information. Doing so avoids the need to pass sort() a key= function. Here's an example of what I mean:

class Shrubber:
    def __init__(self, age):
        self.name = 'Roger'
        self.age = age

    def getAge(self):
        return self.age

    def __lt__(self, other):  # added comparison method
        return self.age < other.age

def main():
    rogerAges = [30, 21, 21, 25]
    rogers = []
    for rAge in rogerAges:
        newShr = Shrubber(rAge)
        rogers.append((newShr.getAge(), newShr))
    rogers.sort()
    print(rogers)

if __name__ == '__main__':
    main()

Output (wrapped for readability):

[(21, <__main__.Shrubber object at 0x00C1D830>),
 (21, <__main__.Shrubber object at 0x00C1D9D0>),
 (25, <__main__.Shrubber object at 0x00C1DA30>),
 (30, <__main__.Shrubber object at 0x00C1D9F0>)]
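
To see why adding `__lt__` is enough, it may help to watch when Python actually calls it. This is a small sketch, assuming Python 3; the `lt_calls` list is a hypothetical log added only to make the implicit calls visible, and the class is trimmed to the parts the comparison needs.

```python
lt_calls = []  # hypothetical log of every implicit __lt__ call


class Shrubber:
    def __init__(self, age):
        self.age = age

    def __lt__(self, other):
        lt_calls.append((self.age, other.age))
        return self.age < other.age


a, b, c = Shrubber(21), Shrubber(21), Shrubber(25)

# Element 0 differs, so the tuples are ordered without touching the Shrubbers.
first = (21, a) < (25, c)
calls_after_first = len(lt_calls)

# Element 0 ties, so tuple comparison falls through to a < b.
second = (21, a) < (21, b)
calls_after_second = len(lt_calls)

print(first, calls_after_first)    # True 0
print(second, calls_after_second)  # False 1
```

Because neither of two equal-age Shrubbers compares as less than the other, the stable sort leaves them in their original relative order.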
martineau
  • Thanks! That is quite clear, in terms of how I write the code, but I'll have to learn a little more about how "__lt__" and similar functions work. – mRotten Mar 11 '14 at 16:42
  • The [documentation](https://docs.python.org/3/library/stdtypes.html?highlight=sort#list.sort) for the `list.sort()` method says it sorts the list using only `<` comparisons between items. – martineau May 14 '14 at 20:48
  • Yep, I get that now. I had never heard of rich comparisons before asking this question, but the documentation is very straightforward, and made sense as soon as I realized 1) rich comparison methods aren't called explicitly, but implicitly when an object is compared and 2) rich comparisons are only called when an object is on the appropriate side of the appropriate operator (ie `__lt__` is only called when the object is on the left of `<`). – mRotten May 15 '14 at 00:42

You do not have to implement a __lt__ function if you make sure the comparison never gets through to the Shrubber objects. One way to do that is inserting another integer into that tuple:

def main():
    rogerAges = [30, 21, 21, 25]
    rogers = list()
    for i, rAge in enumerate(rogerAges):
        newShr = Shrubber(rAge)
        rogers.append((newShr.getAge(), i, newShr))
    rogers.sort()
    print(rogers)

However, this is the wrong approach from every point of view; you should use a key= function. The tuple version is harder to read and, according to my measurements, about four times slower. Still, using __lt__ (or the cmp= argument, which was removed in Python 3) is even worse: about twice as slow again.
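
For comparison, the key= approach might look like the sketch below. It assumes Python 3 and uses `operator.attrgetter` and `operator.itemgetter` from the standard library; the variable names `shrubbers` and `decorated` are illustrative. The second half also shows one way to sort an existing decorated list by element 0 alone, so the unorderable objects in element 1 are never compared.

```python
from operator import attrgetter, itemgetter


class Shrubber:
    def __init__(self, age):
        self.name = 'Roger'
        self.age = age


rogerAges = [30, 21, 21, 25]
shrubbers = [Shrubber(age) for age in rogerAges]

# Sort the objects directly; no tuples and no __lt__ needed.
rogers = sorted(shrubbers, key=attrgetter('age'))
print([s.age for s in rogers])  # [21, 21, 25, 30]

# Sort a decorated list on element 0 only; element 1 is ignored,
# and ties keep their original order because the sort is stable.
decorated = [(s.age, s) for s in shrubbers]
decorated.sort(key=itemgetter(0))
print([age for age, _ in decorated])  # [21, 21, 25, 30]
```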

As a final note, building the list by appending in a loop is not the Pythonic way. You can use a list comprehension instead:

def main():
    rogerAges = [30, 21, 21, 25]
    rogers = [(rAge, i, Shrubber(rAge)) for (i, rAge) in enumerate(rogerAges)]
    rogers.sort()
    print(rogers)
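
If only the sorted objects are wanted at the end, the tuples can be stripped off again (the "undecorate" step of decorate-sort-undecorate). A minimal self-contained sketch, assuming Python 3; the name `decorated` is illustrative and the class is repeated so the block runs on its own.

```python
class Shrubber:
    def __init__(self, age):
        self.name = 'Roger'
        self.age = age


rogerAges = [30, 21, 21, 25]

# Decorate: (sort value, tie-breaking index, object).
decorated = [(rAge, i, Shrubber(rAge)) for i, rAge in enumerate(rogerAges)]

# Sort: the unique index settles every tie before the objects are reached.
decorated.sort()

# Undecorate: keep only the objects, now in age order.
rogers = [shr for _, _, shr in decorated]
print([shr.age for shr in rogers])  # [21, 21, 25, 30]
```

Since every index is unique, the comparison never reaches the third tuple element, which is why no `__lt__` method is required here.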
emu
  • Thanks emu. It makes sense that I would add code to count instances of similar objects - I didn't think of that. – mRotten Dec 29 '14 at 21:21
  • It gets around the rich comparison or lambda functions that weren't covered at that point in the book. Not that it's correct or Pythonic to get around lambda or rich comparisons, I just wanted to know how I could. I'd upvote you, but my mere 13 credibilities do not afford me that privilege. Parenthetically, the SO reputation system seems a lot like Bart Simpson's rules of the playground to me, so it's also the reason I no longer contribute to SO. – mRotten Dec 29 '14 at 21:40
  • yeah, the rules feel like a weird role-playing game. It is funny at times, but I'm quite sure I can live without these upvote points :) – emu Jan 03 '15 at 10:08