
I wish to sort the below list first by the number, then by the text.

lst = ['b-3', 'a-2', 'c-4', 'd-2']

# result:
# ['a-2', 'd-2', 'b-3', 'c-4']

Attempt 1

res = sorted(lst, key=lambda x: (int(x.split('-')[1]), x.split('-')[0]))

I was not happy with this since it splits each string twice to extract the relevant components.

Attempt 2

I came up with the solution below, but I am hoping there is a more succinct solution using Pythonic lambda expressions.

def sorter_func(x):
    text, num = x.split('-')
    return int(num), text

res = sorted(lst, key=sorter_func)

I looked at Understanding nested lambda function behaviour in python but couldn't adapt this solution directly. Is there a more succinct way to rewrite the above code?

jpp
  • Attempt 2 would be my preferred solution, at least until [PEP-572](https://www.python.org/dev/peps/pep-0572/) or something similar is adopted. (Even then, it would appear to allow something like `key=lambda x: (int((x.split('-') as y)[1]), y[0])`, which I'm not a fan of.) – chepner Apr 05 '18 at 12:27
  • @chepner, I agree. Made sure it was the first point in my answer! – jpp Apr 05 '18 at 12:31
  • Actually, I'm not even sure PEP-572 would work here; `y` might be local to the call to `int`, not the tuple... ick. – chepner Apr 05 '18 at 12:32
  • Personally I find Attempt 2 more readable and pythonic than any other solution so far suggested; unless performance is critical anything else might be micro-optimization (although a good benchmark could convince me otherwise) – Chris_Rands May 22 '18 at 21:30
  • Also, for this specific problem, consider whether 2 criteria are necessary at all. For example, if the numbers are integers in the range 1-9 as in your example, you can just do a reversed lexicographic sort like `sorted(lst, key=lambda x: x[::-1])` – Chris_Rands May 22 '18 at 21:31
  • @Chris_Rands, Yep I agree. One of the reasons for this bounty is bringing attention to [PEP 572 -- Assignment Expressions](https://www.python.org/dev/peps/pep-0572/). The consensus seems to be, "Don't go there." – jpp May 22 '18 at 21:31
  • Agreed, PEP 572 is a very interesting prospect. I actually wrote a Q&A on this, but deleted it since it's only a proposal and not accepted (yet) for Python 3.8: https://stackoverflow.com/questions/50297704/syntax-and-assignment-expressions-what-and-why/ – Chris_Rands May 22 '18 at 21:34

8 Answers


There are 2 points to note:

  • One-line answers are not necessarily better. Using a named function is likely to make your code easier to read.
  • You are likely not looking for a nested lambda expression, as function composition is not part of the standard library (see Note #1). What you can easily do is have one lambda function return the result of calling another lambda function.

Therefore, the correct answer can be found in "Lambda inside lambda".

For your specific problem, you can use:

res = sorted(lst, key=lambda x: (lambda y: (int(y[1]), y[0]))(x.split('-')))

Remember that lambda is just a function. You can call it immediately after defining it, even on the same line.
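Here is a small sketch of calling a lambda right where it is defined. Binding the lambdas to names like `key` is only for demonstration; in real code a `def` would be more idiomatic.

```python
# An anonymous function can be called immediately after it is defined.
double = (lambda y: y * 2)(21)
print(double)  # 42

# The nested-lambda key from above: the outer lambda splits once,
# then immediately calls the inner lambda on the resulting list.
key = lambda x: (lambda y: (int(y[1]), y[0]))(x.split('-'))
print(key('b-3'))  # (3, 'b')

lst = ['b-3', 'a-2', 'c-4', 'd-2']
res = sorted(lst, key=key)
print(res)  # ['a-2', 'd-2', 'b-3', 'c-4']
```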

Note #1: The 3rd party toolz library does allow composition:

from toolz import compose

res = sorted(lst, key=compose(lambda x: (int(x[1]), x[0]), lambda x: x.split('-')))

Note #2: As @chepner points out, the deficiency of this solution (the extra function calls) is one of the reasons PEP-572 was being considered. Assignment expressions were later implemented in Python 3.8.
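For comparison, here is a sketch of the same key using the `:=` assignment expression that PEP 572 eventually added. It assumes Python 3.8 or later. `parts` is an arbitrary name; it is bound in the lambda's own scope, so the second tuple element can reuse it.

```python
lst = ['b-3', 'a-2', 'c-4', 'd-2']

# Requires Python 3.8+: `parts := ...` stores the split result inside the lambda,
# so the text part is reused without splitting the string a second time.
res = sorted(lst, key=lambda x: (int((parts := x.split('-'))[1]), parts[0]))
print(res)  # ['a-2', 'd-2', 'b-3', 'c-4']
```

The tuple elements are evaluated left to right, so `parts` is already bound when `parts[0]` is read.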

jpp
  • Happy to see the solution and really appreciate your dedication. – Austin Apr 05 '18 at 12:13
  • The biggest problem with the nested lambda is that it incurs the overhead of an additional (user-defined) function call, which could be significant for large lists. – chepner Apr 05 '18 at 13:23
  • @chepner, very good point. I'm happy to see alternatives. I'm not convinced (yet) that it's worse than the other one-liner alternatives. – jpp Apr 05 '18 at 13:23
  • No argument there :) The deficiencies of *all* the in-line solutions are why PEP-572 is being considered. – chepner Apr 05 '18 at 13:31
  • @chepner, I think `toolz.compose` has a nice solution to this to avoid the repeated function calls. Pity not in standard library. – jpp Apr 06 '18 at 08:50
  • I'd use `methodcaller("split", "-")` in place of the second lambda though. – chepner Apr 06 '18 at 12:16
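chepner's suggestion relies on `operator.methodcaller`, which builds a callable that invokes the named method with the given arguments. `toolz` is third-party, so this sketch pairs `methodcaller` with a small hand-written `compose2` helper instead. `compose2` is hypothetical, not part of any library; it applies `g` first and then `f`, mirroring the argument order of `toolz.compose`.

```python
from operator import methodcaller

def compose2(f, g):
    """Hypothetical helper: return a function computing f(g(x))."""
    return lambda x: f(g(x))

# methodcaller('split', '-')(s) is equivalent to s.split('-')
split_dash = methodcaller('split', '-')
key = compose2(lambda parts: (int(parts[1]), parts[0]), split_dash)

lst = ['b-3', 'a-2', 'c-4', 'd-2']
res = sorted(lst, key=key)
print(res)  # ['a-2', 'd-2', 'b-3', 'c-4']
```

With toolz installed, the same `methodcaller` object should slot into `toolz.compose` in place of the second lambda.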

We can wrap the list returned by split('-') in another list and then use a loop (a list comprehension or a generator expression) to unpack it:

# Using list-comprehension
>>> sorted(lst, key=lambda x: [(int(num), text) for text, num in [x.split('-')]])
['a-2', 'd-2', 'b-3', 'c-4']
# Using next()
>>> sorted(lst, key=lambda x: next((int(num), text) for text, num in [x.split('-')]))
['a-2', 'd-2', 'b-3', 'c-4']
Ashwini Chaudhary
  • Glad you posted this, it was one of my intermediary attempts! But then I thought all this list creation can't necessarily be good. – jpp Apr 05 '18 at 12:21
  • @jpp We can reduce one list creation by replacing LC with `next()` call similar to your version. – Ashwini Chaudhary Apr 05 '18 at 12:28

In almost all cases I would simply go with your second attempt. It's readable and concise (I would prefer three simple lines over one complicated line every time!), even though the function name could be more descriptive. But if you use it as a local function, that's not going to matter much.

You also have to remember that Python's sort uses a key function, not a cmp (compare) function. So to sort an iterable of length n, the key function is called exactly n times, while sorting generally does O(n * log(n)) comparisons. So whenever your key function has O(1) algorithmic complexity, the key-function call overhead isn't going to matter (much). That's because:

O(n*log(n)) + O(n)   ==  O(n*log(n))
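One way to check the "exactly n times" claim is to wrap the key and count its calls. The helper names and the random test data below are illustrative only.

```python
import random

def sorter_func(x):
    text, num = x.split('-')
    return int(num), text

def sort_counting_key_calls(items):
    """Sort `items` with sorter_func and return (result, number of key calls)."""
    calls = 0

    def counting_key(x):
        nonlocal calls
        calls += 1
        return sorter_func(x)

    return sorted(items, key=counting_key), calls

random.seed(0)
data = ['{}-{}'.format(random.choice('abcdefghij'), random.randint(0, 99))
        for _ in range(1000)]
result, key_calls = sort_counting_key_calls(data)
print(key_calls)  # 1000: one key call per element, regardless of input order
```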

There's one exception, and that's the best case for Python's sort: in the best case, the sort does only O(n) comparisons, but that happens only if the iterable is already sorted (or almost sorted). If Python used a compare function (and in Python 2 there really was one, via the cmp argument), the constant factors of that function would be much more significant, because it would be called O(n * log(n)) times (once for each comparison).
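To see the compare-function side, Python 3's `functools.cmp_to_key` adapts an old-style compare function into a key. Counting the compare calls shows it runs once per comparison, far more often than a key function on shuffled input, and close to n - 1 times on already-sorted input. The helper names and data are illustrative; exact counts depend on the input and the CPython version.

```python
import functools
import random

def sorter_func(x):
    text, num = x.split('-')
    return int(num), text

def sort_counting_comparisons(items):
    """Sort with an old-style compare function; return (result, number of comparisons)."""
    calls = 0

    def compare(a, b):
        nonlocal calls
        calls += 1
        ka, kb = sorter_func(a), sorter_func(b)  # recomputed on every comparison
        return (ka > kb) - (ka < kb)

    return sorted(items, key=functools.cmp_to_key(compare)), calls

random.seed(0)
data = ['{}-{}'.format(random.choice('abcdefghij'), random.randint(0, 99))
        for _ in range(1000)]

shuffled_result, shuffled_calls = sort_counting_comparisons(data)
_, presorted_calls = sort_counting_comparisons(shuffled_result)

print(shuffled_calls)   # typically several thousand for 1000 random items
print(presorted_calls)  # about n - 1 when the input is already sorted (best case)
```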

So don't worry about making it more concise or much faster (unless you can reduce the big-O without introducing large constant factors; then you should go for it!). The first concern should be readability, so you should really avoid nested lambdas and other fancy constructs (except perhaps as an exercise).

Long story short, simply use your #2:

def sorter_func(x):
    text, num = x.split('-')
    return int(num), text

res = sorted(lst, key=sorter_func)

By the way, it's also the fastest of all proposed approaches (although the difference isn't much):

[Benchmark plot: percentage runtime difference of each approach relative to approach_jpp_2, by list size]

Summary: It's readable and fast!

Code to reproduce the benchmark. It requires simple_benchmark (disclaimer: it's my own library), toolz and iteration_utilities to be installed. There are probably equivalent frameworks for this kind of task, but this is the one I'm familiar with:

# My specs: Windows 10, Python 3.6.6 (conda)

import toolz
import iteration_utilities as it

def approach_jpp_1(lst):
    return sorted(lst, key=lambda x: (int(x.split('-')[1]), x.split('-')[0]))

def approach_jpp_2(lst):
    def sorter_func(x):
        text, num = x.split('-')
        return int(num), text
    return sorted(lst, key=sorter_func)

def jpp_nested_lambda(lst):
    return sorted(lst, key=lambda x: (lambda y: (int(y[1]), y[0]))(x.split('-')))

def toolz_compose(lst):
    return sorted(lst, key=toolz.compose(lambda x: (int(x[1]), x[0]), lambda x: x.split('-')))

def AshwiniChaudhary_list_comprehension(lst):
    return sorted(lst, key=lambda x: [(int(num), text) for text, num in [x.split('-')]])

def AshwiniChaudhary_next(lst):
    return sorted(lst, key=lambda x: next((int(num), text) for text, num in [x.split('-')]))

def PaulCornelius(lst):
    return sorted(lst, key=lambda x: tuple(f(a) for f, a in zip((int, str), reversed(x.split('-')))))

def JeanFrançoisFabre(lst):
    return sorted(lst, key=lambda s : [x if i else int(x) for i,x in enumerate(reversed(s.split("-")))])

def iteration_utilities_chained(lst):
    return sorted(lst, key=it.chained(lambda x: x.split('-'), lambda x: (int(x[1]), x[0])))

from simple_benchmark import benchmark
import random
import string

funcs = [
    approach_jpp_1, approach_jpp_2, jpp_nested_lambda, toolz_compose, AshwiniChaudhary_list_comprehension,
    AshwiniChaudhary_next, PaulCornelius, JeanFrançoisFabre, iteration_utilities_chained
]

arguments = {2**i: ['-'.join([random.choice(string.ascii_lowercase),
                              str(random.randint(0, 2**(i-1)))]) 
                    for _ in range(2**i)] 
             for i in range(3, 15)}

b = benchmark(funcs, arguments, 'list size')

# %matplotlib notebook  (Jupyter magic: uncomment when running inside a notebook)
b.plot_difference_percentage(relative_to=approach_jpp_2)

I took the liberty of including a function-composition approach from one of my own libraries, iteration_utilities.chained:

from iteration_utilities import chained
sorted(lst, key=chained(lambda x: x.split('-'), lambda x: (int(x[1]), x[0])))

It's quite fast (2nd or 3rd place) but still slower than using your own function.


Note that the key-function overhead would be more significant if you used a function with O(n) (or better) algorithmic complexity, for example min or max, because then the constant factors of the key function make up a much larger share of the total work!
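For example, `min` and `max` accept the same `key=` argument and make a single pass over the data, calling the key once per element with no sorting work to amortize it against. A small sketch reusing the question's `sorter_func`:

```python
def sorter_func(x):
    text, num = x.split('-')
    return int(num), text

lst = ['b-3', 'a-2', 'c-4', 'd-2']

# One pass over the list, one key call per element.
smallest = min(lst, key=sorter_func)
largest = max(lst, key=sorter_func)
print(smallest)  # a-2
print(largest)   # c-4
```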

MSeifert
  • Great answer, thanks for adding the performance dimension, something I hadn't considered. Moral of the story: as soon as you require more than a simple `lambda`, use an explicit function instead! – jpp Jul 07 '18 at 08:36
lst = ['b-3', 'a-2', 'c-4', 'd-2']
res = sorted(lst, key=lambda x: tuple(f(a) for f, a in zip((int, str), reversed(x.split('-')))))
print(res)

['a-2', 'd-2', 'b-3', 'c-4']
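How this key works: `reversed` puts the number first, and `zip` pairs each part with the converter to apply to it (`int` for the number, `str` for the text). This sketch evaluates the pieces for a single element:

```python
parts = list(reversed('b-3'.split('-')))
print(parts)  # ['3', 'b']

# zip pairs each converter with the part in the same position.
pairs = list(zip((int, str), parts))
print(pairs == [(int, '3'), (str, 'b')])  # True

key = tuple(f(a) for f, a in pairs)
print(key)  # (3, 'b')
```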
Paul Cornelius

You could convert to an integer only when the item's index is 0 (after reversing the split list). The only object created (besides the result of split) is the 2-element list used for comparison; the rest are just iterators.

sorted(lst,key = lambda s : [x if i else int(x) for i,x in enumerate(reversed(s.split("-")))])

As an aside, the - token isn't particularly great when numbers are involved, because it complicates the use of negative numbers (but that can be solved with s.split("-", 1)).
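Here is a sketch of that fix. It assumes an element such as 'a--2' is meant to carry the number -2, and that the text part never contains '-'. With maxsplit=1, `split` stops after the first '-', so the minus sign stays with the number.

```python
lst = ['b-3', 'a--2', 'c-4', 'd-2']

print('a--2'.split('-'))     # ['a', '', '2']  (the sign is lost)
print('a--2'.split('-', 1))  # ['a', '-2']

res = sorted(lst, key=lambda s: [x if i else int(x)
                                 for i, x in enumerate(reversed(s.split('-', 1)))])
print(res)  # ['a--2', 'd-2', 'b-3', 'c-4']
```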

Jean-François Fabre
lst = ['b-3', 'a-2', 'c-4', 'd-2']
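def xform(l):
    # swap 'text-num' <-> 'num-text'; uses the parameter l, not the global lst
    return list(map(lambda x: x[1] + '-' + x[0], map(lambda x: x.split('-'), l)))
# plain string sort: orders the numbers correctly here only because they all have one digit
lst = sorted(xform(lst))
print(xform(lst))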

I think @jpp has a better solution, but it's a fun little brainteaser :-)

JGFMK

In general, with FOP (functional oriented programming) you can put everything in a one-liner and nest lambdas within it, but that is generally bad etiquette, since after 2 nested functions it all becomes quite unreadable.

The best way to approach this kind of issue is to split it up in several stages:

1: split each string into a tuple:

lst = ['b-3', 'a-2', 'c-4', 'd-2']
res = map( lambda str_x: tuple( str_x.split('-') ) , lst)   

2: sort the elements as you wished:

lst = ['b-3', 'a-2', 'c-4', 'd-2']
res = map( lambda str_x: tuple( str_x.split('-') ) , lst)  
res = sorted( res, key=lambda x: ( int(x[1]), x[0] ) ) 

Since we split each string into a tuple, map returns a map object that yields tuples, and sorted turns it into a list of tuples. So the 3rd step is optional:

3: join the data back into the format you asked for:

lst = ['b-3', 'a-2', 'c-4', 'd-2']
res = map( lambda str_x: tuple( str_x.split('-') ) , lst)  
res = sorted( res, key=lambda x: ( int(x[1]), x[0] ) ) 
res = list(map( '-'.join, res ))  # list() materializes the lazy map object

Keep in mind that nesting lambdas can produce an even more compact one-liner, and that you can embed a non-discrete (inline) nested lambda like this:

a = ['b-3', 'a-2', 'c-4', 'd-2']
resa = map( lambda x: x.split('-'), a)
resa = map( lambda x: ( int(x[1]),x[0]) , resa)  # map over the split results, not over a
# a similar conversion in one step (it keeps the (text, num) order), but you must be sure about the types you pass to the lambda
resa = map( lambda x: tuple( map( lambda y: int(y) if y.isdigit() else y , x.split('-') ) ) , a)

But as you can see, if the elements of list a are anything other than 2 strings separated by '-', the lambda will raise an error and you will have a bad time figuring out what the hell is happening.


So finally, I would like to show you several ways the 3rd-step program could be written:

1:

lst = ['b-3', 'a-2', 'c-4', 'd-2']
res = list(map( '-'.join,
           sorted(
                map( lambda str_x: tuple( str_x.split('-') ) , lst),
                key=lambda x: ( int(x[1]), x[0] )
           )
       ))

2:

lst = ['b-3', 'a-2', 'c-4', 'd-2']
res = list(map( '-'.join,
        sorted( map( lambda str_x: tuple( str_x.split('-') ) , lst),
                # tuple() first, because a map object isn't reversible
                key=lambda x: tuple( reversed( tuple(
                            map( lambda y: int(y) if y.isdigit() else y ,x  )
                        )))
            )
    ))

3:

res = sorted( lst,
             key=lambda x:
                tuple(reversed(
                    tuple(
                        map( lambda y: int(y) if y.isdigit() else y , x.split('-') )
                    )
                ))
            )

So you can see how this can all get very complicated and incomprehensible. When reading my own or someone else's code, I always love to see this version:

res = map( lambda str_x: tuple( str_x.split('-') ) , lst) # splitting string 
res = sorted( res, key=lambda x: ( int(x[1]), x[0] ) ) # sort by (number, text)
res = list(map( '-'.join, res )) # rejoin each tuple into a string

That is all from me. Have fun. I've tested all code in py 3.6.


PS. In general, you have 2 ways to approach lambda functions:

mult = lambda x: x*2  
mu_add= lambda x: mult(x)+x #calling lambda from lambda

This approach is useful for typical FOP, where you have constant data and need to manipulate each element of that data. But if you need to handle a list, tuple, string or dict inside a lambda, these kinds of operations aren't very useful: once any of those container types is involved, the types of the elements inside them become uncertain. So we need to go up a level of abstraction and decide how to manipulate the data based on its type.

mult_i = lambda x: x*2 if isinstance(x,int) else x # conditional (ternary) expression: double ints, return anything else unchanged

Now you can use another type of lambda function:

int_str = lambda x: ( lambda y: str(y) )(x)*x # a bit complex, right? int_str(3) == '333'
# let me break it down.
# all this could be written as:
str_i = lambda x: str(x)
int_str = lambda x: str_i(x)*x
# The parentheses around the inner lambda mark it off as a separate function
# definition, so it is built and called first; then the multiplication happens.
# ( lambda y: str(y) )       defines the inner function
# ( lambda y: str(y) )(x)    calls it, passing x as the argument

Some people call this syntax nested lambdas; I call it indiscreet, since you can see everything.

And you can use recursive lambda assignment:

def rec_lambda( data, *arg_lambda ):
    # keep only callable arguments: lambdas, but also builtins such as str.upper
    # and bound methods such as '-'.join (their type name is not 'function')
    arg_lambda = [ x for x in arg_lambda if callable(x) ]

    if not arg_lambda: # nothing (left) to apply
        return data

    # apply the first function in line
    data = arg_lambda[0](data)

    # recurse with the remaining functions
    return rec_lambda( data, *arg_lambda[1:] )

# where you can use it like this
a = rec_lambda( 'a', lambda x: x*2, str.upper, lambda x: (x,x), '-'.join)
print(a) # AA-AA
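As an alternative sketch (not the author's code), the same left-to-right pipeline can be written without recursion using `functools.reduce`. `pipe` is a hypothetical helper name, not a standard-library function.

```python
from functools import reduce

def pipe(data, *funcs):
    """Hypothetical helper: apply funcs to data from left to right."""
    return reduce(lambda acc, f: f(acc), funcs, data)

result = pipe('a', lambda x: x * 2, str.upper, lambda x: (x, x), '-'.join)
print(result)  # AA-AA

# The question's sort key expressed as a pipeline.
lst = ['b-3', 'a-2', 'c-4', 'd-2']
res = sorted(lst, key=lambda s: pipe(s, lambda x: x.split('-'),
                                     lambda p: (int(p[1]), p[0])))
print(res)  # ['a-2', 'd-2', 'b-3', 'c-4']
```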
Danilo

I think that if you are certain the format is always a letter at index [0] and a dash at index [1], so that everything from index [2:] onward is the number, then you can replace split with a slice, or you can use str.index('-'):

sorted(lst, key=lambda x:(int(x[2:]),x[0]))

# str.index('-') 
sorted(lst, key=lambda x:(int(x[x.index('-')+1 :]),x[0])) 
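A related standard-library option, not part of the original answer, is `str.partition`. It splits on the first '-' in a single call and does not assume the text is one character long. In this sketch `partition_key` is an illustrative name and 'aa-10' is added test data.

```python
def partition_key(x):
    # str.partition splits at the first '-' only and always returns 3 parts:
    # (text before, separator, text after)
    text, _, num = x.partition('-')
    return int(num), text

lst = ['b-3', 'a-2', 'c-4', 'd-2', 'aa-10']
res = sorted(lst, key=partition_key)
print(res)  # ['a-2', 'd-2', 'b-3', 'c-4', 'aa-10']
```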
guramarx