2

I want to lexicographically compare two lists, but the values inside the lists should be computed only when needed. For instance, for these two lists

a = list([1, 3, 3])
b = list([1, 2, 2])

(a < b) == False
(b < a) == True

I'd like the values in the lists to be functions. For a and b above, the functions at index 2 would never be evaluated, because the values at index 1 (a[1] == 3, b[1] == 2) are already sufficient to determine that b < a.

One option would be to compare the elements manually, and that's probably what I'll do if I can't find a solution that lets me use the list's comparator. However, I found that the manual loop is a tad slower than the list's built-in comparator, which is why I want to make use of it.

Update

Here's a way to accomplish what I am trying to do, but I was wondering if there are any built-in functions that would do this faster (and which makes use of this feature of lists).

from itertools import izip

def lex_comp(a, b):
  # evaluate the functions pairwise and stop at the first difference
  for func_a, func_b in izip(a, b):
    v_a = func_a()
    v_b = func_b()
    if v_a < v_b: return -1
    if v_a > v_b: return +1
  return 0


def foo1(): return 1
def foo2(): return 1

def bar1(): return 1
def bar2(): return 2

def func1(): return ...
def func2(): return ...

list_a = [foo1, bar1, func1, ...]
list_b = [foo2, bar2, func2, ...]

# now you can use the comparator for instance to sort a list of these lists
sorted([list_a, list_b], cmp=lex_comp)
orange
  • What do you mean by "I'd like the values in the list to be functions", so is it finally a evaluated value or a function (with what arguments?)? – YiFei Aug 23 '16 at 01:54
  • Interesting question - perhaps a job for [itertools.takewhile()](https://docs.python.org/2/library/itertools.html#itertools.takewhile)? – FujiApple Aug 23 '16 at 01:54
  • It would help if, instead of showing an example with numbers, you showed an example of how you want this to work with your functions. – BrenBarn Aug 23 '16 at 02:22
  • @KlausD.: I tried `@property`, but that doesn't work and also using a custom class instead of list with `__cmp__` defined, but as I said that was slower... – orange Aug 23 '16 at 02:23
  • @YiFei: I mean that you would have a function without any arguments which gets evaluated before making the comparison, but only when necessary. So instead of `a[2]=3`, you would have `a[2]=foo` with `def foo(): return 3` (in fact some expensive calculation would be done instead of `return 3`) – orange Aug 23 '16 at 02:24
  • @FujiApple: Can you elaborate on your suggestion? How would `itertools.takewhile` be working in this case? – orange Aug 23 '16 at 02:26
  • @orange Wait, you want a parameter-less function? What exactly is the use of the list with respect to the function, then? – juanpa.arrivillaga Aug 23 '16 at 02:27
  • @orange - My solution is similar to the one posted by juanpa below, I'll post it shortly – FujiApple Aug 23 '16 at 02:28

3 Answers

2

Try this (the extra parameters to the function are just for illustration purposes):

import itertools

def f(a, x):
    print "lazy eval of {}".format(a)
    return x

a = [lambda: f('a', 1), lambda: f('b', 3), lambda: f('c', 3)]
b = [lambda: f('d', 1), lambda: f('e', 2), lambda: f('f', 2)]
c = [lambda: f('g', 1), lambda: f('h', 2), lambda: f('i', 2)]

def lazyCmpList(a, b):
    l = len(list(itertools.takewhile(lambda (x, y): x() == y(), itertools.izip(a, b))))
    if l == len(a):
        return 0
    else:
        return cmp(a[l](), b[l]())

print lazyCmpList(a, b)
print lazyCmpList(b, a)
print lazyCmpList(b, c)

Produces:

lazy eval of a
lazy eval of d
lazy eval of b
lazy eval of e
-1
lazy eval of d
lazy eval of a
lazy eval of e
lazy eval of b
1
lazy eval of d
lazy eval of g
lazy eval of e
lazy eval of h
lazy eval of f
lazy eval of i
0

Note that the code assumes the lists of functions are of the same length. It could be enhanced to support lists of unequal length, but you'd have to define the logic first, i.e. what should cmp([f1, f2, f3], [f1, f2, f3, f1]) produce?
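One possible answer to that question is to follow what built-in list comparison does: if one list is a prefix of the other, the shorter list sorts first. The sketch below assumes that rule and is written for Python 3 (so `zip` instead of `izip`, and no `cmp`); `lazy_cmp_any_length` and the `const` helper are hypothetical names used only for illustration.

```python
def lazy_cmp_any_length(a, b):
    """Compare two lists of zero-argument functions lexicographically.

    Functions are called pairwise and only up to the first difference.
    Assumption: a list that is a strict prefix of the other sorts first,
    which mirrors how plain Python lists compare.
    """
    for func_a, func_b in zip(a, b):
        v_a, v_b = func_a(), func_b()
        if v_a != v_b:
            return -1 if v_a < v_b else 1
    # All shared positions are equal: fall back to comparing the lengths.
    return (len(a) > len(b)) - (len(a) < len(b))


def const(value):
    """Hypothetical helper: build a zero-argument function returning value."""
    return lambda: value


f1, f2, f3 = const(1), const(2), const(3)
shorter = [f1, f2, f3]
longer = [f1, f2, f3, f1]
```

With this rule, `lazy_cmp_any_length(shorter, longer)` gives -1 and the reverse call gives 1, the same ordering as `[1, 2, 3] < [1, 2, 3, 1]`.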

I haven't compared the speed, but given your updated code I would imagine any speedup will be marginal (the looping is done in C code rather than Python). This solution may actually be slower, as it is more complex and involves more memory allocation.

Given that you are trying to sort lists of functions by evaluating them, each function will be evaluated repeatedly (the sort makes O(n log n) comparisons), so your best speedup may come from memoization, which avoids re-evaluating the functions.
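A minimal sketch of the memoization idea, assuming Python 3 (where `sorted` takes `key=functools.cmp_to_key(...)` instead of `cmp=`). The `memoize` decorator, the `expensive` factory and the sample `data` are hypothetical, made up for illustration; `lex_cmp` is the same pairwise loop as in the question's update.

```python
import functools


def memoize(func):
    """Cache the result of a zero-argument function after its first call."""
    missing = object()
    cached = missing

    @functools.wraps(func)
    def wrapper():
        nonlocal cached
        if cached is missing:
            cached = func()
        return cached

    return wrapper


def lex_cmp(a, b):
    """Pairwise lazy comparison of two lists of zero-argument functions."""
    for func_a, func_b in zip(a, b):
        v_a, v_b = func_a(), func_b()
        if v_a < v_b:
            return -1
        if v_a > v_b:
            return 1
    return 0


calls = []  # records every real (uncached) evaluation


def expensive(value):
    """Hypothetical stand-in for an expensive zero-argument computation."""
    def func():
        calls.append(value)
        return value
    return memoize(func)


data = [[2, 1, 0], [0, 2, 2], [2, 0, 1], [0, 2, 1]]
rows = [[expensive(v) for v in row] for row in data]

ordered = sorted(rows, key=functools.cmp_to_key(lex_cmp))
calls_during_sort = len(calls)
values = [[f() for f in row] for row in ordered]
```

However many comparisons the sort makes, each underlying function runs at most once, so `calls_during_sort` can never exceed the 12 functions in `rows`.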

FujiApple
1

Here is an approach that uses lazy evaluation:

>>> def f(x):
...   return 2**x
... 
>>> def g(x):
...   return x*2
... 
>>> [f(x) for x in range(1,10)]
[2, 4, 8, 16, 32, 64, 128, 256, 512]
>>> [g(x) for x in range(1,10)]
[2, 4, 6, 8, 10, 12, 14, 16, 18]
>>> import itertools
>>> zipped = zip((f(i) for i in range(1,10)),(g(i) for i in range(1,10)))
>>> x,y = next(itertools.dropwhile(lambda t: t[0]==t[1],zipped))
>>> x > y
True
>>> x < y
False
>>> x
8
>>> y
6
>>> 
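To check that nothing past the first differing position gets evaluated, here is a small sketch with zero-argument functions that record when they are called. It assumes Python 3, where `zip` is lazy; `make` and `evaluated` are hypothetical names used only for this illustration.

```python
import itertools

evaluated = []  # names of the functions that were actually called


def make(name, value):
    """Hypothetical helper: a zero-argument function that logs its own call."""
    def func():
        evaluated.append(name)
        return value
    return func


a = [make("a0", 1), make("a1", 3), make("a2", 3)]
b = [make("b0", 1), make("b1", 2), make("b2", 2)]

# The generator expressions call each function only when zip asks for the next pair.
pairs = zip((f() for f in a), (g() for g in b))

# Skip equal pairs; None is returned if the lists are equal throughout.
first_diff = next(itertools.dropwhile(lambda t: t[0] == t[1], pairs), None)
```

Here `first_diff` is the pair `(3, 2)`, and `a2` and `b2` never appear in `evaluated`. Passing a default to `next` avoids a `StopIteration` when all pairs are equal.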
juanpa.arrivillaga
  • Don't quite sure about this, but as OP comment to me: "a function without any arguments". – YiFei Aug 23 '16 at 02:27
  • @YiFei Yes, that comment came after this answer. Now I'm just confused :\ – juanpa.arrivillaga Aug 23 '16 at 02:28
  • I don't think this is lazy evaluation. `f(i)` and `g(i)` are evaluated the moment the tuples are created. I would like the evaluation of the function to occur only when the comparison occurs. – orange Aug 23 '16 at 02:28
  • @orange In python3, zip is lazy. It is the equivalent of `izip` in itertools for python 2. – juanpa.arrivillaga Aug 23 '16 at 02:29
  • @orange, Then you have to specify how will the function you mentioned would work, if the function don't have arguments, then it has to return some constant, or do you use any kind of global variables? – YiFei Aug 23 '16 at 02:29
  • @YiFei: Don't worry about the actual values being returned. They are results of some calculations of some arguments, but for the purpose of my question this detail doesn't matter. – orange Aug 23 '16 at 02:39
  • @juanpa.arrivillaga: Ok, in that case this would work. – orange Aug 23 '16 at 02:43
  • I nominate this answer as solution, as it has been one of the fastest provided. I might still do the comparison explicitly. – orange Aug 25 '16 at 01:31
1

I did some testing and found that @juanpa's answer and the version in my update are the fastest versions:

import random
import itertools
import functools

num_rows = 100
data = [[random.randint(0, 2) for i in xrange(10)] for j in xrange(num_rows)]

# turn data values into functions.
def return_func(value):
    return value

list_funcs = [[functools.partial(return_func, v) for v in row] for row in data]


def lazy_cmp_FujiApple(a, b):
    l = len(list(itertools.takewhile(lambda (x, y): x() == y(), itertools.izip(a, b))))
    if l == len(a):
        return 0
    else:
        return cmp(a[l](), b[l]())

sorted1 = sorted(list_funcs, lazy_cmp_FujiApple)
%timeit sorted(list_funcs, lazy_cmp_FujiApple)
# 100 loops, best of 3: 2.77 ms per loop

def lex_comp_mine(a, b):
    for func_a, func_b in itertools.izip(a, b):
        v_a = func_a()
        v_b = func_b()
        if v_a < v_b: return -1
        if v_a > v_b: return +1
    return 0

sorted2 = sorted(list_funcs, cmp=lex_comp_mine)
%timeit sorted(list_funcs, cmp=lex_comp_mine)
# 1000 loops, best of 3: 930 µs per loop

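def lazy_comp_juanpa(a, b):
    # call the functions; comparing the partial objects themselves never evaluates them
    pairs = ((f(), g()) for f, g in itertools.izip(a, b))
    # default to an equal pair when the rows are identical, so next() cannot raise
    x, y = next(itertools.dropwhile(lambda t: t[0] == t[1], pairs), (0, 0))
    return cmp(x, y)

sorted3 = sorted(list_funcs, cmp=lazy_comp_juanpa)
%timeit sorted(list_funcs, cmp=lazy_comp_juanpa)
# 1000 loops, best of 3: 949 µs per loop
# (figure recorded while this line timed lex_comp_mine, not lazy_comp_juanpa)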

%timeit sorted(data)
# 10000 loops, best of 3: 45.4 µs per loop

# print sorted(data)
# print [[c() for c in row] for row in sorted1]
# print [[c() for c in row] for row in sorted2]
# print [[c() for c in row] for row in sorted3]

I guess the creation of an intermediate list is hurting the performance of @FujiApple's version. When running my comparator version on the original data list and comparing the runtime to Python's native list sorting, I note that my version is about 10 times slower (501 µs vs 45.4 µs per loop). I guess there's no easy way to get close to the performance of Python's native implementation...
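One more variant that might be worth timing is to let the built-in list comparison do the short-circuiting itself, by wrapping each function in an object whose `__eq__` and `__lt__` evaluate it on demand. This is a sketch for Python 3, not something measured here: `Lazy` and `make` are hypothetical names. It relies on CPython comparing lists element by element with `==` until the first mismatch and only then applying `<` to that one pair.

```python
class Lazy:
    """Wrap a zero-argument function; call it at most once, on first comparison."""

    __slots__ = ("_func", "_value", "_done")

    def __init__(self, func):
        self._func = func
        self._value = None
        self._done = False

    def value(self):
        if not self._done:
            self._value = self._func()
            self._done = True
        return self._value

    # The list comparison calls these on the wrapped elements.
    def __eq__(self, other):
        return self.value() == other.value()

    def __lt__(self, other):
        return self.value() < other.value()


evaluated = []  # names of the functions that were actually called


def make(name, result):
    """Hypothetical helper: a zero-argument function that logs its own call."""
    def func():
        evaluated.append(name)
        return result
    return func


a = [Lazy(make("a0", 1)), Lazy(make("a1", 3)), Lazy(make("a2", 3))]
b = [Lazy(make("b0", 1)), Lazy(make("b1", 2)), Lazy(make("b2", 2))]

b_less_than_a = b < a  # plain list comparison, no custom comparator
a_less_than_b = a < b  # reuses the cached values
```

With these inputs `b < a` is true and `a < b` is false, and the functions at index 2 are never called. A list of such lists can be passed to `sorted` without a `cmp` or `key` argument. Whether this beats the explicit loop depends on the cost of the Python-level `__eq__`/`__lt__` calls, so it would need measuring.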

orange