Find where f(x) changes in a list, with bisection (in Python)

Question

Reasoning: I'm trying to implement, in Python, something similar to git bisect, but with basically a list of directories.

I have a (long) list of version numbers like this: ['1.0', '1.14', '2.3', '3.1', '4']

I have a function works() which takes a version number, and returns a value.

[works(x) for x in my_list] would look like: ['foo', 'foo', 'foo', 'bar', 'bar'] ... but running works() is very expensive.

I would like to do some kind of bisect which will find the change boundary.

Why is this question downvoted? – Willem Van Onsem Feb 08 '17 at 19:57

Willem Van Onsem · Accepted Answer · score 9 · 2017-02-08T18:29:11.767

You could simply use binary search:

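def binary_f(f,list):
    # Return the first index i for which f(list[i]) is truthy, assuming
    # the results look like [False, ..., False, True, ..., True].
    # (The parameter name `list` shadows the built-in type.)
    frm = 0            # lowest index that could still be the answer
    to = len(list)     # len(list) means "no element satisfies f"
    while frm < to:
        mid = (frm+to)>>1      # midpoint, same as (frm+to)//2
        if f(list[mid]):
            to = mid           # answer is at mid or to its left
        else:
            frm = mid+1        # answer is strictly to the right of mid
    return frm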

It will return the first index i for which bool(f(list[i])) is True.

Of course, the function assumes that the map of f over the list has the form:

f(list) == [False,False,...,False,True,True,...,True]

If this is not the case, it will usually still find some index where the value swaps, but which one is not defined.

Say f is simply "the version is 2 or higher", i.e. lambda v:v >= '2'; then it returns:

>>> binary_f(lambda v:v >= '2',['1.0', '1.14', '2.3', '3.1', '4'])
2

So index 2. If f is False for every element, it returns len(list), since it "assumes" that the element just past the end of the list evaluates to True:

>>> binary_f(lambda v:v >= '4.2',['1.0', '1.14', '2.3', '3.1', '4'])
5

Of course in your example f is works.

Experiments:

>>> binary_f(lambda v:v >= '2',['1.0', '1.14', '2.3', '3.1', '4'])
2
>>> binary_f(lambda v:v >= '0',['1.0', '1.14', '2.3', '3.1', '4'])
0
>>> binary_f(lambda v:v >= '1',['1.0', '1.14', '2.3', '3.1', '4'])
0
>>> binary_f(lambda v:v >= '1.13',['1.0', '1.14', '2.3', '3.1', '4'])
1
>>> binary_f(lambda v:v >= '2.4',['1.0', '1.14', '2.3', '3.1', '4'])
3
>>> binary_f(lambda v:v >= '3',['1.0', '1.14', '2.3', '3.1', '4'])
3
>>> binary_f(lambda v:v >= '3.2',['1.0', '1.14', '2.3', '3.1', '4'])
4
>>> binary_f(lambda v:v >= '4.2',['1.0', '1.14', '2.3', '3.1', '4'])
5

(Here I used a very cheap string comparison as the version check, but it works just as well for more sophisticated predicates.)
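The checks above compare version strings as plain strings, which is not a version comparison: as a string, '11.4' sorts before '2.3'. A minimal sketch of a more careful predicate, assuming purely numeric dotted versions such as '1.14' or '4' (version_key is a hypothetical helper, not from the original answer), turns each version into a tuple of integers before comparing:

```python
def binary_f(f, items):
    # Same loop as binary_f above: first index where f(items[i]) is truthy.
    frm, to = 0, len(items)
    while frm < to:
        mid = (frm + to) // 2
        if f(items[mid]):
            to = mid
        else:
            frm = mid + 1
    return frm


def version_key(v):
    # Hypothetical helper: assumes numeric dotted versions only,
    # e.g. '1.14' -> (1, 14), so tuples compare component by component.
    return tuple(int(part) for part in v.split('.'))


versions = ['1.0', '1.14', '2.3', '3.1', '4', '11.4']
print(binary_f(lambda v: version_key(v) >= version_key('10'), versions))  # 5
print(binary_f(lambda v: v >= '10', versions))  # 2 (string comparison)
```

With plain string comparison the same query wrongly lands on '2.3'. For real-world version strings (pre-releases, suffixes), packaging.version.Version from the third-party packaging library is a more robust key than this sketch.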

Since this is binary search, it calls f only O(log n) times, with n the number of items in the list, whereas a linear search can need O(n) calls (which is usually much more expensive).
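To make the O(log n) vs O(n) difference concrete, here is a small sketch that counts predicate calls rather than measuring wall time, since the number of works() calls is what matters when each call is expensive. count_calls is a hypothetical helper, and x >= 700 over range(1000) is a stand-in for works() over 1000 versions:

```python
def binary_f(f, items):
    # Same bisection loop as binary_f above.
    frm, to = 0, len(items)
    while frm < to:
        mid = (frm + to) // 2
        if f(items[mid]):
            to = mid
        else:
            frm = mid + 1
    return frm


def count_calls(pred):
    """Wrap pred so that every call is counted in wrapper.calls."""
    def wrapper(x):
        wrapper.calls += 1
        return pred(x)
    wrapper.calls = 0
    return wrapper


items = list(range(1000))       # stand-in for 1000 versions
is_new = lambda x: x >= 700     # stand-in for the expensive works()

bisect_pred = count_calls(is_new)
print(binary_f(bisect_pred, items), bisect_pred.calls)  # 700 10

linear_pred = count_calls(is_new)
print(next(i for i, x in enumerate(items) if linear_pred(x)),
      linear_pred.calls)  # 700 701
```

Each iteration of the loop at least halves the remaining interval, so for 1000 items the bisection needs at most 10 calls, while the linear scan has to evaluate every element up to and including the first match.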

EDIT: in case works returns one of two arbitrary values and you want to find where it swaps, you can first compute the value at index 0:

val0 = works(list[0])

and then pass binary_f a predicate that checks for a different value:

binary_f(lambda v:works(v) != val0,list)

Or putting it into a nice function:

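def binary_f_val(f,list):
    # f may return any two distinct values; find the first index where it
    # differs from the value at index 0 (assumes a non-empty list)
    val0 = f(list[0])
    return binary_f(lambda x:f(x) != val0,list)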
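One detail: binary_f_val can call f on list[0] a second time (when the search narrows down to index 0), which matters if each call is an hour-long test run. A minimal sketch, assuming the versions are hashable and f is deterministic, memoizes f with functools.lru_cache so no version is evaluated twice. binary_f_val_cached and fake_results are hypothetical names for illustration:

```python
import functools


def binary_f(f, items):
    # Same bisection loop as binary_f above.
    frm, to = 0, len(items)
    while frm < to:
        mid = (frm + to) // 2
        if f(items[mid]):
            to = mid
        else:
            frm = mid + 1
    return frm


def binary_f_val_cached(f, items):
    """First index whose f-value differs from f(items[0]).

    f is wrapped in an unbounded cache, so each element is evaluated
    at most once. Returns 0 for an empty list, len(items) if no swap.
    """
    if not items:
        return 0
    cached_f = functools.lru_cache(maxsize=None)(f)
    val0 = cached_f(items[0])
    return binary_f(lambda x: cached_f(x) != val0, items)


# Stand-in for works(), using the values from the question.
fake_results = {'1.0': 'foo', '1.14': 'foo', '2.3': 'foo', '3.1': 'bar', '4': 'bar'}
print(binary_f_val_cached(fake_results.get, list(fake_results)))  # 3
```

A fresh cache is created per call, so results are not shared between searches; keep a module-level cache instead if you bisect the same versions repeatedly.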

edited Feb 08 '17 at 18:29

answered Feb 08 '17 at 17:28

Willem Van Onsem


Isn't this overkill? list.index() does the same thing, does it not? – ryanskeith Feb 08 '17 at 17:42

It does, but in `O(n)` time, so with big lists it's much slower. – Jean-François Fabre Feb 08 '17 at 17:44
@rshield: If you read the question carefully, you see that **`works`** is an expensive operation. It could mean you run an entire testbench against it. Now imagine there are hundreds of versions you have to check. In that case, this algorithm will perform 7-8 checks, whereas next will on average check 50 cases. If each testbench takes an hour, I would be glad to only need to wait 8 hours instead of half a week. – Willem Van Onsem Feb 08 '17 at 17:44
@WillemVanOnsem I stand corrected. I see why binary is the best solution here. – ryanskeith Feb 08 '17 at 17:53
This is great, but I actually misstated the problem a little bit, I just realized. See update. – Daniel Feb 08 '17 at 18:21
@dmd: that does not change much: you can simply use `lambda x : works(x) == 'bar'` as function. – Willem Van Onsem Feb 08 '17 at 18:25
oh, I see now how I can do this - I should just feed works() neighboring pairs, thus reducing the problem to false/true. – Daniel Feb 08 '17 at 18:25
@WillemVanOnsem but I don't actually know what foo and bar are. but my prev comment stands. – Daniel Feb 08 '17 at 18:25
@dmd: yeah indeed. As long as you can somehow convert it into a predicate (and the predicate swaps once from `False` to `True`), it works. – Willem Van Onsem Feb 08 '17 at 18:25
@dmd: you can simply first inspect the borders of the list: look what `val = works(list[0])` is and then feed it like `lambda x:works(x) != val`. – Willem Van Onsem Feb 08 '17 at 18:26
I think there is a module that can do the binary search for you – Copperfield Feb 08 '17 at 18:43
use `bisect` to do that for you. But there's a little catch: a version list like `['1.0', '1.14', '2.3', '3.1', '4', "11.4"]` _appears_ sorted, but insertion fails because string (lexicographic) sort isn't version sort. For instance `10.1` is inserted in second position... – Jean-François Fabre Feb 08 '17 at 19:54
I wonder if you could reuse the Python [bisect](https://docs.python.org/2/library/bisect.html) module for this. You'd probably have to define a custom class with a `__getitem__` that calls `works()` lazily. – Marius Gedminas Feb 17 '17 at 11:45
@MariusGedminas: you can indeed construct something as a `maplist` that is a "virtual list" and lazily calls the mapping function. If I find some time after hours I will try to add it to the answer. – Willem Van Onsem Feb 17 '17 at 11:46
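The "virtual list" idea from these comments can be sketched roughly as follows. bisect.bisect_left only uses len() and indexing on its first argument, so a small class that calls the predicate on demand lets the standard library do the bisection. LazyMap and first_true are hypothetical names, not part of the bisect module:

```python
import bisect


class LazyMap:
    """Hypothetical read-only "virtual list": item i is func(items[i]),
    computed only when bisect asks for it."""

    def __init__(self, func, items):
        self.func = func
        self.items = items

    def __len__(self):
        return len(self.items)

    def __getitem__(self, i):
        return self.func(self.items[i])


def first_true(pred, items):
    # bool(pred(x)) is False or True and False < True, so bisect_left for
    # True returns the first index where pred holds (len(items) if none).
    return bisect.bisect_left(LazyMap(lambda x: bool(pred(x)), items), True)


versions = ['1.0', '1.14', '2.3', '3.1', '4']
print(first_true(lambda v: v >= '2', versions))  # 2
```

On Python 3.10 and later, bisect_left also accepts a key argument, so bisect.bisect_left(versions, True, key=lambda v: v >= '2') gives the same result without the wrapper class.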

score 0 · Answer 2 · answered Feb 08 '17 at 17:33

So you basically want to implement a binary search algorithm... this is pretty straightforward; a rough draft of the algorithm is below. I haven't tested it, but you should get the idea; take care of the edge cases where your version list has length 1 or 2:

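def whereWorks(versions, works):
    # Return the index of the first version for which works() is truthy,
    # assuming the results look like [False, ..., False, True, ..., True].
    # Returns len(versions) if no version works.
    if not versions:
        return 0                      # empty slice: nothing here works

    middle = len(versions) // 2       # integer division (Python 3)

    if works(versions[middle]):
        # first working version is at middle or to its left; if nothing in
        # versions[:middle] works, the recursive call returns middle itself
        return whereWorks(versions[:middle], works)
    else:
        # first working version is to the right of middle
        return whereWorks(versions[middle + 1:], works) + middle + 1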

Ma0 · Answer 3 · 2017-02-08T17:49:44.107

That is what next() is for.

result = next(x for x in my_list if works(x))
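As written, next() returns the matching element rather than its index, and raises StopIteration if no element matches. A small sketch (first_index is a hypothetical helper name) returns the index instead and uses next()'s default argument to return len(items) when nothing matches, which mirrors binary_f's convention. It still calls pred up to n times:

```python
def first_index(pred, items):
    # Linear scan: index of the first item where pred is truthy,
    # or len(items) if there is none (same convention as binary_f).
    return next((i for i, x in enumerate(items) if pred(x)), len(items))


my_list = ['1.0', '1.14', '2.3', '3.1', '4']
print(first_index(lambda v: v >= '2', my_list))    # 2
print(first_index(lambda v: v >= '4.2', my_list))  # 5
```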

A faster, but more complicated, way would be:

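alist = [0,0,0,0,0,0,1]

def check(my_list, tracking=0):
    # Return the index of the first element satisfying criterion();
    # `tracking` is the offset of my_list within the original list.

    def criterion(i):
        return bool(i)    # replace with the real (expensive) test

    if not my_list:       # empty slice (e.g. an all-False list of length 2)
        return tracking

    if len(my_list) == 1:
        if criterion(my_list[0]):
            return tracking
        else:
            return tracking + 1

    start = len(my_list) // 2

    if criterion(my_list[start]):
        # first match is at `start` or before it
        return check(my_list[:start], tracking=tracking)
    else:
        # first match is after `start`
        tracking += start + 1
        return check(my_list[start+1:], tracking=tracking)

print(check(alist))  # prints 6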

This is a bisection method: it recursively cuts the list in half, checks the middle element, and moves to the left slice if that element is 1 or to the right slice if it is 0. The tracking argument keeps track of the index offset. I would love to see a timeit comparison if someone has the time.

@WillemVanOnsem it is still not binary but operates at log(n) — Ma0, Feb 08 '17 at 18:04
