Merge of lazy streams (using generators) in Python

Question

I'm playing with functional capacities of Python 3 and I tried to implement classical algorithm for calculating Hamming numbers. That's the numbers which have as prime factors only 2, 3 or 5. First Hamming numbers are 2, 3, 4, 5, 6, 8, 10, 12, 15, 16, 18, 20 and so on.

My implementation is the following:

def scale(s, m):
    return (x*m for x in s)

def merge(s1, s2):
    it1, it2 = iter(s1), iter(s2)
    x1, x2 = next(it1), next(it2)
    if x1 < x2:
        x = x1
        it = iter(merge(it1, s2))
    elif x1 > x2:
        x = x2
        it = iter(merge(s1, it2))
    else:
        x = x1
        it = iter(merge(it1, it2))
    yield x
    while True: yield next(it)

def integers():
    n = 0
    while True:
        n += 1
        yield n

m2 = scale(integers(), 2)
m3 = scale(integers(), 3)
m5 = scale(integers(), 5)

m23 = merge(m2, m3)

hamming_numbers = merge(m23, m5)

The problem it that merge seems just doesn't work. Before that I implemented Sieve of Eratosthenes the same way, and it worked perfectly okay:

def sieve(s):
    it = iter(s)
    x = next(it)
    yield x
    it = iter(sieve(filter(lambda y: x % y, it)))
    while True: yield next(it)

This one uses the same techniques as my merge operation. So I can't see any difference. Do you have any ideas?

(I know that all of these can be implemented other ways, but my goal exactly to understand generators and pure functional capabilities, including recursion, of Python, without using class declarations or special pre-built Python functions.)

UPD: For Will Ness here's my implementation of this algorithms in LISP (Racket actually):

(define (scale str m)
  (stream-map (lambda (x) (* x m)) str))

(define (integers-from n)
  (stream-cons n
               (integers-from (+ n 1))))

(define (merge s1 s2)
  (let ((x1 (stream-first s1))
        (x2 (stream-first s2)))
    (cond ((< x1 x2)
           (stream-cons x1 (merge (stream-rest s1) s2)))
          ((> x1 x2)
           (stream-cons x2 (merge s1 (stream-rest s2))))
          (else
           (stream-cons x1 (merge (stream-rest s1) (stream-rest s2)))))))


(define integers (integers-from 1))

(define hamming-numbers
  (stream-cons 1 (merge (scale hamming-numbers 2)
                        (merge (scale hamming-numbers 3)
                               (scale hamming-numbers 5)))))

thanks for the code. yeah, that's exactly like the classic Haskell - as e.g. here rosettacode.org/wiki/Hamming_numbers (some other very interesting code there too, also stream-based Scheme code in http://c2.com/cgi/wiki?SieveOfEratosthenesInManyProgrammingLanguages). I guess it's all started with SICP mitpress.mit.edu/sicp/full-text/sicp/book/node71.html anyway. :) — Will Ness, Feb 04 '13 at 10:47
([updated link](https://mitpress.mit.edu/sites/default/files/sicp/full-text/book/book-Z-H-24.html#%_sec_3.5.2)) — Will Ness, Jun 28 '19 at 16:46

score 2 · Accepted Answer · edited May 23 '17 at 12:13

Your algorithm is incorrect. Your m2, m3, m5 should be scaling hamming_numbers, not integers.

The major problem is this: your merge() calls next() for both its arguments unconditionally, so both get advanced one step. So after producing the first number, e.g. 2 for the m23 generator, on the next invocation it sees its 1st argument as 4(,6,8,...) and 2nd as 6(,9,12,...). The 3 is already gone. So it always pulls both its arguments, and always returns the head of the 1st (test entry at http://ideone.com/doeX2Q).

Calling iter() is totally superfluous, it adds nothing here. When I remove it (http://ideone.com/7tk85h), the program works exactly the same and produces exactly the same (wrong) output. Normally iter() serves to create a lazy iterator object, but its arguments here are already such generators.

There's no need to call iter() in your sieve() as well (http://ideone.com/kYh7Di). sieve() already defines a generator, and filter() in Python 3 creates an iterator from a function and an iterable (generators are iterable). See also e.g. Difference between Python's Generators and Iterators .

We can do the merge like this, instead:

def merge(s1, s2):
  x1, x2 = next(s1), next(s2)
  while True:
    if x1 < x2:
        yield x1
        x1 = next(s1)
    elif x1 > x2:
        yield x2
        x2 = next(s2)
    else:
        yield x1
        x1, x2 = next(s1), next(s2)

Recursion in itself is non-essential in defining the sieve() function too. In fact it only serves to obscure there an enormous deficiency of that code. Any prime it produces will be tested by all the primes below it in value - but only those below its square root are truly needed. We can fix it quite easily in a non-recursive style (http://ideone.com/Qaycpe):

def sieve(s):    # call as: sieve( integers_from(2))
    x = next(s)  
    yield x
    ps = sieve( integers_from(2))           # independent primes supply
    p = next(ps) 
    q = p*p       ; print((p,q))
    while True:
        x = next(s)
        while x<q: 
            yield x
            x = next(s)
        # here x == q
        s = filter(lambda y,p=p: y % p, s)  # filter creation postponed 
        p = next(ps)                        #   until square of p seen in input
        q = p*p

This is now much, much, much more efficient (see also: Explain this chunk of haskell code that outputs a stream of primes ).

Recursive or not, is just a syntactic characteristic of a code. The actual run-time structures are the same - the filter() adaptors being hoisted on top of an input stream - either at the appropriate moments, or way too soon (so we'd end up with way too many of them).

It works perfectly fine, but you have thrown away recursion, which I want to keep in my solution (the same recursive algorithm works in C++ and LISP, so it has to work somehow in Python too). — Roman Dobrovenskii, Feb 01 '13 at 14:36
@Heller recursive or not, first of all it must be correct. :) — Will Ness, Feb 01 '13 at 15:33
@Heller it'd be great if I could see your Lisp code (I can read C++ too, but I like Lisp better). — Will Ness, Feb 01 '13 at 15:43
Thank you! Yes, I understood my fault in merge function, now trying to work out your sieve code (I don't know Haskell yet). I updated the question and posted my LISP implementation of Hamming numbers (I'm not sure if updating the question was right thing to do, because I'm newbie at Stack Overflow). I also implemented that in C++11, but that code is far too ugly. — Roman Dobrovenskii, Feb 04 '13 at 08:09
Yeah, and thank you for pointing out me for my mistake in actual algorithm. (I fixed it only in LISP code yet). — Roman Dobrovenskii, Feb 04 '13 at 08:25
@Heller you're very welcome. :) The "Haskell" post I linked to has very little Haskell, mostly Pythonesque pseudocode - which thanks to you I now had a chance to flesh out in a real working Python code. The point I was trying to make is, syntax (recursion etc.), language used, etc., it's all immaterial - what really matters is the run-time structure of the computation. — Will Ness, Feb 04 '13 at 10:10
@Heller btw see also http://stackoverflow.com/a/10733621/849891 for a Python code with same principle of starting from squares of primes, applied to a "sliding" sieve of Eratosthenes. — Will Ness, Feb 04 '13 at 10:12

score 0 · Answer 2 · answered Nov 30 '20 at 02:03

I will propose this different approach - using Python heapq (min-heapq) with generator (lazy evaluation) (if you don't insist to keep the merge() function)

from heapq import heappush, heappop

def hamming_numbers(n):
    ans = [1]

    last = 0
    count = 0

    while count < n:
        x = heappop(ans)

        if x > last:
            yield x

            last = x
            count += 1

            heappush(ans, 2* x)
            heappush(ans, 3* x)
            heappush(ans, 5* x)


    >>> n  = 20
    >>> print(list(hamming_numbers(20)))
       [1, 2, 3, 4, 5, 6, 8, 9, 10, 12, 15, 16, 18, 20, 24, 25, 27, 30, 32, 36]

Merge of lazy streams (using generators) in Python

2 Answers2

Linked