Speed of different prime factor functions

Question

I have these 3 prime factor functions and I don't understand the differences in their time complexities.

This is my first function that I started with and wanted to make faster. I already had a pretty fast prime function but I figured a Sieve would be faster.

def is_prime(i):
    if i <= 1: return False
    if i <= 3: return True
    if i%3 == 0 or i%2 == 0: return False
    return sum((1 for y in xrange(5, int(i**0.5)+1, 6) if i%y == 0 or i%(y+2) == 0)) == 0

prime_factors_1 = lambda x: [i for i in range(1,x) if x%i == 0 and is_prime(i)]

This is the Sieve of Eratosthenes implementation that I found on this guys blog: http://www.drmaciver.com/2012/08/sieving-out-prime-factorizations/

def prime_factorizations(n):
   sieve = [[] for x in xrange(0, n+1)]
   for i in xrange(2, n+1):
      if not sieve[i]:
         q = i
         while q < n:
             for r in xrange(q, n+1, q):
                 sieve[r].append(i)
             q *= i
   return sieve[-1]

I like to try to improve upon examples I find, and I like to try to reduce line count while preserving functionality and time/space efficiency. I may have went overboard with the list comprehension on this one.

def prime_factors_2(n):
    factors = [[] for n in xrange(0,n+1)]
    [[[factors[r].append(i) for r in xrange(q, n+1, q)] for q in range(i,n,i)] for i in (y for y in xrange(2,n+1) if not factors[y])]   
    return factors[-1]

I timed and got this output:

prime_factorizations: 1.11333088677
prime_factors_1:      0.0737618142745
prime_factors_2:     10.7310789671

There are a few things that I don't understand about these times:

Why is the non-sieve far fastest?
- Is it because it only generates distinct prime factors?
Why is the sieve with list comprehension so much slower?
- Is (layered) list comprehension inherently slower?
What algorithm will be faster than my original non-sieve?

This question might have some of the answers you're after: http://stackoverflow.com/questions/2068372/fastest-way-to-list-all-primes-below-n. You might also use `dis.dis` to investigate what's actually happening in each of your functions to determine why one or the other is heavier. — g.d.d.c, May 05 '16 at 17:03
how did you time it? for small numbers I'd expect the more direct approach to be faster. — Tadhg McDonald-Jensen, May 05 '16 at 17:03
@Tadhg McDonald-Jensen I used timeit with an input of 1234567 — Nate Schultz, May 05 '16 at 17:07
The "layered" comprehension has a `range` buried in it which will create a many many new lists. This likely accounts for a large portion of the time. It's also _really_ hard to read and I wouldn't suggest using it anyway ;-) — mgilson, May 05 '16 at 17:11
@mgilson that makes sense. yeah it was more of a "can I do all this with list comprehension" than anything else. — Nate Schultz, May 05 '16 at 17:24
@Shankar `xrange` is faster, but there is also a `range` in the comprehension too. — mgilson, May 05 '16 at 22:30
If you find yourself using `append` or other mutative operations in a list comprehension, or ignoring the return value, stop. You are building a giant extra list for no good reason and using a construct that looks non-mutative to produce side effects. You should just use a loop. — user2357112 supports Monica, May 05 '16 at 22:30

score 2 · Accepted Answer · answered May 05 '16 at 22:52

Why is the non-sieve far fastest?

The other functions do a ton of work to generate factors of numbers you don't care about.

Why is the sieve with list comprehension so much slower?

Because you screwed it up. This part:

[[[factors[r].append(i) for r in xrange(q, n+1, q)] for q in range(i,n,i)] for i in (y for y in xrange(2,n+1) if not factors[y])]
#                                                   ^^^^^^^^^^^^^^^^^^^^^

is not equivalent to the while loop in the original code, which multiplies q by i instead of adding i. Even if you had gotten it right, though, using a list comprehension for side effects would have been confusing, counter to the purpose of list comprehensions, and a waste of space for the giant nested list of Nones you build.

What algorithm will be faster than my original non-sieve?

You can divide out prime factors you've found to eliminate the need to check later factors for primality and reduce the number of factors you need to check at all:

def prime_factors(n):
    factors = []
    if n % 2 == 0:
        factors.append(2)
        while n % 2 == 0:
            n //= 2
    candidate = 3
    while candidate * candidate <= n:
        if n % candidate == 0:
            factors.append(candidate)
            while n % candidate == 0:
                n //= candidate
        candidate += 2
    if n != 1:
        factors.append(n)
    return factors

Speed of different prime factor functions

1 Answers1