Question
I have to create a priority queue storing distances. To build the heap, I am considering the following two approaches:
```python
from heapq import heapify, heappush

n = 35000  # input size

# way A: fill a list, then heapify it once
dist = []
for i in range(n):
    dist.append(distance)  # distance is computed in O(1) time
heapify(dist)

# way B: push each element onto the heap
dist = []
for i in range(n):
    heappush(dist, distance)  # distance is computed in O(1) time
```
Which one is faster?
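Since the answer depends on constant factors as well as asymptotics, the most direct check is a micro-benchmark. A minimal sketch using `timeit`, assuming uniform random floats as stand-ins for the real distances (the helper names `way_a` and `way_b` are illustrative, not from any library):

```python
import random
import timeit
from heapq import heapify, heappush


def way_a(values):
    """Way A: append every distance, then heapify once."""
    dist = []
    for d in values:
        dist.append(d)
    heapify(dist)
    return dist


def way_b(values):
    """Way B: heappush every distance as it is produced."""
    dist = []
    for d in values:
        heappush(dist, d)
    return dist


if __name__ == "__main__":
    n = 35000  # input size from the question
    # Assumption: uniform random floats stand in for the real distances.
    values = [random.random() for _ in range(n)]
    for name, build in (("A (append + heapify)", way_a), ("B (heappush)", way_b)):
        # Best of 5 repeats, 10 builds each, to damp timing noise.
        best = min(timeit.repeat(lambda: build(values), number=10, repeat=5))
        print(f"{name}: {best / 10 * 1e3:.2f} ms per build")
```

Absolute timings depend on the machine, the Python version and the input distribution, so treat the printed numbers as indicative only.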
Reasoning
According to the docs, `heapify()` runs in linear time, and I'm guessing `heappush()` runs in O(log n) time. Therefore, the running time for each way would be:
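- A: n appends in O(n) plus one `heapify()` in O(n), i.e. O(2n) = O(n)
- B: n calls to `heappush()` at O(log n) each, i.e. O(n log n)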
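The linear bound for `heapify()` comes from the standard bottom-up argument, assuming it sifts each internal node down starting from the last one (which is how CPython's `heapq` builds the heap). An n-element heap has at most ⌈n/2^(h+1)⌉ nodes of height h, and sifting one of them down costs at most c·h for some constant c. Using ⌈n/2^(h+1)⌉ ≤ n/2^h (valid for h ≤ log₂ n) and Σ h/2^h = 2, while n pushes each cost at most c·⌊log₂ i⌋:

$$
T_A(n) \le \sum_{h=0}^{\lfloor \log_2 n \rfloor} \left\lceil \frac{n}{2^{h+1}} \right\rceil c\,h
\le c\,n \sum_{h=0}^{\infty} \frac{h}{2^{h}} = 2c\,n = O(n),
\qquad
T_B(n) \le \sum_{i=1}^{n} c \lfloor \log_2 i \rfloor \le c\,n \log_2 n = O(n \log n)
$$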
However, it is counterintuitive to me that A is faster than B. Am I missing something? Is A really faster than B?
**EDIT**
I've been testing with different inputs and different sizes of the array, and I am still not sure which one is faster.
After reading the link in Elisha's comment, I understand how `heapify()` runs in linear time. However, I still don't know whether using `heappush()` could be faster depending on the input.
I mean, `heappush()` has a worst-case running time of O(log n), but on average it will probably be smaller, depending on the input. Its best-case running time is actually O(1). On the other hand, `heapify()` has a best-case running time of O(n), and it must be called after filling the array, which also takes O(n). That makes a best case of O(2n).
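The best and worst cases of `heappush()` follow from how a push sifts the new item up toward the root. The sketch below is a hypothetical re-implementation (`heappush_levels` is not part of `heapq`, and it is not heapq's source) that mirrors the sift-up step of a binary min-heap and reports how many levels each item moves:

```python
def heappush_levels(heap, item):
    """Push item onto a list-based min-heap; return how many levels it moved up.

    Hypothetical helper: mirrors the sift-up of a binary heap push.
    """
    heap.append(item)
    pos = len(heap) - 1
    levels = 0
    while pos > 0:
        parent = (pos - 1) // 2
        if not item < heap[parent]:
            break  # parent is already <= item: stop after one comparison (best case)
        heap[pos] = heap[parent]  # move the larger parent down one level
        pos = parent
        levels += 1
    heap[pos] = item
    return levels


if __name__ == "__main__":
    n = 35000
    best, worst = [], []
    best_levels = sum(heappush_levels(best, d) for d in range(n))           # ascending input
    worst_levels = sum(heappush_levels(worst, d) for d in range(n, 0, -1))  # descending input
    print(f"ascending: {best_levels} levels, descending: {worst_levels} levels")
```

With ascending input every item stays where it was appended (0 levels), so each push does O(1) work; with descending input every new item is the new minimum and climbs to the root, floor(log2(i + 1)) levels for the item at index i.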
So building the heap with n calls to `heappush()` could be as fast as linear or as slow as O(n log n), whereas `heapify()` is going to take 2n time in any case. If we look at the worst case, `heapify()` will be better. But what about the average case?
Can we even be sure that one is faster than the other?
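For the average case, one rough proxy is to count the comparisons the real `heapq` functions make on ascending, descending and shuffled input. A minimal sketch, assuming a hypothetical wrapper class `Counted` whose `__lt__` increments a shared counter (`heapq` orders items with `<`) and a seeded shuffle standing in for "average" input:

```python
import random
from heapq import heapify, heappush


class Counted:
    """Wraps a number and counts every '<' comparison heapq makes on it."""

    comparisons = 0  # class-level counter shared by all instances
    __slots__ = ("value",)

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        Counted.comparisons += 1
        return self.value < other.value


def count_heapify(values):
    """Way A: fill a list, heapify once, return the number of '<' calls."""
    Counted.comparisons = 0
    heap = [Counted(v) for v in values]
    heapify(heap)
    return Counted.comparisons


def count_heappush(values):
    """Way B: heappush every value, return the number of '<' calls."""
    Counted.comparisons = 0
    heap = []
    for v in values:
        heappush(heap, Counted(v))
    return Counted.comparisons


if __name__ == "__main__":
    n = 35000
    shuffled = list(range(n))
    random.Random(0).shuffle(shuffled)  # fixed seed so runs are repeatable
    inputs = {
        "ascending": list(range(n)),
        "descending": list(range(n, 0, -1)),
        "shuffled": shuffled,
    }
    for label, data in inputs.items():
        a, b = count_heapify(data), count_heappush(data)
        print(f"{label:>10}: A {a / n:.2f} cmp/elem, B {b / n:.2f} cmp/elem")
```

On ascending input B makes exactly n - 1 comparisons, and on descending input B's count grows roughly like n log2 n, while A's count should stay within about 2n for every order. Comparison counts ignore call and allocation overhead, so they complement rather than replace a wall-clock measurement.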