104

For binary search tree type of data structures, I see the Big O notation is typically noted as O(logn). With a lowercase 'l' in log, does this imply log base e (n) as described by the natural logarithm? Sorry for the simple question but I've always had trouble distinguishing between the different implied logarithms.

David Nehme
BuckFilledPlatypus
  • 59
    As others have cogently pointed out, it doesn't matter. All logarithms differ from each other by a constant factor that depends only on the bases involved. Because these factors are constants, they are irrelevant for the purposes of asymptotic analysis. Second, as far as determining the implied base, it depends on context. As a rough rule of thumb use the following: 1. When a mathematician writes `log n` he means the natural logarithm. 2. When a computer scientist writes `log n` he means base-two. 3. When an engineer writes `log n` he means base-ten. These are usually true. – jason Oct 15 '09 at 01:50
  • 4
    @Jason, another convention (within mathematics) is that ln n means the natural logarithm and log n is base ten. Think ln stands for the French 'logarithm naturelle'. – Internet man Oct 20 '09 at 23:24
  • 2
    The base of the logarithm is the number of children each node has. If it's a binary tree then it's a base 2 log. – Paul Dec 04 '09 at 05:50
  • 3
    I appreciate your answer, Jason, and here's something to think about. As I've researched what base the log is in (I assumed 2), I've seen the same answer: that it doesn't matter because you can eliminate the constant, log_10(2). My issue with this is that, for example: 5 log_10(5) < 5 whereas 5 log_2(5) > 5. I entered these quickly in my calculator to help conceptualize where O(n logn) has better or worse run time than O(n). Depending on the base it DOES matter. Therefore, I really think the RIGHT answer to this should be that log contextually means base 2 in most computer science applications. – Doug Mead Aug 05 '15 at 21:31
  • @jason, I'd say that it's easier to use ln (mathematician's interpretation) ;). The other two examples are reasonable. – belford Aug 23 '15 at 20:46
  • @DougMead 5 is a small value. Complexities are always defined for n > c (some constant). So substitute a big enough value and check. In your case, you will always see O(n log_c n) > O(n) for all n > c. Now you can decide what the base is and this inequality will always be true. :) – mk.. Jun 28 '16 at 05:59
  • Ahhhhh, I was hoping no one would notice my comment :P I figured that out shortly after, but thanks for giving me some much deserved embarrassment 11 months later, hahaha – Doug Mead Jun 28 '16 at 19:13
  • Asymptotic notation is only meant to represent upper and lower bounds, not exact number of computations. How about some more 2 years later? :D – Danilo Souza Morães Dec 15 '18 at 17:59

7 Answers

83

Big O notation is not affected by the logarithm's base: logarithms in different bases differ only by a constant factor, so O(ln n) is equivalent to O(log n).
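
A quick numeric check of the constant-factor claim, as a minimal sketch. The helper name `log_ratio` is illustrative and not part of the answer. If the claim holds, the ratio log2(n) / ln(n) should be the same constant, 1 / ln 2 (about 1.4427), for every n > 1.

```python
import math


def log_ratio(n: float) -> float:
    """Ratio of the base-2 logarithm of n to its natural logarithm."""
    return math.log2(n) / math.log(n)


# The ratio does not depend on n, which is why big-O ignores the base.
for n in (10, 1_000, 1_000_000, 10**12):
    print(n, log_ratio(n))
```

The same check works for any pair of bases greater than 1; only the constant changes.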


K DawG
Cade Roux
  • 2
    the graphics are neat but think about the derivation of the O()-polynomial... before O() is applied, only log-base-2 is correct for binary search. – Heath Hunnicutt Oct 15 '09 at 00:57
  • 1
    @Heath Hunnicutt: No. `log_2 x` differs from `log_b x` by a constant factor `c(b)` for any base `b` independent of `x`. – jason Oct 15 '09 at 00:59
  • 1
    Jason - you misread "before O() is applied". See my answer below also. Cade is certainly correct for the O() notation. However I am talking about the pre-O()-notation derivation and my suggestion is that there is a human-understanding reason, as opposed to a mathematical reason, that log_2 x is more clear than log x once you do translate to O(). – Heath Hunnicutt Oct 15 '09 at 01:03
  • 4
    But *why* are you talking about that, when it bears no relation to the question and only serves to confuse? – hobbs Oct 15 '09 at 01:06
  • I think Heath is making a useful point, the origin of the growth relative to n is to do with log2. –  Oct 15 '09 at 01:08
  • 4
    hobbs: Because that fact is the reason the OP was inspired to inquire. I'm trying to connect his ideas with the answer, so he understands why he had his intuition, why it does not apply to O(), but not to over-apply what he learns here to the derivation part of the analysis. The terse answers which don't address the root cause of the misunderstanding may lead to further misunderstanding. It's bad pedagogy. – Heath Hunnicutt Oct 15 '09 at 01:16
  • 1
    Jason: no the point is to help the OP understand. And also, on every line of the derivation leading up to the O(), it does matter. Only once you re-express in O() does it not matter. Many other readers get that point. And you do understand I know that O(log N) = O(ln N), right? – Heath Hunnicutt Oct 15 '09 at 01:36
  • 4
    @Heath Hunnicutt: If you're doing asymptotic analysis, it doesn't matter. That you wait until the last minute to throw some big-O's in doesn't change the fact that I can multiply and divide all my logarithms by some silly constant and change the base at all steps. That is, if I have some analysis that involves `log_2 n`, I can just go in and replace `log_2 n` everywhere by `log_pi 2 * log_2 n / log_pi 2` and then just end up with an analysis that has `log_pi n / log_pi 2` everywhere. Now my analysis is in terms of `log_pi n`. – jason Oct 15 '09 at 01:49
  • @jason Obviously I know that. What you are missing is the relevance of the logarithm base in deriving the O() expression. You aren't paying attention to most of the steps, only the final step. The distinction you consider irrelevant is actually quite relevant to understanding the derivation. In this way, the dogma around O() notation and stock interview answers will hold you back. Not understanding the relevance of the logarithm base during the derivation is a shortcoming. – Heath Hunnicutt Jun 23 '15 at 19:26
81

Once expressed in big-O() notation, both are correct. However, during the derivation of the O() polynomial, in the case of binary search, only log2 is correct. I assume this distinction was the intuitive inspiration for your question to begin with.

Also, as a matter of my opinion, writing O(log2 N) is better for your example, because it better communicates the derivation of the algorithm's run-time.

In big-O() notation, constant factors are removed. Converting from one logarithm base to another involves multiplying by a constant factor.

So O(log N) is equivalent to O(log2 N) due to a constant factor.

However, if you can easily typeset log2 N in your answer, doing so is more pedagogical. In the case of binary tree searching, you are correct that log2 N is introduced during the derivation of the big-O() runtime.

Before expressing the result as big-O() notation, the difference is very important. When deriving the polynomial to be communicated via big-O notation, it would be incorrect for this example to use a logarithm other than log2 N, prior to applying the O()-notation. As soon as the polynomial is used to communicate a worst-case runtime via big-O() notation, it doesn't matter what logarithm is used.
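
To make the derivation step concrete, here is a minimal sketch that counts probes in a textbook binary search over a sorted list and compares the worst case with floor(log2 n) + 1. The function names are hypothetical, chosen for illustration, and the exact count applies to this particular implementation.

```python
import math


def binary_search_steps(sorted_items, target):
    """Return (index or None, number of probes) for an iterative binary search."""
    lo, hi = 0, len(sorted_items) - 1
    probes = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1  # one comparison against the middle element
        if sorted_items[mid] == target:
            return mid, probes
        if sorted_items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return None, probes


def worst_case_probes(n):
    """Largest probe count over every present and absent target for n items."""
    data = list(range(n))
    return max(binary_search_steps(data, t)[1] for t in range(-1, n + 1))


for n in (1, 2, 7, 8, 1000, 1024):
    # Halving the range each probe is where the base-2 logarithm comes from.
    print(n, worst_case_probes(n), math.floor(math.log2(n)) + 1)
```

Rewriting that count in another base only rescales it by a constant, which is the part big-O discards.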

Heath Hunnicutt
  • 4
    But it's very easy to show that `log_2 n` is in `Θ(log_a n)` for any base `a`, so I'm not sure I see how using base 2 is "more correct". – bcat Oct 15 '09 at 00:34
  • Sorry, what does "easily typeset" mean? –  Oct 15 '09 at 00:35
  • In the first sentence of your answer. – bcat Oct 15 '09 at 00:40
  • Kinopiko -- For example, if you were writing it down by hand. However, I think O(N log N) is a lot more readable than O(N log-base-2 N), but if I were using TeX I would definitely write $log_2$ even inside the O() just to make the point that the O() derived from a polynomial that had $log_2$ and not $log$. See what I mean? – Heath Hunnicutt Oct 15 '09 at 00:41
  • I see bcat. Well you can read the rest of my answer and see what I meant, I hope. You can also see in the same sentence I did write "both are correct." I will change it to say "better" because that's more incisive. – Heath Hunnicutt Oct 15 '09 at 00:42
  • I see what you mean, although the phraseology is a little odd. –  Oct 15 '09 at 00:44
  • bcat -- imagine you were deriving the polynomial for the runtime of binary search, BEFORE you wrote it in the O() or Theta() notations... Only log-base-2 would be correct. Once you express it as O(), you can re-express as any base, I agree. But since the polynomial used for the derivation would be wrong with any other logarithm... We forget that before you express O(), you derive a polynomial expression of the runtime. For things like binary search, we did that polynomial so long ago we have forgotten it and remember only the O() answer. – Heath Hunnicutt Oct 15 '09 at 00:47
  • Oh, I see what you're saying. That makes sense. – bcat Oct 15 '09 at 00:51
  • 1
    Kinopiko and bcat, thanks for helping it become useful. It was not very well-written at first. :) – Heath Hunnicutt Oct 15 '09 at 01:04
  • It is *not* true that during the derivation, log_2 is correct. It *may* be correct (e.g., binary search), but there's no a priori reason to believe that to be the case. There are plenty of problems where log_3 or even log_e show up naturally. – Jesse Beder Oct 15 '09 at 01:07
  • But this question is about binary search, so actually it is true. –  Oct 15 '09 at 01:09
  • By the way, I just wanted to clarify that my above response is because the title of the question is "Is Big O(logn) log base e?", and the answers seem to be approaching this question generally, not specifically with respect to binary search. I'm afraid that your explanation may confuse people. If you make it very clear that you're *only* talking about binary search, then it's fine. – Jesse Beder Oct 15 '09 at 01:10
  • 2
    Well I added clarity but I sure am hurt that you think my answer might confuse people. Actually, most of the answers here didn't consider the OP's intuition and try to teach him much. I'm not so much wowed by the competition, I'm kind of sad at the low bar for pedagogy. – Heath Hunnicutt Oct 15 '09 at 01:14
  • A lot of good answers here, it's very difficult to choose the final answer. The fact that the base does not matter in Big-O notation answered my ambiguity question (everyone's answers). This answer also provided the pre O-notation derivation for the binary search tree which was the original intent of my perhaps misguided question. Thanks for the help. – BuckFilledPlatypus Oct 15 '09 at 01:55
  • Thank you BuckFilledPlatypus, your comment and selection are sincerely appreciated. – Heath Hunnicutt Oct 15 '09 at 02:02
  • 11
    "during the derivation of the O() polynomial, in the case of binary search, only log2 is correct." -1 for poor mathematics. The definition of x(n) ~ O(f(n)) says that there exist constants c and n_0 such that x(n) <= c*f(n) for all n > n_0. Thus the constant coefficient is completely irrelevant during the analysis. – rlbond Oct 15 '09 at 02:11
  • rlbond: that was petty. You can read many other comments here and realize that I understand what you are getting at and do not agree with your pedagogy. To -1 after the OP has validated the answer, which does not disagree with you mathematically, is surely an emotional matter. – Heath Hunnicutt Oct 15 '09 at 02:21
  • @Kinopiko - "easily typeset" means it's easier to write log() if you can't easily type a subscript in the editor (like in this one) – Martin Beckett Oct 15 '09 at 02:21
  • These comments are getting too long. –  Oct 15 '09 at 02:37
  • 1
    I think your answer is misleading. I gave it a -1 because I felt it deserved -1. Don't take it personally if someone disagrees with your answer, it's still on the top, and it's just a community wiki anyway! And, questions aren't done once the OP accepts the answer. It's still the responsibility of the community to judge the fitness of each answer so other people can access the information. – rlbond Oct 15 '09 at 03:39
  • The worst case runtime for a search of a (completely unbalanced) binary tree is O(n), the worst case runtime for a search of a balanced binary tree is O(log n). – Cade Roux Oct 15 '09 at 04:23
  • @rlbond -- While your philosophy may be correct, your remark about my math is wrong. The log is base 2 *during the derivation* and it doesn't matter only afterward. My statement is not mathematically incorrect; rather, your suggestion that it is so is incorrect. – Heath Hunnicutt Feb 13 '10 at 22:39
  • 4
    Since log2(x) is equal to log10(x)/log10(2), you can derive it either way. The log is not strictly base 2 at any point. – rlbond Feb 15 '10 at 04:00
  • 1
    You realize by during the derivation, I mean derivation of the runtime polynomial, prior to application of O()-notation? When deriving the function to which we apply the O()-notation, if the tree is binary, the logarithm is also. – Heath Hunnicutt Feb 15 '10 at 05:21
9

It doesn't really matter what base it is, since big-O notation is usually written showing only the asymptotically highest order of n, so constant coefficients will drop away. Since a different logarithm base is equivalent to a constant coefficient, it is superfluous.

That said, I would probably assume log base 2.

Daniel Pryden
  • @Kinopiko: What exactly is wrong about it? More precisely, how is my answer factually different from yours and others here? – Daniel Pryden Oct 15 '09 at 00:37
  • Ah, perhaps my mistake in the use of "coefficient". I will edit to clarify. – Daniel Pryden Oct 15 '09 at 00:38
  • That was my main issue with your answer. Also, it's a bit unclear what you mean by "they will still have some effect". Some effect on what? – bcat Oct 15 '09 at 00:39
  • 1
    Your answer discusses the highest order coefficients. What you said is correct as far as it goes, but that is not the reason that the logarithm base is irrelevant. The reason is that the difference between different base logarithms is a constant which is absorbed by the O(). –  Oct 15 '09 at 00:40
  • It still isn't really correct, there isn't any point in talking about highest order of n in this case. Cade Roux's answer is the correct one. –  Oct 15 '09 at 00:42
  • @Kinopiko: ... and the constant is "absorbed" by the O() *because* we only care about the highest order coefficient of `n`. Right? – Daniel Pryden Oct 15 '09 at 00:43
  • Maybe I'm misunderstanding something. Isn't the definition of O(f(n)) the function f() that represents the asymptotic complexity of the algorithm, as a function of `n`? Asymptotic complexity is dominated by the highest-order component of the expression -- hence lower-order (or constant) expressions are superfluous. Or am I misunderstanding something fundamental? – Daniel Pryden Oct 15 '09 at 00:45
  • @Daniel: no the constant C is absorbed because, for example, O(100) = O(1). –  Oct 15 '09 at 00:48
  • 1
    @Kinopiko: OK. I think we are saying the same thing. I would say O(100) = O(1) because O(100) = O(100 * 1) = O(C * 1) = O(1). Which is what I meant by constant expressions being superfluous. That is, the *order* of *any* constant is 1. – Daniel Pryden Oct 15 '09 at 01:00
8

Both are correct. Think about this

log2(n)=log(n)/log(2)=O(log(n))
log10(n)=log(n)/log(10)=O(log(n))
logE(n)=log(n)/log(E)=O(log(n))
cartonn
3

Yes, when talking about big-O notation, the base does not matter. However, computationally when faced with a real search problem it does matter.

When developing an intuition about tree structures, it's helpful to understand that a balanced binary search tree can be searched in O(log n) time because that is the height of the tree - that is, in a balanced binary tree with n nodes, the tree depth is O(log n) (base 2). If each node has three children, the tree can still be searched in O(log n) time, but with a base 3 logarithm. Computationally, the number of children each node has can have a big impact on performance (see for example: link text)
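
As a rough illustration of how the branching factor sets the base of the logarithm, the sketch below computes the height of a complete k-ary tree holding n nodes. The function `complete_tree_height` is a hypothetical helper written for this example, and it assumes every level except possibly the last is full.

```python
def complete_tree_height(n: int, k: int) -> int:
    """Height in edges of a complete k-ary tree that holds n nodes (n >= 1, k >= 2)."""
    height = 0
    capacity = 1    # nodes that fit in a tree of the current height
    level_size = 1  # nodes on the deepest level counted so far
    while capacity < n:
        level_size *= k  # each new level is k times wider
        capacity += level_size
        height += 1
    return height


for n in (1_000, 1_000_000):
    # Height grows like log_k(n): fewer levels when each node has more children.
    print(n, complete_tree_height(n, 2), complete_tree_height(n, 3))
```

The two heights differ by a roughly constant ratio, so both are O(log n), yet the actual number of levels visited in a real search is different.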

Enjoy!

Paul

Paul
1

Technically the base doesn't matter, but you can generally think of it as base-2.

Tim Sylvester
1

First you must understand what it means for a function f(n) to be O( g(n) ).

The formal definition is: *A function f(n) is said to be O(g(n)) iff |f(n)| <= C * |g(n)| whenever n > k, where C and k are constants.*

so let f(n) = log base a of n, where a > 1 and g(n) = log base b of n, where b > 1

NOTE: This means the values a and b could be any value greater than 1, for example a=100 and b = 3

Now we get the following: log base a of n is said to be O(log base b of n) iff |log base a of n| <= C * |log base b of n| whenever n > k

Choose k=0, and C= log base a of b.

Now our equation looks like the following: |log base a of n| <= log base a of b * |log base b of n| whenever n > 0

Notice the right hand side, we can manipulate the equation: = log base a of b * |log base b of n| = |log base b of n| * log base a of b = |log base a of b^(log base b of n)| = |log base a of n|

Now our equation looks like the following: |log base a of n| <= |log base a of n| whenever n > 0

The inequality is always true no matter what the values of n, b, or a are, subject to the restrictions a, b > 1 and n > 0. So log base a of n is O(log base b of n), and since a and b don't matter we can simply omit them.
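
The same argument in compact notation, as a sketch that restates the steps above using the change-of-base identity. It assumes nothing beyond a, b > 1 and n > 0, and C is the constant chosen earlier.

```latex
\begin{align*}
\log_a n &= \log_a\!\left(b^{\log_b n}\right) = (\log_a b)\,\log_b n, \\
|\log_a n| &= (\log_a b)\,|\log_b n| \le C\,|\log_b n|
  \quad \text{with } C = \log_a b > 0 .
\end{align*}
```

Because a, b > 1, the constant log_a b is positive, so it can be pulled out of the absolute value without changing the inequality.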

You can see a YouTube video on it here: https://www.youtube.com/watch?v=MY-VCrQCaVw

You can read an article on it here: https://medium.com/@randerson112358/omitting-bases-in-logs-in-big-o-a619a46740ca

tempmail