What does Big O actually tell you? **Edit:** answering this question in the context of computational complexity, and not in general.

**It tells you that if an algorithm is $O(f(n))$, then a physical machine can in principle be built such that the algorithm runs in at most $f(n)$ steps, or, in the case where $f(n) = n$, in arbitrarily close to $n$ steps.**

**Answer to the first question**

I haven't seen anyone mention the main point here. All of the other answers are correct that asymptotic complexity is not a benchmarking tool. However, this doesn't answer the question of what asymptotic complexity is *really* useful for in the context of complexity theory.

You have to understand the original context in which this notation was invented as it relates to algorithms. The idea is that it describes the performance not of software, but of *machines*, e.g. Turing machines. The Turing machine was a concept invented precisely to serve as an abstraction of computing machines in general.

What does this have to do with asymptotic complexity? Well, let's say you run your $O(n)$ algorithm on a Turing machine and it runs in exactly $1000n$ steps for any input string of size $n$. There is a theorem that says that a different Turing machine can be "built" such that the algorithm will take almost exactly $n$ steps to run.

In other words, although it is true that on *any particular machine*, it is possible for an $O(n)$ algorithm to in practice be slower than an $O(n^2)$ algorithm, what the theory of automata tells us is that it is always possible to build a machine such that the $O(n)$ algorithm will be in practice faster than the $O(n^2)$ algorithm, possibly at the expense of the performance of other algorithms.
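To make the constant-factor point concrete, here is a small sketch. The step counts are invented for illustration (a linear algorithm with a large coefficient versus a quadratic one with coefficient 1), not measured on any real machine:

```python
# Illustrative step counts for two hypothetical algorithms
# running on one particular machine.
def linear_steps(n):
    return 1000 * n       # O(n), but with a large coefficient

def quadratic_steps(n):
    return n * n          # O(n^2), coefficient 1

# For small inputs the O(n^2) algorithm wins on this machine...
print(linear_steps(100) > quadratic_steps(100))      # True: 100000 > 10000
# ...but past the crossover point (n = 1000) the O(n) one always wins.
print(linear_steps(10**6) < quadratic_steps(10**6))  # True
```

The crossover point here sits at $n = 1000$; the theory's claim is that a different machine could be built on which the linear algorithm wins even below that point.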

So, to answer your first question:

> Why do people say that the 1000 coefficient becomes insignificant when $n$ gets really large (and that's why we can throw it away)?

These ideas were historically developed with hardware engineering in mind, rather than software, and also with the question of how fast a human could execute these algorithms with pencil and paper. If an algorithm takes $1000n$ steps to run, a physical machine can (in principle) be built such that the 1000 coefficient does go away, and the algorithm actually does take as close to $n$ steps as you want. The same goes for space complexity. Alternatively, if a human were to change their process slightly, they could run the algorithm in close to $n$ steps.

So when theoreticians develop algorithms, they develop them with asymptotic complexity in mind, because this is really the only way we can measure the resource requirements of an algorithm without specifying the machine it will run on. Essentially, without picking a particular "computer architecture" to run the algorithm on, they cannot make any meaningful claims about the coefficient of the algorithm's running time.

Obviously if you are developing real software, then you know more about the machines that it will run on, and can actually make decisions based on that. But when talking about algorithms in the abstract, this is not possible.

**Answer to the second question**

To answer your second question, this *is* to some degree convention, as others have stated. However, it is important to note that worst-case complexity is much easier to reason about than the alternatives. This is because, in the theory of automata, the idea of 'time-bounded'-ness is really the only sensible definition without first bringing some form of universal probability into the mix. And thinking about best-case performance is simply not as useful in the abstract.

For reference, time-boundedness is defined in the following way:

> If for every input word of length $n$, $M$ makes at most $T(n)$ moves before halting, then $M$ is said to be a $T(n)$ *time-bounded* Turing machine.
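The definition can be made concrete with a toy simulator. The `run_tm` helper and its encoding below are my own sketch, not the formalism from any particular textbook; the example machine simply scans right to the first blank, so it makes exactly $n + 1$ moves on every input of length $n$ and is therefore $(n+1)$ time-bounded:

```python
def run_tm(transitions, accept, tape, blank="_"):
    """Run a one-tape Turing machine and return (final_state, moves).

    transitions maps (state, symbol) -> (new_state, write, move),
    where move is "R" or "L". Halts when an accepting state is reached.
    """
    state, head, moves = "q0", 0, 0
    tape = list(tape)
    while state not in accept:
        sym = tape[head] if 0 <= head < len(tape) else blank
        state, write, move = transitions[(state, sym)]
        if head == len(tape):
            tape.append(blank)   # extend the tape on demand
        tape[head] = write
        head += 1 if move == "R" else -1
        moves += 1
    return state, moves

# Machine that scans right until it sees a blank, then accepts.
trans = {("q0", "0"): ("q0", "0", "R"),
         ("q0", "1"): ("q0", "1", "R"),
         ("q0", "_"): ("qa", "_", "R")}
print(run_tm(trans, {"qa"}, "10110"))  # ('qa', 6): n + 1 moves for n = 5
```

Checking that the move count never exceeds $T(n) = n + 1$ for every input length is precisely what the definition asks for.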

The theorems that I mentioned in the first part of this answer are formalized with respect to time-boundedness. That is, let $f : \mathbb{N} \to \mathbb{N}$ be a function such that $f(n)/n$ tends to infinity as $n$ goes to infinity (in essence, $f$ grows faster than linearly). If a Turing machine $M$ is $cf(n)$ time-bounded, then there exists another Turing machine $M'$ which is $f(n)$ time-bounded and computes the same function as $M$.

In the special case where $f(n) = n$, you can get very close to $n$: if $M$ is $cn$ time-bounded, then for any real $\epsilon > 0$ there is a Turing machine $M'$ which computes the same function as $M$ and is $(1 + \epsilon)n$ time-bounded. The reason that it can't be exactly $n$ is simply because it will take at least $n$ steps to read the input.
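The arithmetic behind this can be sketched as follows. The actual construction compresses the tape alphabet so that one step of the new machine simulates a whole block of steps of the old one; the `overhead` constant and the accounting below are simplified assumptions, not the real proof:

```python
import math

def speedup_block_size(c, eps, overhead=8):
    """Block size m so that simulating m original steps in ~overhead
    new steps brings a c*n-step run under (1 + eps)*n total steps."""
    return math.ceil(overhead * c / eps)

def simulated_steps(n, c, m, overhead=8):
    # n steps to read and compress the input, then c*n original steps
    # simulated in chunks of m, each chunk costing `overhead` new steps.
    return n + overhead * math.ceil(c * n / m)

# Even a coefficient of 1000 can be squeezed under (1 + 0.01) * n.
c, eps = 1000, 0.01
m = speedup_block_size(c, eps)
n = 10**6
print(simulated_steps(n, c, m) <= (1 + eps) * n)  # True
```

The point of the sketch is only that the block size $m$ grows with $c/\epsilon$, so any constant coefficient can be absorbed, at the price of a more elaborate machine.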

This is the formal statement of what I was saying above about a machine being built that can make the coefficient 'go away'.

It is rather hard to even state these theorems with respect to any other form of time complexity other than 'worst-case' time complexity.

(The statements and full proofs of these theorems can be found in chapter 12 of Hopcroft and Ullman; this is also the source of the quoted definition above.)

So the reason that we use worst case complexity is really because it is the simplest and most useful way to talk about machines without introducing some notion of probability (as in the case of average time complexity).

**Answer to the third question**

This I think has been answered rather well by the other answers. But it is important to note that an algorithm being $O(f(n))$ does not mean that this is the *smallest* asymptotic complexity that describes it. Again, the actual running time of an algorithm is defined by *time-boundedness* as given above, and $O$ notation groups these time bounds into asymptotic equivalence classes. The running time of an algorithm, *even when you know the internals of the machine*, can in general only be given by an upper bound; it cannot be exact, since the function that takes a Turing machine and an input and returns the number of steps the machine makes on that input is not computable in general. If you restrict yourself to total recursive functions, then it's a bit different, but even then *exact* running time bounds for an algorithm are typically not very useful.

The time bound of an algorithm is just that: a bound. It can be very tight or very loose, and its usefulness can vary depending on the 'turbulence' of the running time of the algorithm. An algorithm which is time-bounded by $f(n) = 1000n$, or even $f(n) = n^2$, may in fact have a tighter bound of $f(n) = 2n$. Or it may be the case that for inputs of even length its tightest bound (up to asymptotic equivalence) is $f(n) = 2n$, while for inputs of odd length its tightest bound is $f(n) = 2^n$. The algorithm as a whole would then need a time bound of at least $2^n$.

**Conclusion**

I repeat the main point: if an algorithm is $O(f(n))$, then what this means is that a physical machine can in principle be built such that the algorithm runs in at most $f(n)$ steps, or, in the case where $f(n) = n$, in arbitrarily close to $n$ steps.

Asymptotic complexity is less about software engineering, and more about the philosophical implications of finding an algorithm. It may be the case that with our current technology, an $O(n)$ algorithm has a time bound of $100000 n$ on most machines, but what complexity theory tells us is that in theory, it is possible for us to develop our technology in such a way that it takes essentially $n$ steps on most machines.