From coding theory --

If you want to transmit a message, say a 0 or a 1, through a noisy channel, then the natural strategy is to send it with redundancy. (A noisy channel carries your message to its destination, but with some chance of scrambling its letters along the way.)

For example, suppose I wanted to transmit a $0$ or a $1$ to you -- $0$ for upvote and $1$ for downvote, say, though this narrative choice is mathematically irrelevant.

Suppose our channel is such that each transmitted zero has a $.1$ chance of being flipped to a one. If I send you the message $00000$, then as long as at least 3 of the zeros arrive as zeros, you will be able to decode my message by majority vote -- and this succeeds with greater probability than if I had just sent you a single $0$.

(Simlrly to hw you cn rd ths?)
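To make the majority-voting claim concrete, here is a minimal Python sketch. (The function name and the modeling choice -- a binary symmetric channel that flips each symbol independently with probability $.1$ -- are my own, for illustration.) It computes the exact probability that majority decoding fails:

```python
from math import comb

def repetition_error(p, n):
    """Probability that majority decoding of an n-fold repetition code
    fails, when each symbol is independently flipped with probability p
    (n odd): failure means more than n//2 symbols get flipped."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

p = 0.1
print(repetition_error(p, 1))  # 0.1     -- a single transmitted symbol
print(repetition_error(p, 5))  # 0.00856 -- five-fold repetition
```

So repeating five times drops the error from $10\%$ to under $1\%$.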

The collection $\{00000, 11111\}$ is an example of a code, and $00000$ is called a codeword.

By continuing in this way (for example, by sending a million zeros to transmit the message zero), you can drive the probability of a miscommunication as low as you want -- but at the cost of a longer and longer transmission time for the same one-bit message (zero or one).

If you define the rate as the ratio (number of message symbols) / (number of transmitted symbols), then this naive repetition method lets you make the error small, but only at the cost of driving the rate to zero.
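The trade-off is easy to tabulate. Here is a short Python sketch (again assuming, as my own modeling choice, a binary symmetric channel with flip probability $0.1$) showing the rate and exact error of the $n$-fold repetition code as $n$ grows:

```python
from math import comb

def repetition_error(p, n):
    # Probability that majority decoding fails: more than n//2 flips (n odd).
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

# Each code packs 1 message symbol into n channel symbols, so rate = 1/n.
for n in (1, 5, 25, 101):
    print(f"n={n:3d}  rate={1/n:.4f}  error={repetition_error(0.1, n):.2e}")
```

The error column plummets, but so does the rate column -- exactly the trade-off the naive method cannot escape.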

According to sources I've read (though I cannot vouch for their historical accuracy), practitioners in the field of communications (e.g. telegraph engineers) believed this trade-off was absolute: if you want to reduce errors, you must drive the rate to zero.

This is a plausible thing to conjecture after seeing the above example -- or at any rate, you could probably get students to nod along if you showed them this example and stated this false conclusion confidently. (Conducting such an experiment would be unethical, however.)

However, in 1948 Shannon proved a remarkable theorem -- the noisy channel coding theorem -- showing that this trade-off is *not* absolute. Indeed, for any channel there is a number called the "capacity" (which, in the discrete case, can be computed from the probabilistic properties of the channel as an optimization problem over probability distributions on a finite set), and as long as you do not try to send at a rate above this capacity, you can find ways to encode your messages so that the error becomes arbitrarily small. He also proved that the capacity bound is the best possible.
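For the binary symmetric channel of the running example, that optimization has a well-known closed form: $C = 1 - H(p)$, where $H$ is the binary entropy function. A small Python sketch (the function names are mine):

```python
from math import log2

def binary_entropy(p):
    """H(p) = -p log2(p) - (1-p) log2(1-p), the binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def bsc_capacity(p):
    """Capacity of a binary symmetric channel with flip probability p,
    in bits per channel use: C = 1 - H(p)."""
    return 1 - binary_entropy(p)

print(bsc_capacity(0.1))  # ~0.531 bits per channel use
```

So with flip probability $0.1$, Shannon's theorem promises codes with arbitrarily small error at any rate below about $0.53$ -- far better than the vanishing rate of repetition codes.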

One catch is that you need a lot of messages to send; if you want to send just one of two messages over your channel (and then close the channel for business), you really cannot do better than the naive method above. Shannon's proof is asymptotic: it asserts that as the size of the set of possible messages goes to infinity, you can (in principle) find encoding methods that make the error low. Morally, you can keep the codewords far from each other without using too many extra bits to pad the transmissions against errors, because there is a lot of extra room in high dimensions.

His proof opened up a huge amount of research into explicit constructions of codes that approach his bound, which has connections to many other topics in mathematics and computer science.

https://en.wikipedia.org/wiki/Noisy-channel_coding_theorem

If you want to learn the precise statement, the book *Elements of Information Theory* by T. M. Cover and J. A. Thomas is excellent.