Questions tagged [information-theory]

The science of compressing and communicating information. It is a branch of applied mathematics and electrical engineering. Though originally the focus was on digital communications and computing, it now finds wide use in biology, physics and other sciences.

Information theory studies the quantification, storage, and communication of information. Applications of fundamental topics of information theory include lossless data compression (e.g., ZIP files), lossy data compression (e.g., MP3s and JPEGs), and channel coding (e.g., for digital subscriber line (DSL)). Its impact has been crucial to the success of the Voyager missions to deep space, the invention of the compact disc, the feasibility of mobile phones, the development of the Internet, the study of linguistics and of human perception, the understanding of black holes, and numerous other fields.

A key measure in information theory is entropy, which quantifies the amount of uncertainty involved in the value of a random variable or the outcome of a random process. For example, identifying the outcome of a fair coin flip (with two equally likely outcomes) provides less information (lower entropy) than specifying the outcome from a roll of a die (with six equally likely outcomes). Some other important measures in information theory are mutual information, channel capacity, error exponents, and Kullback-Leibler divergence.
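A quick worked check of the coin-versus-die comparison above, using the base-2 entropy $H(X) = -\sum_{x} p(x)\log_2 p(x)$:
$$H(\text{coin}) = -2\cdot\tfrac12\log_2\tfrac12 = 1\ \text{bit}, \qquad H(\text{die}) = -6\cdot\tfrac16\log_2\tfrac16 = \log_2 6 \approx 2.585\ \text{bits},$$
so the die roll indeed carries more uncertainty, and resolving it conveys more information.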

The field is at the intersection of mathematics, statistics, computer science, physics, neurobiology, and electrical engineering. The theory has also found applications in other areas, including natural language processing, neurobiology, human vision, the evolution and function of molecular codes (bioinformatics), model selection in statistics, thermal physics, quantum computing, linguistics, plagiarism detection, and anomaly detection. Important sub-fields of information theory include source coding, channel coding, algorithmic complexity theory, algorithmic information theory, information-theoretic security, and measures of information.

2121 questions
150
votes
15 answers

Intuitive explanation of entropy

I have bumped into entropy many times, but it has never been clear to me why we use this formula: If $X$ is a random variable then its entropy is: $$H(X) = -\displaystyle\sum_{x} p(x)\log p(x).$$ Why are we using this formula? Where did this formula…
jjepsuomi
  • 8,135
  • 12
  • 49
  • 89
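As a companion to the formula in the question above, here is a minimal sketch in plain Python (the helper name `entropy` is just illustrative) that evaluates $H(X) = -\sum_x p(x)\log_2 p(x)$ for a discrete distribution given as a list of probabilities:

```python
from math import log2

def entropy(probs):
    """Shannon entropy in bits of a discrete distribution.

    `probs` lists the outcome probabilities (summing to 1); outcomes with
    zero probability contribute nothing, since 0 * log 0 is taken as 0.
    """
    return -sum(p * log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.25, 0.25]))  # 1.5 bits
print(entropy([0.9, 0.1]))         # ~0.469 bits: a biased coin is less uncertain
```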
87
votes
2 answers

Determining information in minimum trials (combinatorics problem)

A student has to pass an exam, with $k2^{k-1}$ questions to be answered yes or no, on a subject he knows nothing about. The student is allowed to take mock exams that have the same questions as the real exam. After each mock exam the teacher tells…
54
votes
1 answer

The Complexity of "The Baby Shark Song".

This question is just for fun. I hope it's received in the same goofy spirit in which I wrote it. I just had the pleasure of reading Knuth's "The Complexity of Songs" and I thought it'd be hilarious if someone could do an analysis of the complexity…
Shaun
  • 38,253
  • 17
  • 58
  • 156
43
votes
4 answers

Paradox: Roots of a polynomial require less information to express than coefficients?

A somewhat information theoretical paradox occurred to me, and I was wondering if anyone could resolve it. Let $p(x) = x^n + c_{n-1} x^{n-1} + \cdots + c_0 = (x - r_0) \cdots (x - r_{n-1})$ be a degree $n$ polynomial with leading coefficient $1$.…
chausies
  • 1,622
  • 8
  • 17
39
votes
2 answers

How are logistic loss and cross-entropy related?

I found that Kullback-Leibler loss, log-loss, and cross-entropy are all the same loss function. Is the logistic-loss function used in logistic regression equivalent to the cross-entropy function? If yes, can anybody explain how they are related? Thanks
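For the relationship asked about above, a minimal numeric sketch (plain Python, single binary label): the logistic loss evaluated on a logit $z$ coincides with the cross-entropy between the one-hot label distribution $(y,\,1-y)$ and the model's Bernoulli distribution $(p,\,1-p)$ with $p = \sigma(z)$.

```python
from math import exp, log

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def logistic_loss(y, z):
    # logistic-regression loss written directly in terms of the logit z
    return log(1.0 + exp(-z)) if y == 1 else log(1.0 + exp(z))

def cross_entropy(y, p):
    # cross-entropy between the one-hot label (y, 1-y) and the Bernoulli model (p, 1-p)
    return -(y * log(p) + (1 - y) * log(1 - p))

z = 0.7
p = sigmoid(z)
for y in (0, 1):
    print(y, logistic_loss(y, z), cross_entropy(y, p))  # the two values agree for each y
```

Since the label distribution is one-hot, its entropy is zero, so the Kullback-Leibler divergence mentioned in the question equals the cross-entropy here as well.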
33
votes
4 answers

Shannon entropy of a fair die

The formula for Shannon entropy is as follows, $$\text{Entropy}(S) = - \sum_i p_i \log_2 p_i $$ Thus, a fair six-sided die should have entropy $$- \sum_{i=1}^6 \dfrac{1}{6} \log_2 \dfrac{1}{6} = \log_2 (6) = 2.5849...$$ However, the entropy…
30
votes
2 answers

Information-theoretic aspects of mathematical systems?

It occurred to me that when you perform division in some algebraic system, such as $\frac a b = c$ in $\mathbb R$, the division itself represents a relation of sorts between $a$ and $b$, and once you calculate this relation, the resulting element…
30
votes
7 answers

Is there a way to find the log of very large numbers?

I should like to evaluate $\log_2{256!}$ or other large numbers to find 'bits' of information. For example, I'd need three bits of information to represent the seven days of the week since $\lceil \log_2{7}\rceil = 3$, but my calculator returns an…
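For evaluations like $\log_2 256!$ above, the factorial itself overflows a calculator, but the logarithm can be accumulated term by term, or taken from the log-gamma function, without ever forming $256!$. A minimal sketch in Python:

```python
from math import lgamma, log, log2

n = 256

# Sum of logs: log2(n!) = log2(1) + log2(2) + ... + log2(n)
bits_sum = sum(log2(k) for k in range(1, n + 1))

# Equivalent route via the log-gamma function: ln(n!) = lgamma(n + 1)
bits_lgamma = lgamma(n + 1) / log(2)

print(bits_sum)     # ~1683.996 bits
print(bits_lgamma)  # same value up to floating-point rounding
```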
28
votes
3 answers

Heights of all skyscrapers

There is a bunch of skyscrapers, each with a height that is a positive integer. You are given at the start the total sum of their heights. Now every day you can make one measurement, which will tell you how many skyscrapers there are which have height…
nescio
  • 281
  • 2
  • 3
26
votes
3 answers

An information theory inequality which relates to Shannon Entropy

For $a_1,...,a_n,b_1,...,b_n>0,\quad$ define $a:=\sum a_i,\ b:=\sum b_i,\ s:=\sum \sqrt{a_ib_i}$. Is the following inequality true?: $${\frac{\Bigl(\prod a_i^{a_i}\Bigr)^\frac1a}a \cdot \frac{\left(\prod b_i^{b_i}\right)^\frac1b}b…
Amir Parvardi
  • 4,788
  • 2
  • 23
  • 61
24
votes
3 answers

Is Standard Deviation the same as Entropy?

We know that standard deviation (SD) represents the level of dispersion of a distribution. Thus a distribution with only one value (e.g., 1,1,1,1) has an SD equal to zero. Similarly, such a distribution requires little information to be defined. On…
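One quick way to see that the two notions come apart: entropy depends only on the outcome probabilities, while the standard deviation also depends on the outcome values. The two-point distribution taking the values $-100$ and $100$ with probability $\tfrac12$ each has entropy $1$ bit but standard deviation $100$, whereas the uniform distribution on $\{1,2,3,4\}$ has entropy $2$ bits but standard deviation $\sqrt{5}/2 \approx 1.118$.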
22
votes
1 answer

Why is "h" used for entropy?

Why is the letter "h" (or "H") used to denote entropy in information theory, ergodic theory, and physics (and possibly other places)? Edit: I'm looking for an explanation of the original use of "H". As Ilmari Karonen points out, Shannon got "H" from…
Quinn Culver
  • 4,217
  • 1
  • 25
  • 45
21
votes
4 answers

What exactly is a probability measure in simple words?

Can someone explain probability measure in simple words? This term has been haunting me all my life. Today I came across Kullback-Leibler divergence. The KL divergence between probability measures P and Q is defined by, $$KL(P,Q)= \begin{cases} …
user13985
  • 1,115
  • 4
  • 15
  • 24
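Complementing the (truncated) definition quoted above, here is a minimal sketch in plain Python of the discrete Kullback-Leibler divergence $KL(P\,\|\,Q) = \sum_i p_i \log_2 \frac{p_i}{q_i}$, which is finite only when $Q$ assigns positive probability wherever $P$ does:

```python
from math import log2

def kl_divergence(p, q):
    """KL(P || Q) in bits for discrete distributions given as probability lists."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue                 # 0 * log(0/q) is taken as 0
        if qi == 0:
            return float("inf")      # P puts mass where Q does not
        total += pi * log2(pi / qi)
    return total

print(kl_divergence([0.5, 0.5], [0.9, 0.1]))  # ~0.737 bits
print(kl_divergence([0.5, 0.5], [0.5, 0.5]))  # 0.0: identical distributions
```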
21
votes
2 answers

Can the entropy of a random variable with countably many outcomes be infinite?

Consider a random variable $X$ taking values over $\mathbb{N}$. Let $\mathbb{P}(X = i) = p_i$ for $i \in \mathbb{N}$. The entropy of $X$ is defined by $$H(X) = \sum_i -p_i \log p_i.$$ Is it possible for $H(X)$ to be infinite?
VSJ
  • 1,031
  • 7
  • 17
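A standard example showing the answer is yes: take $p_i = \dfrac{C}{i \log^2 i}$ for $i \ge 2$, where $C$ is the normalizing constant (the series $\sum_{i \ge 2} \frac{1}{i \log^2 i}$ converges by the integral test, so $C$ exists). Then
$$H(X) = \sum_{i \ge 2} p_i \log \frac{1}{p_i} = \sum_{i \ge 2} \frac{C}{i \log^2 i}\bigl(\log i + 2\log\log i - \log C\bigr),$$
and for all sufficiently large $i$ the bracketed factor is at least $\log i$, so the tail dominates $\sum \frac{C}{i \log i}$, which diverges by the integral test. Hence $H(X) = \infty$.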
21
votes
4 answers

Is log the only choice for measuring information?

When we quantify information, we use $I(x)=-\log{P(x)}$, where $P(x)$ is the probability of some event $x$. The explanation I always got, and was satisfied with up until now, is that for two independent events, to find the probability of them both…
Cordello
  • 690
  • 3
  • 16
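A brief sketch of the standard argument behind this question: if the information content depends only on the probability, $I(x) = f(P(x))$, and adds over independent events, then $f(pq) = f(p) + f(q)$ for all $p, q \in (0,1]$. Writing $g(t) = f(e^{-t})$ turns this into Cauchy's functional equation $g(s+t) = g(s) + g(t)$, whose only monotone (or continuous) solutions are linear, so $f(p) = -c\log p$ for some constant $c > 0$; the choice of $c$ merely fixes the unit (bits, nats, …).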