This page is an archive of past discussions. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page. |
The statement at the end of the second paragraph is simply not true: "the shortest number of bits necessary to transmit the message is the Shannon entropy in bits/symbol multiplied by the number of symbols in the original message." Multiplying the Shannon entropy in bits/symbol by the number of symbols does not give the exact minimum number of bits needed to transmit the message; it is only a lower bound, approached asymptotically. The original should be replaced with something like the "shortest possible representation". —Preceding unsigned comment added by 139.149.31.232 ( talk • contribs)
The extension to the continuous case has a subtle problem: the distribution f(x) has units of inverse length, and the integral contains "log f(x)" in it. Logarithms should be taken of dimensionless quantities (quantities without units). Thus, the logarithm should be of the product of f(x) and some characteristic length L, which cancels the units. Something like log[ L·f(x) ] would be more proper.
The problem with taking a transcendental function of a quantity with units arises from the way we define arithmetic operations for quantities with units. 5 m + 2 m is defined (5 m + 2 m = 7 m) but 5 m + 2 kg is not defined, because the units of the quantities to be added differ. Transcendental functions (such as logarithms) of a variable x with units present problems for determining the units of the result. This is why scientists and engineers try to form ratios of quantities in which all the units cancel, and then apply transcendental functions to these ratios rather than to the original quantities. As an example, in exp[-E/(kT)] the constant k has the proper units for canceling the units of energy E and temperature T, so the units cancel in the quantity E/(kT). The result of applying a typical transcendental function to its dimensionless argument is then also dimensionless.
My suggested solution to the problem with the units raises another question: what choice of length L should be used in the expression log[ L·f(x) ]? I think any choice can work. —The preceding unsigned comment was added by 75.85.88.234 ( talk) 18:06, 17 December 2006 (UTC).
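The unit dependence described above can be seen numerically: rescaling x (e.g. metres to centimetres) shifts a differential-entropy estimate by the log of the scale factor. A minimal sketch, assuming a standard normal sample and a simple histogram estimator (`differential_entropy_nats` is just an illustrative helper name, not from the article):

```python
import numpy as np

rng = np.random.default_rng(0)

def differential_entropy_nats(samples, bins=1000):
    """Histogram-based estimate of -Integral[ f(x) ln f(x) dx ]."""
    hist, edges = np.histogram(samples, bins=bins, density=True)
    widths = np.diff(edges)
    mask = hist > 0
    return -np.sum(hist[mask] * np.log(hist[mask]) * widths[mask])

x_m = rng.normal(0.0, 1.0, 200_000)   # lengths "measured in metres"
x_cm = 100.0 * x_m                    # the same lengths in centimetres

h_m = differential_entropy_nats(x_m)
h_cm = differential_entropy_nats(x_cm)

# The estimate shifts by ln(100) purely because the unit of x changed.
print(h_m, h_cm, h_cm - h_m, np.log(100.0))
```

Nothing physical changed between the two samples, yet the entropy estimate differs by ln 100 nats, which is exactly the unit-dependence the comment complains about.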
The last definition of the differential entropy (second-to-last formula) appears to be incorrect. Actually, it should read
h[f] = lim (Delta -> 0) [ H^Delta + log Delta * Sum [ f(xi) Delta ] ]
This would ensure the complete canceling of the second sum in H^Delta. With the current formula, there would remain a non-canceling term:
h[f] = lim (Delta -> 0) [ H^Delta + log Delta ] = - Integral[ f(x) log f(x) dx ] - lim (Delta -> 0) [ log Delta * ( Sum [ f(xi) Delta ] - 1 ) ] .
The last limit does not go to zero. Actually, through a l'Hopital applied to (1-Sum) / (1/log Delta) , it would go to
- lim (Delta -> 0) [ Delta (log Delta)^2 Sum[f(xi)] ],
and, as Delta -> 0, Sum[f(xi)] -> infinity as 1/Delta (since Sum[f(xi) Delta] -> 1), so it would cancel the first Delta in the limit above, and there would be only
- lim (Delta -> 0) [ (log Delta)^2 ] -> - infinity
Thus, the last definition of h[f] could not even be used. I recommend checking with a reliable source on this and then, if that formula is wrong, removing it. Unfortunately, I have no knowledge of the way formulas are written in Wikipedia (yet).
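For what it's worth, the behaviour of H^Delta + log Delta as Delta shrinks can be probed numerically. A sketch, assuming a standard Gaussian for f, truncated to [-8, 8] (the function name is just an illustrative helper):

```python
import numpy as np

def H_delta_plus_log_delta(delta, lo=-8.0, hi=8.0):
    """H^Delta + ln Delta for a standard Gaussian discretised into bins of width Delta."""
    x = np.arange(lo + delta / 2, hi, delta)        # bin midpoints
    f = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)      # density f(x)
    p = f * delta                                    # bin probabilities f(xi)*Delta
    p = p[p > 0]
    return -np.sum(p * np.log(p)) + np.log(delta)

h_true = 0.5 * np.log(2 * np.pi * np.e)             # differential entropy of N(0,1), in nats
for d in (0.1, 0.01, 0.001):
    print(d, H_delta_plus_log_delta(d), h_true)
```

Running this for ever smaller Delta lets one see directly what the residual term log Delta * (Sum[f(xi) Delta] - 1) actually does in practice, which is the crux of the disagreement above.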
In the roulette example, the probability of a combination of numbers hit over P spins is given as Omega/T, but the entropy is given as lg(Omega), which then leads to the Shannon definition. Why is lg(Omega) used? (Note: I'm using the notation "lg" to denote "log base 2") 66.151.13.191 20:41, 31 March 2006 (UTC)
Since the entropy was given as a definition, it does not need to be derived. On the other hand, a "derivation" can be given which gives a sense of the motivation for the definition as well as the link to thermodynamic entropy.
Q. Given a roulette with n pockets which are all equally likely to be landed on by the ball, what is the probability of obtaining a distribution (A1, A2, …, An), where Ai is the number of times pocket i was landed on and

P = A1 + A2 + … + An

is the total number of ball-landing events?

A. The probability is a multinomial distribution, viz.

p = Omega / T

where

Omega = P! / (A1! A2! … An!)

is the number of possible combinations of outcomes (for the events) which fit the given distribution, and

T = n^P

is the number of all possible combinations of outcomes for the set of P events.

Q. And what is the entropy?

A. The entropy of the distribution is obtained from the logarithm of Omega:

S = lg Omega = lg P! - Sum_x [ lg Ax! ]

The summations can be approximated closely by being replaced with integrals:

S ≈ Integral[1 to P] lg u du - Sum_x Integral[1 to Ax] lg u du

The integral of the logarithm is

Integral lg u du = u lg u - u/ln 2 + C

So the entropy is

S ≈ P lg P - Sum_x [ Ax lg Ax ] + (1 - n)/ln 2

(the -P/ln 2 term from the first integral cancels against the Sum_x Ax/ln 2 terms, since Sum_x Ax = P, leaving the constant (1 - n)/ln 2 from the lower limits). By letting px = Ax/P and doing some simple algebra we obtain:

S ≈ -P Sum_x [ px lg px ] + (1 - n)/ln 2

and the term (1 - n)/ln 2 can be dropped since it is a constant, independent of the px distribution. The result is

S = -P Sum_x [ px lg px ]

Thus, the Shannon entropy is a consequence of the equation

S = lg Omega

which relates to Boltzmann's definition,

S = k ln Omega

of thermodynamic entropy, where k is the Boltzmann constant.
—The preceding unsigned comment was added by MisterSheik ( talk • contribs) 17:34, 1 March 2007.
Recent edits to this page now stress the word "outcome" in the opening sentence, and have changed formulas like

H(X) = - Sum_i p(xi) lg p(xi)

to

H(X) = - Sum_i p(ωi) lg p(ωi).
There appears to have been a confusion between two meanings of the word "outcome". Previously, the word was being used on these pages in a loose, informal, everyday sense to mean "the range of the random variable X" -- i.e. the set of values {x1, x2, x3, ...} that might be revealed for X.
But "outcome" also has a technical meaning in probability, namely the possible states of the universe {ω1, ω2, ω3, ...}, which are then mapped down onto the values {x1, x2, x3, ...} by the random variable X (considered as a function mapping Ω -> R).
It is important to note that the mapping X may in general be many-to-one, so H(X) and H(Ω) are not in general the same. In fact we can say definitely that H(X) <= H(Ω), with equality holding only if the mapping is one-to-one on all subsets of Ω with non-zero measure (the "data processing theorem").
The correct equations are therefore

H(X) = - Sum_i p(xi) lg p(xi)

or

H(Ω) = - Sum_i p(ωi) lg p(ωi).

But in general the two are not the same. -- Jheald 11:37, 4 March 2007 (UTC).
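The many-to-one point can be illustrated with a toy example. A sketch (the states, mapping and probabilities are made up purely for illustration):

```python
import math
from collections import Counter

def entropy(probs):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Four equally likely "states of the universe" ω1..ω4.
omega_probs = {1: 0.25, 2: 0.25, 3: 0.25, 4: 0.25}

# A many-to-one random variable X: X(ω1) = X(ω2) = 0, X(ω3) = X(ω4) = 1.
X = {1: 0, 2: 0, 3: 1, 4: 1}
x_probs = Counter()
for w, p in omega_probs.items():
    x_probs[X[w]] += p

H_omega = entropy(omega_probs.values())   # 2 bits
H_X = entropy(x_probs.values())           # 1 bit
print(H_omega, H_X)
```

Here H(X) = 1 bit is strictly less than H(Ω) = 2 bits, exactly because the mapping merges distinct outcomes; with a one-to-one mapping the two would coincide.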
Self-information of an event is a number, right? Not a random variable. Yes?
So how can entropy be the expectation of self-information? I sort of understand where the formula is coming from, but it doesn't look theoretically sound... Thanks. 83.67.217.254 13:19, 4 March 2007 (UTC)
Ok, maybe I understand. I(omega) is a number, but I(X) is itself a random variable. I have fixed the formula. 83.67.217.254 13:27, 4 March 2007 (UTC)
Uh-oh, what have I done? "Failed to parse (Missing texvc executable; please see math/README to configure.)" Could you please fix? Thank you. 83.67.217.254 13:30, 4 March 2007 (UTC)
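The point resolved above, that I(X) is itself a random variable whose expectation is H(X), can be illustrated numerically. A sketch with an arbitrary three-symbol distribution (the values and probabilities are made up for illustration):

```python
import math
import random

random.seed(1)

values = ["a", "b", "c"]
probs = [0.5, 0.25, 0.25]
p = dict(zip(values, probs))

# Analytic entropy: H(X) = E[I(X)] = Sum_x p(x) * (-log2 p(x))
H = -sum(q * math.log2(q) for q in probs)

# Monte-Carlo estimate: draw X many times and average the self-information I(X).
draws = random.choices(values, weights=probs, k=200_000)
I_mean = sum(-math.log2(p[x]) for x in draws) / len(draws)

print(H, I_mean)   # the sample mean of I(X) approaches H = 1.5 bits
```

I(omega) for a fixed outcome is indeed just a number; it is only when the argument is the random variable X that taking an expectation makes sense.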
If I take the text of the book "Uncle Tom's Cabin", http://etext.lib.virginia.edu/etcbin/toccer-new2?id=StoCabi.sgm&images=images/modeng&data=/texts/english/modeng/parsed&tag=public&part=all , it's about a megabyte of text. If I compress it using winzip I get 395 KB. bzip2: 295 KB. paq8l: 235 KB. This isn't normal English text, but I think you get the idea. Daniel.Cardenas 19:06, 13 May 2007 (UTC)
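A zeroth-order entropy estimate can be compared with what a real compressor achieves, in the spirit of the experiment above. A sketch using zlib on a small repetitive stand-in text (not the actual book; any local file would do):

```python
import math
import zlib
from collections import Counter

def order0_entropy_bits_per_char(text: str) -> float:
    """Zeroth-order (single-character) entropy estimate in bits per character."""
    counts = Counter(text)
    n = len(text)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

# A stand-in for real English text.
text = ("it was the best of times, it was the worst of times, "
        "it was the age of wisdom, it was the age of foolishness. ") * 200

h0 = order0_entropy_bits_per_char(text)
compressed = zlib.compress(text.encode(), level=9)
bits_per_char = 8 * len(compressed) / len(text)

# The compressor exploits inter-character structure that the order-0
# model misses, so it beats the order-0 figure on this repetitive sample.
print(f"order-0 entropy: {h0:.3f} bits/char, zlib: {bits_per_char:.3f} bits/char")
```

On genuinely varied English text the gap is much smaller, which is why compressed sizes of real books land well above the 1.0-1.5 bits/letter figure quoted below.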
The article currently says "The entropy of English text is between 1.0 and 1.5 bits per letter.". Shouldn't the entropy in question decrease as one discovers more and more patterns in the language, making a text more predictable? If so, I think it would be a good idea to be a little less precise, saying "The entropy of English text can be regarded as being between 1.0 and 1.5 bits per letter." or similar instead. — Bromskloss 11:43, 7 June 2007 (UTC)
Since entropy was formally introduced by Ludwig Boltzmann, the article should refer to his work:
Boltzmann, Ludwig (1896, 1898). Vorlesungen über Gastheorie, 2 volumes, Leipzig 1895/98. UB: O 5262-6. English version: Lectures on Gas Theory, translated by Stephen G. Brush (1964), Berkeley: University of California Press; (1995) New York: Dover. ISBN 0-486-68455-5.
—The preceding unsigned comment was added by Algorithms ( talk • contribs) 19:35, 7 June 2007.
Hmmm, this article seems to assume that logs must always be taken to base 2 - which is not the case. We can define entropy with whatever base we like (in coding it often makes things easier to use a base equal to the number of code symbols, which in computer science is typically 2). This leads to different units of measurement: bits vs. nats vs. hartleys.
The article should probably be modified to reflect this. HyDeckar 01:16, 13 June 2007 (UTC)
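The base dependence can be made concrete. A small sketch (`entropy` is an illustrative helper, not something from the article):

```python
import math

def entropy(probs, base=2.0):
    """Shannon entropy of a distribution, in units set by `base`:
    base 2 -> bits, base e -> nats, base 10 -> hartleys."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

p = [0.5, 0.25, 0.125, 0.125]
bits = entropy(p, 2)          # 1.75 bits
nats = entropy(p, math.e)     # = bits * ln 2
hartleys = entropy(p, 10)     # = bits * log10(2)

print(bits, nats, hartleys)
```

Changing the base only rescales the value by a constant factor, so nothing conceptual hangs on the choice; it is purely a choice of unit.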
Regarding the reference: Information is not entropy, information is not uncertainty! - a discussion of the use of the terms "information" and "entropy".
The referenced article is mistaken. It refutes the claim that "information is proportional to physical randomness". However, the more random a system is, the more information we need in order to describe it. I suggest we remove this reference.
—The preceding unsigned comment was added by 89.139.67.125 ( talk) 07:32, 13 June 2007
I'm looking for reliable, hard references for the following phrase in the article:
"Shannon's entropy measures the information contained in a message as opposed to the portion of the message that is determined (or predictable). Examples of the latter include redundancy in language structure or statistical properties relating to the occurrence frequencies of letter or word pairs, triplets etc. See Markov chain."
I'm sorry if the above concept is a bit basic and present in basic textbooks. I have not studied the subject formally, but I may have to apply the entropy concept in a small analysis for my master's dissertation.
I think there needs to be some explanation on the matter of units for the continuous case.
f(x) will have the unit 1/x. Unless x is dimensionless, the unit of entropy will include the log of a unit, which is weird. This is a strong reason why it is more useful for the continuous case to use the relative entropy of a distribution, where the general form is the Kullback-Leibler divergence from the distribution to a reference measure m(x). It could be pointed out that a useful special case of the relative entropy is:

h(f) = - Integral[ f(x) lg( f(x) * (xmax - xmin) ) dx ]

which corresponds to a rectangular distribution m(x) between xmin and xmax. It is the entropy of a general bounded signal, and it gives the entropy in bits.
Petkr 13:38, 6 October 2007 (UTC)
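The identity behind this special case -- differential entropy plus the KL divergence from a uniform reference equals the log of the interval length, independently of the units of x -- can be checked numerically. A sketch; the density f(x) = x/8 on [0, 4] is just an illustrative choice:

```python
import numpy as np

# Density f(x) = x/8 on [0, 4]; reference m(x) uniform on the same interval.
xmin, xmax = 0.0, 4.0
dx = 1e-5
x = np.arange(dx / 2, xmax, dx)      # midpoint grid, avoids x = 0
f = x / 8.0
m = 1.0 / (xmax - xmin)

h = -np.sum(f * np.log(f)) * dx      # differential entropy (nats)
D = np.sum(f * np.log(f / m)) * dx   # KL divergence from the uniform reference

# Identity: h(f) + D(f||m) = log(xmax - xmin).
print(h, D, h + D, np.log(xmax - xmin))
```

Because h and D shift by equal and opposite amounts under a rescaling of x, their sum is unit-free, which is exactly why the relative-entropy form sidesteps the units problem.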
not sure about the section `Limitations of entropy as information content'.
quote Consider a source that produces the string ABABABABAB... in which A is always followed by B and vice versa. If the probabilistic model considers individual letters as independent, the entropy rate of the sequence is 1 bit per character. But if the sequence is considered as "AB AB AB AB AB..." with symbols as two-character blocks, then the entropy rate is 0 bits per character. endquote
the average number of bits needed to encode this string is zero (asymptotically)
also, treating this as a markov chain (order 1), we can see from the formula in http://en.wikipedia.org/wiki/Entropy_rate and also in this article that the entropy rate is 0
also in the next paragraph quote However, if we use very large blocks, then the estimate of per-character entropy rate may become artificially low. endquote
isn't the `per-character entropy rate' redundant? should be either the `per-character entropy' or the `entropy rate' —Preceding unsigned comment added by 71.137.215.129 ( talk) 07:23, 16 January 2008 (UTC)
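The order-1 Markov claim above can be checked directly against the entropy-rate formula. A sketch, assuming the two-state chain with deterministic transitions:

```python
import numpy as np

# Order-1 Markov model of the source: A is always followed by B and vice versa.
# States: 0 = A, 1 = B.
P = np.array([[0.0, 1.0],
              [1.0, 0.0]])          # transition matrix
mu = np.array([0.5, 0.5])           # stationary distribution

def entropy_rate(P, mu):
    """H = -Sum_i mu_i Sum_j P_ij log2 P_ij, with 0*log 0 taken as 0."""
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(P > 0, P * np.log2(P), 0.0)
    return -np.sum(mu * terms.sum(axis=1))

print(entropy_rate(P, mu))   # 0 bits/character: the next symbol is always determined
```

Every row of P is deterministic, so each conditional entropy term vanishes and the rate is 0 bits per character, in agreement with the Entropy_rate formula cited above.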
Since "uncertainty" (whatever that may mean) is used as a motivating factor in this article, it might be good to have a brief discussion about what is meant by "uncertainty." Should the reader simply assume the common definition of uncertainty? Or is there a specific technical meaning to this word that should be introduced? —Preceding unsigned comment added by 131.215.7.196 ( talk) 19:41, 27 January 2008 (UTC)
This section needs a major rewrite. It correctly states that Shannon entropy depends crucially on a probabilistic model. Several important points need to be made, though.
Such a bound would be extremely difficult to obtain in the case of a single message, due to the halting problem. Deepmath ( talk) 21:28, 15 July 2008 (UTC)
The example given about the sequence ABABAB... sounds like utter nonsense to me: a source that always produces the same sequence has entropy 0, regardless of whether the sequence consists of a single symbol or not. For instance, the sequence of integers produced by counting from 0 has entropy 0, even though each symbol (integer) is different. —Preceding unsigned comment added by 99.65.138.158 ( talk) 19:30, 21 January 2010 (UTC)
I suggest renaming this article to either "Entropy (information theory)", or preferably, "Shannon entropy". The term "Information entropy" seems to be rarely used in a serious academic context, and I believe the term is redundant and unnecessarily confusing. Information is entropy in the context of Shannon's theory, and when it is necessary to disambiguate this type of information-theoretic entropy from other concepts such as thermodynamic entropy, topological entropy, Rényi entropy, Tsallis entropy etc., "Shannon entropy" is the term almost universally used. For me, the term "information entropy" is too vague and could easily be interpreted to include such concepts as Rényi entropy and Tsallis entropy, and not just Shannon entropy (which this article exclusively discusses). Most if not all uses of the term "entropy" in some sense quantify the "information", diversity, dissipation, or "mixing up" that is present in a probability distribution, stochastic process, or the microstates of a physical system.
I would do this myself, but this article is rather frequently viewed, so I am seeking some input first. Deepmath ( talk) 01:29, 23 August 2008 (UTC)
Very many scientists like to make simple things complicated and earn respect for it. Information entropy is a very good example of such an attempt. Actually, entropy is only the number of possible permutations, expressed in bits and divided by the length of the message. And the concept is simple as well: for a given statistical distribution of symbols we can calculate the number of possible permutations and enumerate all messages. If we do that, we can send the statistics and the index of the message in the enumeration list instead of the message itself, and the message can be restored. But the index has a length as well, and it can be very long, so we consider the worst-case scenario and take the longest index, which is given by the number of possible permutations.

For example, take a message 1000 symbols long over the symbols A, B, C with statistics 700, 200 and 100. The number of possible permutations is (1000!) / (700! * 200! * 100!). The approximate bit length of this number divided by the number of symbols is (log(1000!) - log(700!) - log(200!) - log(100!))/1000 = 1.147 bits/symbol, where all logarithms have base 2. If you calculate the entropy, it is 1.157. The figures are close, and they asymptotically approach each other with the growing size of the message. The limits are explained by Stirling's formula, so there is no trick, just approximation.

Obviously, when writing his famous article Claude Shannon did not have a clear idea of what was going on and could not explain clearly what the entropy is. He simply noticed that in compression by making binary trees similar to the Huffman tree the bit length of a symbol is close to -log(p) but always larger, and introduced entropy as a compression limit without clear understanding. The article was published in 1948 and the Huffman algorithm did not exist yet, but there were other similar algorithms that produced slightly different binary trees with the same concept as the Huffman tree, so Shannon knew them.
What is surprising is not Shannon's entropy but the other scientists who have used obscure and misleading terminology for 60 years. Entropy is a measure of the number of different messages that can possibly be constructed under a constraint given as a frequency for every symbol. That is all -- simple and clear. —Preceding unsigned comment added by 63.144.61.175 ( talk) 17:47, 24 June 2008 (UTC)
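The 1.147 vs. 1.157 comparison above is easy to reproduce. A sketch using lgamma for the factorials (`log2_multinomial` is a hypothetical helper name):

```python
import math

def log2_multinomial(counts):
    """log2 of N! / (a1! a2! ... ak!), N = sum of counts, via lgamma."""
    n = sum(counts)
    ln = math.lgamma(n + 1) - sum(math.lgamma(a + 1) for a in counts)
    return ln / math.log(2)

counts = [700, 200, 100]        # the A, B, C message from the comment above
n = sum(counts)

bits_per_symbol = log2_multinomial(counts) / n             # ≈ 1.147
shannon = -sum(a / n * math.log2(a / n) for a in counts)   # ≈ 1.157

print(bits_per_symbol, shannon)
```

The two numbers agree with the figures quoted above, and by Stirling's formula the gap shrinks as the message grows, which is the asymptotic equivalence the comment describes.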
Ok, so you're angry and think Claude Shannon sucks. Even as I type I realize this is a pointless post, but seriously: expressing it unambiguously in mathematical terms that are irrefutable is essential, especially in a subject area such as this. 85.224.240.204 ( talk) 02:15, 25 November 2008 (UTC)
This is the way that Kardar introduced the information entropy in his book Statistical Physics of Particles. There is also a wikibook at the external connection named An Intuitive Guide to the Concept of Entropy Arising in Various Sectors of Science, which this kind of opinion might be contributed to. Tschijnmotschau ( talk) 09:02, 3 December 2010 (UTC)
The section about the entropy of a continuous function refers to a figure, but no figure is present. —Preceding unsigned comment added by Halberdo ( talk • contribs) 17:10, 22 December 2008 (UTC)
The corresponding text apparently was added almost three years ago, and apparently the figure itself never was added as an image but only as a comment:
<!-- Figure: Discretizing the function $ f$ into bins of width $ \Delta$ \includegraphics[width=\textwidth]{function-with-bins.eps} -->
Furthermore, apparently the text was copied and pasted from PlanetMath (see here) without proper attribution of the authors as I think would be required by the GNU Free Documentation License. This talk page mentions that the article incorporates material from PlanetMath, which is licensed under the GFDL, but I am not sure that is enough? So, should the section be removed as a copyright violation? — Tobias Bergemann ( talk) 21:28, 22 December 2008 (UTC)