This article is rated C-class on Wikipedia's content assessment scale.
This article may be too technical for most readers to understand. (September 2010)
The first line in 'Example' is missing a left-hand parenthesis ")". Thank you for a nice article!
The above comment is specious. The writer brings up a point that Fisher information does not speak to. Fisher information assumes that one is estimating a parameter and that there is no a priori distribution of that parameter. This is one of the weaknesses of Fisher information. However, it is not relevant to an article about Fisher information except in the context of "Other formulations." There is, however, an important error in this article. The second-derivative version of the definition of Fisher information is only valid if the proper regularity condition is met. I added the condition, though this may not be the best representation of it. The formula looks rather ugly to me, but I don't have time to make it pretty. Sorry! -- 67.85.203.239 22:15, 12 February 2006 (UTC)
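To make the point about the two definitions concrete, here is a minimal sketch for a single Bernoulli(p) observation, a model in which the usual regularity conditions hold. The function name `bernoulli_information` is only illustrative and does not come from the article. It computes the expected squared score and the negative expected second derivative of the log-likelihood by summing over the two outcomes; in this regular case both should equal 1/(p(1-p)).

```python
import math


def bernoulli_information(p):
    """Return (E[score^2], -E[second derivative]) for one Bernoulli(p) draw.

    log f(x; p) = x*log(p) + (1 - x)*log(1 - p), for x in {0, 1}.
    """
    expected_score_sq = 0.0
    neg_expected_second = 0.0
    for x in (0, 1):
        prob = p if x == 1 else 1.0 - p
        score = x / p - (1 - x) / (1.0 - p)              # d/dp log f(x; p)
        second = -x / p**2 - (1 - x) / (1.0 - p) ** 2    # d^2/dp^2 log f(x; p)
        expected_score_sq += prob * score**2
        neg_expected_second += prob * (-second)
    return expected_score_sq, neg_expected_second


score_form, hessian_form = bernoulli_information(0.3)
print(score_form, hessian_form, 1.0 / (0.3 * 0.7))
```

The agreement shown here should not be expected when the regularity conditions fail, for example when the support of the distribution depends on the parameter.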
In the expression
might it be ?
Also, it is unclear whether the 's must cover the whole parameter space, or could cover some subspace. In discussing the N-variate Gaussian, it is said that the information matrix has indices running from 1 to , but there are parameters to describe a Gaussian. This is probably a mistake. PhysPhD
I should admit that I have studied mathematical statistics. Even so, by Wiki standards, this entry is not unduly technical. I've added some links (and am sure more could be added) that should help the novice reader along. The first person to contribute to this talk page is an unwitting Bayesian, when (s)he calls for a "prior distribution" on θ. Information measures and entropy are bridges connecting classical and Bayesian statistics. This entry should sketch bits of those bridges, if only by including a few links. This entry should say more comparing and contrasting Fisher information with the measures of Shannon, Kullback-Leibler, and possibly others.
Wiki should also say more, somewhere, about the extraordinary work of Roy Frieden. Frieden, a respectable physicist, has written a nearly 500pp book arguing that a great deal of theoretical physics can be grounded in Fisher information and the calculus of variations. This should not come as a complete surprise to anyone who has mastered Hamiltonian mechanics and has thought about the principle of least action, but even so, Frieden's book is a breathtaking high-wire act. It appears that classical mechanics, electromagnetism, thermodynamics, general relativity, and quantum electrodynamics are all merely different applications of a few core information-theoretic and variational principles. Frieden (2004) also includes a chapter on what he thinks his EPI approach could contribute to unsolved problems, such as quantum gravitation, turbulence, and topics in particle physics. Could EPI even prove to be the eventual gateway to that Holy Grail of contemporary science, the unification of the three fundamental forces, electroweak, strong, and gravitation? I should grant that EPI doesn't answer everything; for example, it sheds no light on why the fundamental dimensionless constants take on the values that they do. Curiously, Frieden says little about optics even though that was his professional specialty. 202.36.179.65 13:19, 11 April 2006 (UTC)
B. Roy Frieden claims to have developed a "universal method" in physics, based upon Fisher information. He has written a book about this. Unfortunately, while Frieden's ideas initially appear interesting, his claimed method has been characterized as highly dubious by knowledgeable observers (Google for a long discussion in sci.physics.research from some years ago.)
Note that Frieden is Prof. Em. of Optical Sciences at the University of Arizona. The data.optics.arizona.edu anon has used the following IPs to make a number of questionable edits:
These POV-pushing edits should be modified to more accurately describe the status of Frieden's work.--- CH 21:54, 16 June 2006 (UTC)
In addressing the technical accessibility tag above, I would recommend the addition of some graphs. For example, this concept could be related to the widely understood concept of the Gaussian bell curve. -- Beland 21:35, 4 November 2006 (UTC)
In the one-dimensional equation, there is a minus sign in the equation linking the second derivative of the log likelihood to the variance of theta. This stands to reason, as we want maximum, not minimum likelihood, so the second derivative becomes negative. In the matrix formulation below, there is no minus sign. Should it not be there, too? In practice, of course, one often minimizes sums of squares, or other "loss" functions, instead. This already is akin to -log(L). I am not a professional statistician, but I use statistics a lot in my profession, microbiology. I did not find the article too technical. After all, the subject itself is somewhat technical. Wikipedia does a great job of making gems such as this accessible. 82.73.149.14 19:51, 30 December 2006 (UTC) Bart Meijer
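On the sign question, it may help to write the two standard forms of the information matrix side by side. This is a sketch of the usual textbook identity, which holds under the regularity conditions, and not a quotation of the formula that stood in the article at the time. The symbol $\ell$ for the log-likelihood is introduced here for brevity. Whether a minus sign appears depends on which form is used: the product-of-first-derivatives form carries none, while the second-derivative form does.

```latex
% \ell(\theta) = \log f(X;\theta); the expectation is over X ~ f(.;\theta)
[\mathcal{I}(\theta)]_{ij}
  = \operatorname{E}\!\left[
      \frac{\partial \ell}{\partial\theta_i}\,
      \frac{\partial \ell}{\partial\theta_j}
    \right]
  = -\operatorname{E}\!\left[
      \frac{\partial^{2} \ell}{\partial\theta_i\,\partial\theta_j}
    \right]
```

So a matrix formula written with products of first derivatives is consistent with a one-dimensional formula that has a minus sign in front of a second derivative.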
I think that the style in which parts of this article are written is more appropriate for a textbook than for an encyclopedia article. For example: "To informally derive the Fisher Information, we follow the approach described by Van Trees (1968) and Frieden (2004)". This type of comment is only really appropriate in a textbook where a single author or a few authors are writing a book with a coherent theme. An encyclopedia article ought to adopt a different style: in particular, I object to the use of the term "we", as on Wikipedia, with so many authors and with anonymous authors, it is not clear who the word "we" refers to. Instead, I think we should word things "Van Trees (1968) and Frieden (2004) provide the following method of deriving the Fisher information informally:". I am going to rewrite this to try to eliminate these sorts of comments. But... I think this style problem goes beyond just the use of the word "we"... it's pretty pervasive and it needs deep changes. Cazort ( talk) 18:14, 10 January 2008 (UTC)
This derivation doesn't seem to be a derivation of the Fisher information, but rather, a derivation of the relationship between Fisher information and the bound on the variance of an estimator. Does everyone agree with me that this should be renamed? Also, this remark relates to the definition of Fisher information. For example, the comment "The Fisher information is the amount of information" is loaded, because it is not defined what information means. I am going to weaken this statement accordingly. If we can come up with a more rigorous and more precise definition then we should include it! Cazort ( talk) 18:22, 10 January 2008 (UTC)
I've heard mention of "mutual information" and "joint information" (bivariate discrete random variables); shouldn't these terms be discussed? 199.196.144.13 ( talk) 21:08, 29 May 2008 (UTC)
I suggest that the article Observed information be merged with the current, since it repeats the definition of the Fisher information, only substituting the expected value w.r.t. sample probability distribution instead of the expected value with respect to the population. As such, the observed information is simply the sample Fisher information. … stpasha » 07:20, 24 January 2010 (UTC)
Merge tag removed, as no support or action for 2 years. Melcombe ( talk) 00:22, 8 February 2012 (UTC)
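For anyone weighing the merge argument, a small sketch of the distinction may be useful. It assumes an i.i.d. Poisson(λ) sample; the function names and the data values are made up for illustration. The observed information is the negative second derivative of the log-likelihood evaluated on the data at hand, while the expected (Fisher) information averages that quantity over the model.

```python
def poisson_observed_information(data, lam):
    """Negative second derivative of the Poisson log-likelihood at lam.

    log L(lam) = sum(x_i*log(lam) - lam - log(x_i!)), so
    -d^2/dlam^2 log L = sum(x_i) / lam^2, which depends on the data.
    """
    return sum(data) / lam**2


def poisson_expected_information(n, lam):
    """Fisher information of n i.i.d. Poisson(lam) draws: n / lam."""
    return n / lam


data = [2, 0, 3, 1, 4]           # hypothetical sample, not from the article
mle = sum(data) / len(data)      # sample mean, the Poisson MLE

# The two quantities coincide at the MLE for this model ...
print(poisson_observed_information(data, mle), poisson_expected_information(len(data), mle))
# ... but differ at other parameter values.
print(poisson_observed_information(data, 1.0), poisson_expected_information(len(data), 1.0))
```

In this particular model the two agree at the maximum-likelihood estimate, which is not true of every model, so the sketch should not be read as settling whether the articles ought to be merged.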
Thanks for correcting my edits to the Fisher information page, and sorry for saying something that wasn't quite correct (and also for getting the sign wrong!). The claim that the Fisher information is the Hessian of the entropy was in the article before I edited it, so it's good that it's gone now.
Correct me if I'm wrong, but it seems the Fisher information is always equal to the negative Hessian of the entropy for discrete probability distributions. I'd worked it out for discrete distributions and naively assumed it was true in general, but this looks like one of the many quirks of the definition of the continuous entropy as
(OT rant: IMO the continuous entropy should never have been defined that way, since it's not equal to the continuous limit of the discrete entropy, which actually diverges to infinity, and lacks many of the desirable properties of the discrete version. If you put in a scaling factor to prevent divergence, and are careful to make it invariant to coordinate changes, you always end up with a relative entropy instead of H as defined above.)
Anyway, if it is true that the Fisher information is equal to the negative Hessian of the entropy for discrete distributions I'd like to put the formula at some early point in the article (along with a caveat about continuous distributions), since it would help someone with my background get a handle on the Fisher information a bit more easily.
Nathaniel Virgo ( talk) 14:19, 7 October 2010 (UTC)
Hi All,
Firstly, does the ; symbol mean the same as | (given), and secondly, I'm assuming f(x|θ) is a pdf for a continuous variable?
Thanks, Sachin Sachinabey ( talk) 08:12, 9 May 2011 (UTC)
Nowhere in the article does it say that the Fisher information matrix is the inverse of the covariance matrix in the multivariate normal case. Yet this fact is used in many sources, especially in the context of Bayesian networks (e.g. see http://en.wikipedia.org/wiki/Kalman_filter#Information_filter) — Preceding unsigned comment added by 89.204.138.242 ( talk) 12:34, 23 January 2013 (UTC)
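A quick numerical check of this claim, under the assumption that the covariance Σ is known and only the mean μ is being estimated: the score with respect to μ is Σ⁻¹(x − μ), so the Fisher information matrix is E[Σ⁻¹(x − μ)(x − μ)ᵀΣ⁻¹] = Σ⁻¹. The sketch below estimates that expectation by Monte Carlo for a 2×2 case using only the standard library. The helper names, the example μ and Σ, the sample size, and the seed are all illustrative choices.

```python
import math
import random


def inverse_2x2(m):
    """Invert a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def estimate_fim_for_mean(mu, sigma, n_samples=100_000, seed=0):
    """Monte Carlo estimate of E[score score^T] for the mean of N(mu, sigma)."""
    rng = random.Random(seed)
    precision = inverse_2x2(sigma)
    # Cholesky factor of sigma, used to draw correlated normal samples.
    l11 = math.sqrt(sigma[0][0])
    l21 = sigma[1][0] / l11
    l22 = math.sqrt(sigma[1][1] - l21**2)
    acc = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(n_samples):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x = (mu[0] + l11 * z1, mu[1] + l21 * z1 + l22 * z2)
        dev = (x[0] - mu[0], x[1] - mu[1])
        # Score with respect to mu: precision @ (x - mu).
        score = (
            precision[0][0] * dev[0] + precision[0][1] * dev[1],
            precision[1][0] * dev[0] + precision[1][1] * dev[1],
        )
        for i in range(2):
            for j in range(2):
                acc[i][j] += score[i] * score[j]
    return [[value / n_samples for value in row] for row in acc]


sigma = [[2.0, 0.6], [0.6, 1.0]]
print(estimate_fim_for_mean([1.0, -1.0], sigma))
print(inverse_2x2(sigma))
```

The two printed matrices should agree up to sampling noise. Note that this covers only the information about the mean; if the covariance parameters are also unknown, the full information matrix has additional blocks.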
This article is virtually useless to any reader who is not already familiar with the field. I came here simply to find out what a 'Fisher matrix' is and there is nothing here which clearly answers that question. There's a sentence stating the general idea but it then dives directly into the full derivation with no simple example. The page appears to be trying to be a postgrad textbook rather than an article for a reader who has come across a term and would like to know what it means. Sadly I'm nowhere near able to do so myself, but I would suggest this article needs: 1) A simple example of what a Fisher matrix is. 2) A beginner-friendly description of what its components actually mean. The priority for any page should be to give someone who has never heard of the subject before a general idea of what the subject is; this article seems to fail on that. — Preceding unsigned comment added by 86.153.104.154 ( talk) 09:47, 4 April 2014 (UTC)
The introductory paragraph doesn't make any sense at all. It says:
In mathematical statistics, the Fisher information (sometimes simply called information[1]) is a way of measuring the amount of information that an observable random variable X carries about an unknown parameter θ upon which the probability of X depends.
But X doesn't depend on $\theta$ at all. $X$ is external data. $\theta$ is a model parameter with which we are modelling $X$.
MisterSheik ( talk) 07:45, 4 September 2014 (UTC)
I've a similar question to Sachin. I'm not following this article when it uses | and ; in different contexts. In the computation of the first moment, are we dealing with conditional expectation or just expectation? If conditional, then the switch to integration does not make sense, as we should use the conditional density. The derivation only makes sense to me if | is replaced with ;. Can someone elucidate this for me? Smk65536 ( talk) 14:55, 23 October 2015 (UTC)
In the section "Single-parameter Bernoulli experiment," could you explain why the variance of the mean of successes in "n Bernoulli trials" is ? This is what you imply when you say that the variance in question is the inverse of the additive Fisher information. Everywhere I looked, the variance of the mean of successes in "n Bernoulli trials," with probability of success , is .
Also, why did you drop the word "independent" in the last sentence of that section? — Preceding unsigned comment added by 174.192.30.141 ( talk) 04:21, 3 May 2018 (UTC)
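The formulas in the question above did not survive on this page, so the following is only a sketch of the standard result, not a reconstruction of what was written. Assuming n independent Bernoulli trials with success probability p, the Fisher information of the whole sample is n/(p(1−p)), and the variance of the sample mean of successes is p(1−p)/n, which is exactly its inverse. The function names are illustrative. The variance is computed by exact enumeration over the binomial distribution, with no simulation.

```python
import math


def variance_of_sample_mean(n, p):
    """Exact variance of (number of successes)/n over n independent trials."""
    first = 0.0
    second = 0.0
    for k in range(n + 1):
        prob = math.comb(n, k) * p**k * (1.0 - p) ** (n - k)
        first += prob * (k / n)
        second += prob * (k / n) ** 2
    return second - first**2


def fisher_information_n_trials(n, p):
    """Fisher information about p in n independent Bernoulli(p) trials."""
    return n / (p * (1.0 - p))


n, p = 10, 0.3
print(variance_of_sample_mean(n, p), 1.0 / fisher_information_n_trials(n, p))
```

Independence matters here: the additivity that gives the factor n in the information relies on the trials being independent.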
I added a subsection that clarifies a commonly seen discrepancy in the definition of Fisher information. That is, some textbooks and notes define Fisher information with respect to one observation while some others define it using likelihood for all observations.
A critical problem is the lack of clarification about which version is used in each scenario. I hope someone can help by adding short phrases after some frequently used results to clarify which definition of Fisher information is in use. For example, in the Cramér-Rao lower bound, writing only $I(\theta)$ rather than $nI(\theta)$ in the denominator may give the impression that the bound doesn't depend on $n$; it would be much better to state immediately that this $I(\theta)$ is defined using the joint log-likelihood, so it is linear in $n$. This might seem a bit repetitive, but it's really not: it may save beginners a lot of confusion, especially when they compare the $I(\theta)$ in the C-R lower bound to the $I(\theta)$ that appears in the asymptotic normal variance of the MLE, where the tradition is almost unanimously to define $I(\theta)$ for only one observation.
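As a sketch of how the two conventions could be written so that they cannot be confused, assuming i.i.d. observations $X_1, \dots, X_n$ with density $f(x;\theta)$: the subscripted symbols $I_1$ and $I_n$ are notation introduced here for illustration and are not taken from the article.

```latex
% Assumes X_1, ..., X_n are i.i.d. with density f(x; \theta).
% I_1: information in one observation; I_n: information in the whole sample.
I_1(\theta) = \operatorname{E}\!\left[
    \left(\frac{\partial}{\partial\theta}\log f(X_1;\theta)\right)^{2}
  \right],
\qquad
I_n(\theta) = n\, I_1(\theta)

% Cramér-Rao bound for an unbiased estimator \hat\theta of \theta:
\operatorname{Var}(\hat\theta) \;\ge\; \frac{1}{I_n(\theta)} = \frac{1}{n\, I_1(\theta)}

% Asymptotic normality of the MLE (per-observation convention):
\sqrt{n}\,\bigl(\hat\theta_{\mathrm{MLE}} - \theta\bigr)
  \;\xrightarrow{d}\;
  \mathcal{N}\!\left(0,\; \frac{1}{I_1(\theta)}\right)
```

Written this way, the dependence of the lower bound on $n$ is explicit, and the $I_1(\theta)$ in the asymptotic variance is visibly the per-observation quantity.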