This article is rated C-class on Wikipedia's content assessment scale.
Daily pageviews of this article: graphs are temporarily disabled; the interactive version is at pageviews.wmcloud.org.
But as I remembered variance as a squared thing too, it became evident. If the variances don't vary wildly, then the sum of squares doesn't vary wildly either. And the rest of the conditions are not hard to agree upon. We don't need the Gaussian (normal) distribution to make it "optimal". — Preceding unsigned comment added by 2001:4643:E6E3:0:B99A:4E6A:45C0:A343 ( talk) 23:04, 7 January 2019 (UTC)
The introduction is confusing to me: it is not clear what is assumed and what is not, and there is no further reference explaining why.
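For reference, a sketch of the assumptions as usually stated, in the scalar notation Y_i = β₁ + β₂x_i + ε_i (standard notation, not quoted from the article):

```latex
\mathbb{E}[\varepsilon_i] = 0, \qquad
\operatorname{Var}(\varepsilon_i) = \sigma^2 < \infty \ \text{(homoscedasticity)}, \qquad
\operatorname{Cov}(\varepsilon_i, \varepsilon_j) = 0 \ \text{for } i \neq j .
```

Nothing about the distribution of the errors is assumed beyond these moments; in particular, normality is not required, which is the point the comment above is making.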
I don't understand the definition of the least squares estimators: they are supposed to make the sum of the squares as small as possible, but this sum of squares appears to be a random variable. Are they supposed to make the expected values of those random variables as small as possible? AxelBoldt 18:31 Jan 24, 2003 (UTC)
For fixed values of x_i and Y_i, the sum of squares is a function of β_i for i = 1, 2. I'll add some material clarifying that. Michael Hardy 19:40 Jan 24, 2003 (UTC)
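To make that concrete, a sketch in the same two-coefficient notation: once the data are observed, the quantity being minimized is an ordinary (non-random) function of the coefficients.

```latex
S(\beta_1, \beta_2) = \sum_{i=1}^{n} \bigl(Y_i - \beta_1 - \beta_2 x_i\bigr)^2 .
```

For fixed (x_i, Y_i) this is a quadratic in (β₁, β₂) with a well-defined minimizer; viewed before the Y_i are realized, that minimizer is itself a random variable, which is why both readings above are sensible.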
What does homoscedastic mean?
The term is explicitly defined in the article, but I will make it more conspicuous. It means: having equal variances. Michael Hardy 21:29 Jan 24, 2003 (UTC)
Should it be made explicit that the variance is unknown? Albmont 13:17, 27 February 2007 (UTC)
What is the meaning of Cov(ε_i, ε_j)? The ε_i are scalars. Covariance is defined only for vectors, isn't it? Sergivs-en 06:05, 16 September 2007 (UTC)
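Covariance is defined for any pair of scalar random variables, not only for random vectors. A sketch, assuming the formula asked about was the error cross-covariance:

```latex
\operatorname{Cov}(\varepsilon_i, \varepsilon_j)
  = \mathbb{E}\bigl[(\varepsilon_i - \mathbb{E}\varepsilon_i)
                    (\varepsilon_j - \mathbb{E}\varepsilon_j)\bigr]
  = \mathbb{E}[\varepsilon_i \varepsilon_j]
  \quad \text{since } \mathbb{E}\varepsilon_i = 0 .
```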
A suggestion: someone with an interest in this could review the treatment under least squares. The basic Gauss–Markov theorem apparently applies to the linear case. It would be interesting if there are nonlinear generalizations. Dfarrar 14:10, 11 April 2007 (UTC)
Nice article. Assuming that the β's and x's are not random (only the ε's are), would it be more accurate to say, then, that Ŷ is a stochastic process? - just a thought Ernie shoemaker 02:20, 26 July 2007 (UTC)
This article is very confusing and does not explain what it sets out to. Saying that the least squares estimator is the "best" one means nothing. How is "best" defined? Later in the article, it seems to imply that "best" means the one with smallest MSE, and so obviously the least squares estimate is "best".
Also, why not generalise to estimation of a vector of quantities, where the errors have a given correlation structure? —Preceding unsigned comment added by 198.240.128.75 ( talk) 16:16, 5 February 2008 (UTC)
What does best estimator mean exactly? Best as in consistent estimator? -- 217.83.22.80 ( talk) 01:19, 14 March 2008 (UTC)
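"Best" here has a precise meaning: minimum variance within the class of linear unbiased estimators. A sketch of the statement (standard notation, not quoted from the article):

```latex
% \hat\beta is the least-squares estimator; \tilde\beta = CY is any other
% linear estimator with \mathbb{E}[\tilde\beta] = \beta. Then
\operatorname{Var}(\tilde\beta) - \operatorname{Var}(\hat\beta)
\ \text{is positive semidefinite, i.e.}\quad
\operatorname{Var}(\lambda'\tilde\beta) \ \ge\ \operatorname{Var}(\lambda'\hat\beta)
\ \text{for every fixed vector } \lambda .
```

It is not a claim of smallest MSE over all estimators, and it says nothing about consistency.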
As I read this article (and linear least squares), it sounds like the least-squares solution should be the BLUE even in cases where the solution it gives is impossible. For example, suppose I measure a length three times and suppose that my measurement noise has a PDF of a uniform distribution within ±1mm of the true value. Suppose the length is actually 1mm and by chance I get 0.1mm, 0.1mm, and 1.9mm. All three measurements are off by 0.9mm, which is possible given my PDF. But the mean (least-squares estimate) of the samples is 0.7mm, which is 1.2mm away from the measurement of 1.9mm, which is impossible. My three measurements are consistent with a true value anywhere between 0.9mm and 1.1mm. I imagine the true likelihood function would be highest at 0.9mm and lowest at 1.1mm. This example seems to contradict the Gauss–Markov theorem. What am I missing? How do the assumptions of the Gauss–Markov theorem avoid this problem? —Ben FrantzDale ( talk) 19:21, 6 December 2008 (UTC)
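The resolution is that "best" is only over linear estimators. With uniform noise, a nonlinear estimator such as the midrange beats the sample mean without contradicting the theorem. A quick simulation sketch (all numbers illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)
true_length = 1.0              # hypothetical true value, in mm
n, trials = 3, 200_000

# Measurement noise uniform on [-1, 1] mm, as in the example above.
samples = true_length + rng.uniform(-1.0, 1.0, size=(trials, n))

mean_est = samples.mean(axis=1)                               # linear, the BLUE
midrange = (samples.min(axis=1) + samples.max(axis=1)) / 2.0  # nonlinear

print("variance of mean:    ", mean_est.var())   # about 1/9  ~= 0.111
print("variance of midrange:", midrange.var())   # about 1/10 ~= 0.100
```

So nothing in the theorem forbids a better nonlinear estimator, and the mean can indeed land outside the feasible interval [max − 1, min + 1]; the theorem only compares variances within the linear unbiased class.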
Suppose we don't assume X_i to be deterministic. Don't we also need X_i and ε_i to be independent? 69.134.206.109 ( talk) 16:54, 21 March 2009 (UTC)
Don't we need E(e|X) = 0 (conditional) as well as E(e) = 0 (unconditional), so that we can say E(X'e) = 0? In the proof, I am having trouble seeing how the expected value distributes... -- 169.237.57.103 ( talk) 16:27, 27 June 2011 (UTC)
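For reference, this is exactly where the law of iterated expectations comes in. A sketch, assuming strict exogeneity E(e|X) = 0:

```latex
\mathbb{E}[X'e]
  = \mathbb{E}\bigl[\,\mathbb{E}[X'e \mid X]\,\bigr]
  = \mathbb{E}\bigl[X'\,\mathbb{E}[e \mid X]\bigr]
  = \mathbb{E}[X' \cdot 0]
  = 0 .
```

The unconditional E(e) = 0 alone is not enough, since e could still be correlated with X.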
C' is equal to X(X X')^-1 + D'.
At the time of my comment, C' was written as X(X' X)^-1 + D'.
The final expression for V(β) is therefore σ^2 (X X')^-1 + σ^2 DD'. — Preceding unsigned comment added by Pathdependent1 ( talk • contribs)
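For comparison, the corresponding step as it appears in the standard proof, with X an n×k design matrix so that X'X is the k×k matrix being inverted (a sketch, not a ruling on which version the article should use):

```latex
\tilde\beta = CY, \qquad
C = (X'X)^{-1}X' + D, \qquad DX = 0 \ \text{(needed for unbiasedness)},
\\[4pt]
\operatorname{Var}(\tilde\beta) = \sigma^2 CC'
  = \sigma^2 (X'X)^{-1} + \sigma^2 DD' ,
```

where the cross terms vanish because DX = 0 implies X'D' = 0.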
Prior content in this article duplicated one or more previously published sources. The material was copied from: http://wiki.answers.com/Q/When_are_OLS_estimators_BLUE. Infringing material has been rewritten or removed and must not be restored, unless it is duly released under a compatible license. (For more information, please see "using copyrighted works from others" if you are not the copyright holder of this material, or "donating copyrighted materials" if you are.) For legal reasons, we cannot accept copyrighted text or images borrowed from other web sites or published material; such additions will be deleted. Contributors may use copyrighted publications as a source of information, but not as a source of sentences or phrases. Accordingly, the material may be rewritten, but only if it does not infringe on the copyright of the original or plagiarize from that source. Please see our guideline on non-free text for how to properly implement limited quotations of copyrighted text. Wikipedia takes copyright violations very seriously, and persistent violators will be blocked from editing. While we appreciate contributions, we must require all contributors to understand and comply with these policies. Thank you. Voceditenore ( talk) 14:04, 10 October 2011 (UTC)
There are two remarks which -- if correct -- I think should be further emphasized in the article.
The first remark is explained here. It can be summed up as: OLS = argmin of the residual sum of squares = (X'X)^(-1)X'Y = argmin (among linear, unbiased estimators) of the error variance = BLUE.
The second remark pertains to linearity. As far as I can tell, two linear relations are used in the derivation. The first is the data-generating model Y = Xβ + ε. The second is the requirement that the estimator be linear in the data. The fact that there are two linear relations is sometimes left out of the discussion. For example here: only the linearity of the data-generating model is pointed out, even though they derive the BLUE.
Similarly, OLS "happens to be linear", whereas the BLUE has linearity as a requirement.
Any experts that can comment on these remarks? — Preceding unsigned comment added by 2001:700:3F00:0:189C:4003:11DF:E92E ( talk) 17:22, 24 April 2013 (UTC)
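On the first remark: the closed form comes from the first-order condition on the residual sum of squares — note the order (X'X)^(-1)X'. A sketch in standard notation:

```latex
S(\beta) = (Y - X\beta)'(Y - X\beta), \qquad
\nabla_\beta S = -2X'(Y - X\beta) = 0
\ \Longrightarrow\ X'X\,\hat\beta = X'Y
\ \Longrightarrow\ \hat\beta = (X'X)^{-1}X'Y .
```

On the second remark: the two linearity conditions are indeed distinct — linearity of the model in β, and linearity of the estimator in Y. OLS satisfies the second automatically, which may be why discussions often mention only the first.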
Hello fellow Wikipedians,
I have just modified one external link on Gauss–Markov theorem. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FAQ for additional information. I made the following changes:
When you have finished reviewing my changes, you may follow the instructions on the template below to fix any issues with the URLs.
This message was posted before February 2018. After February 2018, "External links modified" talk page sections are no longer generated or monitored by InternetArchiveBot. No special action is required regarding these talk page notices, other than regular verification using the archive tool instructions below. Editors have permission to delete these "External links modified" talk page sections if they want to de-clutter talk pages, but see the RfC before doing mass systematic removals. This message is updated dynamically through the template {{source check}} (last update: 5 June 2024).
Cheers.— InternetArchiveBot ( Report bug) 21:11, 11 October 2017 (UTC)
(′ is transpose). At the end of the proof, when we get H = 2X′X, we can just use that if B = √2·X, then H = B′B. From the Wikipedia article on definite matrices, it follows that H is definite. In this article we have a different proof (a pretty nice one in my opinion, just long and maybe unnecessary). Should we remove it and replace it with this simpler argument? Lainad27 ( talk) 02:07, 9 November 2021 (UTC)
Nevermind, forgot X is not always a square matrix... Lainad27 ( talk) 07:40, 9 November 2021 (UTC)
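The idea still goes through for rectangular X, since Gram matrices are positive semidefinite regardless of shape. A sketch:

```latex
% X is n \times k, not necessarily square. For any v \in \mathbb{R}^k:
v'Hv = 2\,v'X'Xv = 2\,\lVert Xv \rVert^2 \ \ge\ 0 ,
```

so H = 2X′X is always positive semidefinite, and positive definite exactly when Xv = 0 only for v = 0, i.e. when X has full column rank, which the Gauss–Markov setup assumes anyway.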