This article is rated C-class on Wikipedia's content assessment scale. It is of interest to the following WikiProjects:
It is requested that a photograph be included in this article to improve its quality.
This text should be read with care because of errors in the text and formulas.
Whoever wrote out the "intuition" section deserves a medal. Such clarity I have rarely experienced in a wiki statistical article. +1 for making clear the relations between the Poisson and geometric distributions. Theblindsage ( talk) 23:45, 15 June 2015 (UTC)
There are no errors here, but this article needs to be cleaned up and fleshed out a bit. I will try to work on it soon... -- shaile 00:33, 14 April 2006 (UTC)
The second sentence of the introduction is terribly muddled. It should be clarified. I would do so, but I am currently struggling to understand this subject, so I shan't try just yet. Thomas Tvileren 08:44, 15 October 2007 (UTC)
Removed the erroneous statement " [The GLM] allows for the response variable to have an error distribution other than the normal distribution" from the introduction. — Preceding unsigned comment added by 2001:18E8:2:1080:81B8:6040:8B94:CC0F ( talk) 17:10, 24 October 2021 (UTC)
The edit I made (that was rved) did trim the overall size of the page and reduce the number of examples, but as it stands it is very difficult to understand what a GLM is. The basic question is what is a glm? After we have said that, we need to say why you might want to use one and then I think we should get a little into how parameters are estimated. I'm going to rv it back and expand substantially on what I wrote last time.
It would be great if we could expand on the part about using any CDF. Here we have the alternative example where and is zero otherwise. Pdbailey
So the reason that I like the terminology is that it separates out the linear part of the equation () from the random part (). It makes it clear that there is a linear model in there. Also, the way the second equation is now ( so that ) it is a bit of a garbled mess. Pdbailey 02:21, 21 April 2006 (UTC)
I have been reorganizing this article a bit, starting at the top. I wish however to point out a small but important change I made to the definition of exponential family here. Where before it contained a term , in McCullagh & Nelder (p28) it clearly shows . I have made this change and eliminated the reference to the b function. If there is a more reputable source than M&N (hard to imagine) which has the more general form, feel free to revert but please include the source. Baccyak4H 18:50, 27 October 2006 (UTC)
Thanks. I was planning to address some more advantages of this form (M&N's form) including sufficient statistics, variance as function of a, c and d, and the canonical parameter. But that must wait; my copy is elsewhere now ;-). I certainly will be continuing my reorganization (mostly to make different editors' contributions sound like they came from one editor -- my pet peeve). While I may do a lot of tweaking, please feel free to improve my efforts. Baccyak4H 03:06, 28 October 2006 (UTC)
I suppose in the end a more general formula would be better. Is there a version like the current version which also contains the so-called dispersion parameter? M&N has that (), and given it appears in an overdispersion context twice (one of which I just added), I see some merit in including it in the formula: one of the great merits as I see it is the unification that the theory provides - another reason I may describe the canonical parameter some more (Note: I removed it from the link table title, not because it was wrong (it was not), but because it was unexplained. I plan on returning to fix that). Baccyak4H 03:00, 29 October 2006 (UTC)
Well, here is a possible generalization of the exponential family formula which includes a dispersion parameter . The inclusion of is in the spirit of M&N's definition, but the rest of the formula is the same as the current one.
I tried to reword the discussion there to apply to this. Baccyak4H 14:58, 30 October 2006 (UTC)
Just wanted to add a note thanking those responsible for that glm formula under 'model components'. It really helps a lot, Baccyak4. 158.121.165.13 ( talk) 14:59, 19 October 2009 (UTC)
Just dawned on me that for the formula, if b is invertible then it is exactly equivalent to the version without b; they merely represent a reparameterization of each other. Baccyak4H 16:34, 31 October 2006 (UTC)
1. There is no mention of the probit link. From a passage in McCullagh and Nelder, the probit work is historically important, in particular the presentation of the scoring algorithm in an appendix written by R.A. Fisher for a paper by the toxicologist Bliss.
2. Mention that in practice GLMs provide an important way to address heteroscedasticity.
My apologies in the event I have overlooked some passage that addresses these concerns.
Dfarrar 04:48, 20 February 2007 (UTC)
There is an article (stub) under development for probit.
Dfarrar 04:51, 20 February 2007 (UTC)
Should the probit link be included in the table of canonical link functions, or is it not considered a canonical link function? Bill Jefferys ( talk) 23:43, 20 December 2007 (UTC)
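On the probit question, a short derivation from the standard Bernoulli likelihood of why the logit, rather than the probit, is the canonical link: the canonical link is the one that maps the mean μ to the natural parameter θ. The probit is still a legitimate link function; it just does not coincide with θ.

```latex
f(y; p) = p^{y}(1-p)^{1-y}
        = \exp\Bigl\{ y \log\frac{p}{1-p} + \log(1-p) \Bigr\},
        \qquad y \in \{0, 1\}

\theta = \log\frac{p}{1-p}, \qquad
b(\theta) = \log\bigl(1 + e^{\theta}\bigr), \qquad
\mu = \operatorname{E}(Y) = p = b'(\theta)

g_{\text{canonical}}(\mu) = \log\frac{\mu}{1-\mu} \;\neq\; \Phi^{-1}(\mu) = g_{\text{probit}}(\mu)
```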
Shouldn't beta be in boldface throughout? At present some of the betas are bold and others aren't. This is particularly disturbing when it happens in Xβ.
-- 84.9.83.26 ( talk) 11:46, 19 December 2007 (UTC)
Thanks! :-)
Another one. In a vector context, shouldn't the linear predictor η be boldface too? η = X β
-- 84.9.73.5 ( talk) 13:03, 21 December 2007 (UTC) (formerly 84.9.83.26)
Several types of regression which fall under this topic have been added to the see also section. However, links to these already appear where they are discussed higher up in the article. I propose to remove them in the see also section. Anyone with me here? Baccyak4H ( Yak!) 19:12, 2 February 2008 (UTC)
I think the technical term for the family of distributions is the Exponential Dispersion Family, though unfortunately I have no sources to confirm it other than my hazy memory. Can anyone confirm this?
Technically one doesn't even need to specify a distribution to fit a GLM; only a variance function is required (though specifying a distribution means one can estimate the dispersion parameter by maximum likelihood). However, I don't know if that can be worked into the article without making it more confusing.
There is no mention in the article of iteratively re-weighted least squares (IRLS or IWLS depending on who you talk to), the method used for estimating the parameters, and the current article at that location doesn't seem relevant to GLMs.
Thoughts? - 3mta3 ( talk) 00:17, 8 April 2008 (UTC)
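A minimal IRLS sketch, assuming a single covariate plus an intercept and a log link with the Poisson variance function V(μ) = μ. Each step is an ordinary weighted least-squares fit of a working response z on (1, x). Only the inverse link, its derivative and the variance function enter the update, which illustrates the point above that a variance function is enough to fit the model. The function name `irls`, its arguments and the data are hypothetical.

```python
import math

def irls(x, y, link_inv, dmu_deta, variance, beta=(0.0, 0.0), tol=1e-10, max_iter=100):
    """Fit g(E[y]) = b0 + b1*x by iteratively reweighted least squares.

    Uses only the inverse link, its derivative and the variance function V(mu).
    """
    b0, b1 = beta
    for _ in range(max_iter):
        # Current linear predictor and fitted means
        eta = [b0 + b1 * xi for xi in x]
        mu = [link_inv(e) for e in eta]
        d = [dmu_deta(e) for e in eta]
        # Working weights w = (dmu/deta)^2 / V(mu) and working response z
        w = [di * di / variance(m) for di, m in zip(d, mu)]
        z = [e + (yi - m) / di for e, yi, m, di in zip(eta, y, mu, d)]
        # Weighted least squares of z on (1, x): solve the 2x2 normal equations
        sw = sum(w)
        swx = sum(wi * xi for wi, xi in zip(w, x))
        swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        swz = sum(wi * zi for wi, zi in zip(w, z))
        swxz = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
        det = sw * swxx - swx * swx
        new_b0 = (swxx * swz - swx * swxz) / det
        new_b1 = (sw * swxz - swx * swz) / det
        converged = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break
    return b0, b1

# Poisson regression with log link on made-up data; the group means are 2 and 5
x = [0, 0, 0, 1, 1, 1]
y = [1, 2, 3, 4, 5, 6]
b0, b1 = irls(x, y, link_inv=math.exp, dmu_deta=math.exp, variance=lambda m: m)
print(b0, b1)  # about 0.693 and 0.916, i.e. log(2) and log(5/2)
```

With a binary covariate the fitted means must reproduce the two group means, which is why the estimates can be checked against log(2) and log(2.5).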
When you apply a link function to observed data, aren't you implicitly re-weighting points? For example, if I have samples for f(x) and expect it to be of the form f(x) = βx², I could take the square root of my data and try to find β to minimize the norm of the residual of
but if I had equal variance on all of my samples, then a simple least-squares fit of (1) would be biased, putting more emphasis on a sample at, say, x=0.1 than at x=2 (if I have my head screwed on). I see some talk of this in the article and in this discussion page, but it isn't clear to me. How is this issue dealt with? Are the samples re-weighted based on the link function? —Ben FrantzDale ( talk) 14:54, 26 August 2008 (UTC)
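A small numerical sketch of the question above, assuming the model E(y) = βx² with constant variance on the original scale. Least squares after taking square roots of the data weights residuals differently (since d√y/dy = 1/(2√y), small observations get magnified). A Gaussian GLM with link g(μ) = √μ and linear predictor c·x (so β = c²) transforms the mean rather than the data, so its estimate is the least-squares fit on the original scale. The function names and data are made up for illustration.

```python
import math

def fit_sqrt_transform(x, y):
    """Least squares after transforming the data: sqrt(y) ~ c*x, then beta = c**2."""
    c = sum(xi * math.sqrt(yi) for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    return c * c

def fit_original_scale(x, y):
    """Least squares for y ~ beta*x**2 on the original scale.

    This is what a Gaussian GLM with link sqrt(mu) and linear predictor c*x
    estimates: the link is applied to the mean, not to the observations.
    """
    return sum(yi * xi ** 2 for xi, yi in zip(x, y)) / sum(xi ** 4 for xi in x)

x = [0.1, 1.0, 2.0]
exact = [3.0 * xi ** 2 for xi in x]  # noise-free data with beta = 3
noisy = [yi + 0.5 for yi in exact]   # the same offset added to every point
print(fit_sqrt_transform(x, exact), fit_original_scale(x, exact))  # both 3.0 up to rounding
print(fit_sqrt_transform(x, noisy), fit_original_scale(x, noisy))  # about 3.238 vs 3.147
```

With noise-free data both estimators agree; once noise is present they disagree, because the square-root transform changes how much each point counts.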
Formerly, the formula of the density function was
With that density function, if a is the identity function and b is the identity function, the mean of the distribution is
and the variance is
Now, in the formula of the density function, the sign in front of c is minus. The formula is
I am fine with the change, because the formula more closely resembles the formula that I saw in a book. But with that change in the formula of the density function, the formulas for the mean and variance are now incorrect.
So, to make the formulas of mean and variance correct, there are two alternatives:
-- Anreto ( talk) 04:57, 8 October 2008 (UTC)
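For reference, a sketch in the McCullagh & Nelder parameterization (its a, b, c may not match the a, b, c of the version discussed above). The sign attached to b(θ) in the exponent is what fixes the signs in the mean and variance, via the two standard score identities.

```latex
% Exponential dispersion form (McCullagh & Nelder parameterization)
f_Y(y; \theta, \phi) = \exp\left\{ \frac{y\theta - b(\theta)}{a(\phi)} + c(y, \phi) \right\}

% Log-likelihood and its derivatives with respect to \theta
\ell = \frac{y\theta - b(\theta)}{a(\phi)} + c(y, \phi), \qquad
\frac{\partial \ell}{\partial \theta} = \frac{y - b'(\theta)}{a(\phi)}, \qquad
\frac{\partial^2 \ell}{\partial \theta^2} = -\frac{b''(\theta)}{a(\phi)}

% Using E[\partial\ell/\partial\theta] = 0 and
% E[\partial^2\ell/\partial\theta^2] + E[(\partial\ell/\partial\theta)^2] = 0:
\operatorname{E}(Y) = b'(\theta), \qquad
\operatorname{Var}(Y) = a(\phi)\, b''(\theta)
```

If the exponent were instead written with +b(θ), the same identities give E(Y) = -b'(θ) and Var(Y) = -a(φ)b''(θ), so flipping the sign of one term in the density requires flipping it in the mean and variance formulas too.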
This is not as clearly spelled out in the book, but I would encourage you to read sections 2.5 and 2.5.1 before you pass judgment on the claim. Now, I can see that the claim might, in order to be completely correct, be taken down a notch. But in gaining precision it loses the ease of reading that is important in a lead. Fisher proposed a method of approximating the Hessian, but this is the key to using Newton's method for this set of problems. Certainly, Nelder and Wedderburn's paper found that the method could be extended to the exponential family form. But the Nelder and Wedderburn paper is about fitting the model and less about its application (which the MC&N book covers in more detail) and, again, Fisher made one of the key insights on this front. Now, if you read this, read section 2.5 of MC&N, understand it, and still want it out, then by all means, take it out. Alternatively, it might make more sense to point it out in the body instead of in the lead; that is fine too. PDBailey ( talk) 21:43, 11 January 2009 (UTC)
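A sketch of why Fisher's approximation to the Hessian matters here, assuming a one-parameter Bernoulli model P(y = 1) = F(βx) with no intercept. With the canonical (logit) link, the observed information (minus the Hessian) equals the expected (Fisher) information, so Fisher scoring and Newton's method take identical steps; with the probit link they differ. The data are made up, the function names are hypothetical, and the observed information is approximated by central finite differences.

```python
import math
from statistics import NormalDist

PHI = NormalDist()  # standard normal, used for the probit link

def loglik(beta, x, y, cdf):
    """Bernoulli log-likelihood with P(y = 1) = cdf(beta * x)."""
    total = 0.0
    for xi, yi in zip(x, y):
        p = cdf(beta * xi)
        total += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total

def observed_info(beta, x, y, cdf, h=1e-4):
    """Minus the second derivative of the log-likelihood (central differences)."""
    f = lambda b: loglik(b, x, y, cdf)
    return -(f(beta + h) - 2 * f(beta) + f(beta - h)) / (h * h)

def expected_info(beta, x, cdf, pdf):
    """Fisher information: sum of x^2 * (dmu/deta)^2 / [mu * (1 - mu)]."""
    total = 0.0
    for xi in x:
        mu, d = cdf(beta * xi), pdf(beta * xi)
        total += xi * xi * d * d / (mu * (1 - mu))
    return total

logistic_cdf = lambda t: 1 / (1 + math.exp(-t))
logistic_pdf = lambda t: logistic_cdf(t) * (1 - logistic_cdf(t))

x = [0.5, 1.0, 1.5, 2.0, -1.0]
y = [1, 0, 1, 1, 0]
beta = 0.3
# Logit (canonical): the two agree up to finite-difference error
print(observed_info(beta, x, y, logistic_cdf), expected_info(beta, x, logistic_cdf, logistic_pdf))
# Probit (not canonical): the observed information depends on y, the expected does not
print(observed_info(beta, x, y, PHI.cdf), expected_info(beta, x, PHI.cdf, PHI.pdf))
```

For the logit link both quantities reduce to the sum of x²μ(1 - μ), which does not involve y; that is the algebraic reason the two algorithms coincide for canonical links.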
Wow, there really is no fitting section, I'll try to rectify that soon. PDBailey ( talk) 03:32, 27 January 2009 (UTC)
About this sentence in the lead:
I think that is an accurate description (IIRC I helped write it). However, it might be considered a little wordy or technical. I was thinking of rephrasing it to segue with the first sentence, about being a generalization of least squares regression (linear regression). Something along the lines of, rather than just equating the mean to the linear predictor, it allows for particular relationships between them (in statspeak, the link function is being generalized). Thoughts? Baccyak4H ( Yak!) 17:25, 6 February 2009 (UTC)
So if I'm not mistaken, this model can be expressed using a standard notation (that is, comparable with the rest of the regression model articles) as
y = g⁻¹(x′β) + ε,
where g is some monotonic "link" function (it might be helpful to add a reasoning why we have g⁻¹ instead of simply g), and ε belongs to an exponential family. Then it proceeds with an explanation of how MLE can be applied to obtain the estimates.
This "generalized linear model" is in fact a sub-case of the slightly more general non-linear regression model
y = f(x, β) + ε,
and therefore it is also solvable by all the standard methods used for non-linear models, such as
It seems however that neither this connection, nor alternative approaches to estimation are ever mentioned in this article. // Stpasha ( talk) 10:47, 3 July 2009 (UTC)
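A sketch supporting the point above for one special case, assuming a Gaussian family with a log link and the hypothetical one-parameter model E(y) = exp(βx): one IRLS step is algebraically the same as one Gauss-Newton step for ordinary nonlinear least squares, so both methods follow the same path. For other families the IRLS weights also contain 1/V(μ), which corresponds to a weighted nonlinear least-squares problem whose weights are updated at every iteration. Data and function names are made up.

```python
import math

def gauss_newton_step(beta, x, y):
    """One Gauss-Newton step for nonlinear least squares y ~ exp(beta*x)."""
    mu = [math.exp(beta * xi) for xi in x]
    jac = [m * xi for m, xi in zip(mu, x)]  # d mu / d beta
    num = sum(j * (yi - m) for j, yi, m in zip(jac, y, mu))
    den = sum(j * j for j in jac)
    return beta + num / den

def irls_step(beta, x, y):
    """One IRLS step for a Gaussian GLM with log link (variance function V(mu) = 1)."""
    eta = [beta * xi for xi in x]
    mu = [math.exp(e) for e in eta]
    w = [m * m for m in mu]                                   # (dmu/deta)^2 / V(mu)
    z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]    # working response
    num = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
    den = sum(wi * xi * xi for wi, xi in zip(w, x))
    return num / den

x = [0.0, 0.5, 1.0, 1.5, 2.0]
y = [1.1, 1.5, 2.8, 4.2, 7.6]
for b in (0.5, 1.0, 1.2):
    print(b, gauss_newton_step(b, x, y), irls_step(b, x, y))  # the two steps coincide
```

Expanding the IRLS update shows why: both reduce to β + Σμx(y - μ) / Σμ²x².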
If the atomic values were reported and related to the 2 atomic parameters: 1, Deuteron number (=2 times the Z number), and 2, Extra neutron number (= the n - Z number), what would be the best regression method to assign the variance to these 2 factors and determine the best regression equation? WFPM ( talk) 03:05, 10 December 2011 (UTC)
It should be noted that a regression of the incremental deuteron mass values results in the creation of 2 separate populations of mass increase values, which are: 1, an increased even numbered deuteron mass value and 2, an increased odd numbered deuteron mass value. So maybe the analysis requires the regression to be for more than just 2 parameters. WFPM ( talk) 16:43, 11 December 2011 (UTC)
An extension is something that expands the generalized linear model: the multinomial models shown are simply special cases of distribution and link. — Preceding unsigned comment added by 24.34.200.147 ( talk) 01:17, 28 May 2012 (UTC)
The section on HGLMs incorrectly suggested that these models differ from GLMMs. If what differs is the method for fitting the models, i.e., if there is no difference in the mathematical expression of the relationship between the covariates and the outcomes, then there is no need for a new acronym or to claim a new model has been expressed. I removed the section. — Preceding unsigned comment added by 24.34.200.147 ( talk) 01:24, 28 May 2012 (UTC)
A line implied that GLMMs "assume" normal random effects. In fact, there is no such limitation, though abilities to relax this assumption (other than via MCMC) are limited. The GLMM does not include any such assumption. Similarly, the GLMM is not (as was implied) limited to single-level models, with multilevel or hierarchical linear models implying a further generalization. I simplified the "extensions" section to correct this. — Preceding unsigned comment added by 24.34.200.147 ( talk) 01:27, 28 May 2012 (UTC)
Small point perhaps, but I found it a bit puzzling on first encounter: Can someone "in the know" add a small note about notation "μ". I find it a bit confusing here, and I found the same some years ago when I first read about GLM from a textbook. Towards the start here it seems to be implied that "μ" will, in the general case, stand for the expected value of the response variable. But then later, this notation is not always adhered to strictly. E.g. it seems to be used that way for Bernoulli response, but not for Binomial or Multinomial (unless you redefine the response to be the proportion, rather than the count, which is not clearly and consistently done). — Preceding unsigned comment added by 83.217.170.175 ( talk) 01:11, 18 September 2012 (UTC)
Also, it might be a slightly confusing notation because the mean varies with --- I think it is confusing to leave out as an argument. Superpronker ( talk) 10:26, 19 October 2012 (UTC)
The part about binomial regression shows the link function for a Bernoulli variable, without really saying why this works for a binomial. For instance, if Y is Binom(n,p), then E(Y) = np and g(E(y)) = log((np) / (1-np)) which doesn't make sense. — Preceding unsigned comment added by Statr ( talk • contribs) 19:41, 9 April 2013 (UTC)
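A minimal sketch of the usual resolution: for Y ~ Binomial(n, p) the logit link is applied to the success probability p = E(Y)/n (equivalently g(μ) = log(μ/(n - μ)) with μ = np), or the response is modelled as the proportion Y/n. The helper name `binomial_linear_predictor` is hypothetical.

```python
import math

def logit(p):
    """Logit link for a probability 0 < p < 1."""
    return math.log(p / (1 - p))

def binomial_linear_predictor(mean_count, n):
    """Linear predictor for Y ~ Binomial(n, p) under the logit link.

    The link is applied to p = E(Y) / n, not to the mean count E(Y) = n * p.
    """
    return logit(mean_count / n)

n, p = 10, 0.3
print(binomial_linear_predictor(n * p, n))  # same as logit(0.3), about -0.847
try:
    logit(n * p)  # applying the Bernoulli formula to the count E(Y) = 3
except ValueError as err:
    print("undefined:", err)
```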
I reworked the complementary log-log section. A colleague asked me a question that clearly indicated he confused what he read there about the identity link with the cloglog. I also added a discussion of cloglog and its relation to the Poisson distribution. I hope most people find this clearer and that it does not confuse anyone. DavidMCEddy ( talk) 08:34, 19 October 2015 (UTC)
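A short check of the cloglog/Poisson relation mentioned above: if only "at least one event" is recorded from a count N ~ Poisson(λ) with log λ = η, then P(N > 0) = 1 - exp(-λ), and applying the complementary log-log link to that probability returns η exactly. The function names are hypothetical; `log1p` and `expm1` are used to keep precision when probabilities are near 0 or 1.

```python
import math

def cloglog(p):
    """Complementary log-log link: g(p) = log(-log(1 - p))."""
    return math.log(-math.log1p(-p))

def prob_at_least_one(eta):
    """P(N > 0) when N ~ Poisson(lam) and log(lam) = eta.

    P(N = 0) = exp(-lam), so P(N > 0) = 1 - exp(-lam) = -expm1(-lam).
    """
    lam = math.exp(eta)
    return -math.expm1(-lam)

for eta in (-2.0, 0.0, 1.5):
    p = prob_at_least_one(eta)
    print(eta, p, cloglog(p))  # cloglog recovers eta up to rounding
```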
This article, after a suitable cleanup, could provide a basis for a section on hypothesis testing in generalized linear models. Though the title and intro suppose binary outcomes, nothing in the discussion requires it. Wikiacc ( ¶) 04:40, 11 December 2020 (UTC)
In the table "Common distributions with typical uses and canonical link functions", at column "Mean function" and row "Binomial", the formula is wrong. It should be instead of .-- 148.252.128.186 ( talk) 19:20, 15 November 2021 (UTC)
when is glm consistent for its estimators? 68.134.243.51 ( talk) 14:23, 28 September 2022 (UTC)