This article is rated Start-class on Wikipedia's content assessment scale.
This page appears to fail to recognize the existence of ordered multinomial responses, no? Pdbailey 02:15, 12 June 2007 (UTC)
In the Model, we better have the probabilities adding to 1:
That means the denominator needs to be:
and not just
The 1 here is the contribution from .
By the way, I'm used to thinking of as a vector, in which case is a matrix, is a vector, and is a dot-product. The way these equations are written implies that and are scalars, which is okay if you are predicting y from a single scalar X. I found that a bit confusing and had to stare at it a bit longer than usual to parse it. Some clarification about this might be helpful in the article.
I believe the summation in the denominator is over an incorrect index. If there are K types of probabilistic events and J explanatory variables x1...xJ, the summation in the denominator should be over k, for k=2,3...K. The missing k is the base probabilistic event. The implicit summation embedded in the vector multiplication XB is over j from j=1 to j=J. —Preceding unsigned comment added by 72.13.225.74 ( talk) 17:13, 20 October 2008 (UTC)
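For reference, the normalization discussed in the comments above can be sketched as follows. This assumes outcome 1 is the base outcome with its coefficient vector fixed at zero; the symbols $x_i$, $\beta_k$ and $K$ are notation chosen for this sketch and may not match the article's.

```latex
\Pr(Y_i = 1) = \frac{1}{1 + \sum_{k=2}^{K} e^{x_i \cdot \beta_k}},
\qquad
\Pr(Y_i = j) = \frac{e^{x_i \cdot \beta_j}}{1 + \sum_{k=2}^{K} e^{x_i \cdot \beta_k}}
\quad (j = 2, \dots, K)
```

Adding these over all K outcomes gives a numerator equal to the denominator, so the probabilities sum to 1. Under this assumption the 1 is $e^{x_i \cdot 0}$, the base outcome's term, and each $x_i \cdot \beta_k$ is itself a sum over the J explanatory variables.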
Choosing a base does not have much to do with the probabilities adding to one, does it? I mean, if I add the stuff we had before it would still sum to one. I think the problem is one of identification. I will ask at the ref desk though. Brusegadi ( talk) 08:17, 5 November 2008 (UTC)
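The identification point can be illustrated numerically. The following is a minimal sketch, not the article's formulation: the function name `mnl_probs` and the random test values are made up for illustration. It shows that adding the same vector to every outcome's coefficients leaves the probabilities unchanged, so fixing one outcome's coefficients at zero only picks one representative from many equivalent parameter sets.

```python
import numpy as np

def mnl_probs(x, betas):
    """Multinomial logit probabilities for one observation.

    x: shape (J,) regressors; betas: shape (K, J), one coefficient row per outcome.
    """
    scores = betas @ x
    scores = scores - scores.max()  # subtract the max for numerical stability
    weights = np.exp(scores)
    return weights / weights.sum()

rng = np.random.default_rng(0)
x = rng.normal(size=3)           # hypothetical regressors
betas = rng.normal(size=(4, 3))  # hypothetical coefficients, K=4, J=3
shift = rng.normal(size=3)

p_original = mnl_probs(x, betas)
p_shifted = mnl_probs(x, betas + shift)   # same vector added to every row
p_based = mnl_probs(x, betas - betas[0])  # first outcome as base: its row becomes zero
```

All three probability vectors agree and each sums to one, which matches the comment above: summing to one holds either way, while the base choice settles which of the equivalent coefficient sets is reported.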
I changed "If the multinomial logit is used to model choices and the error terms are assumed to be independent, it can violate the assumption of independence of irrelevant alternatives (IIA)." The IIA stems from the independence of the error term. The IIA is violated not by the independence of the error term, but by the non-independence in practice. In any case, as there is no formulation of the MNL with an equation with errors and their distribution, I do not think we should describe the IIA in terms of the structure of errors, as this will not make the article clearer. The bus example is better. Also, the nested logit does not allow for correlated errors but for different variances of the error term in different subgroups, within which the IIA still holds (see for instance Greene, Econometric Analysis, pp. 847-850). Gpeilon ( talk) 20:22, 3 November 2009 (UTC)
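To make the IIA property concrete without reference to error terms, here is a small sketch with made-up utilities for a car and two buses. The alternative names and values are hypothetical and not taken from the article. In a multinomial logit, the ratio of two choice probabilities depends only on those two alternatives' utilities.

```python
import numpy as np

def choice_probs(utilities):
    """Multinomial logit choice probabilities from deterministic utilities."""
    w = np.exp(np.asarray(utilities, dtype=float))
    return w / w.sum()

# Hypothetical utilities: car = 1.0, red bus = 0.5, blue bus = 0.5
two = choice_probs([1.0, 0.5])         # car, red bus
three = choice_probs([1.0, 0.5, 0.5])  # car, red bus, blue bus

ratio_before = two[0] / two[1]      # P(car) / P(red bus) with two options
ratio_after = three[0] / three[1]   # the same ratio after adding the blue bus
```

The ratio is exp(1.0 - 0.5) in both cases, even though the car's share falls once the second bus is added. Whether that is plausible for near-identical alternatives is exactly what the bus example questions.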
As a student, I think a visual representation might be nice. I would make one myself, but I am just starting to learn R. —Preceding unsigned comment added by Thedreamshaper ( talk • contribs) 21:51, 28 October 2010 (UTC)
There are several problems with the recent edit, so I am reverting it pending a discussion here.
First, please be careful to hit the "Show Preview" button so you can proofread before you save the page. The recent edit contains a stray { , a misspelled link that appears as a non-functioning redlink, and the incomplete sentence "Quite often, this external option is used as the benchmark upon which the comparison."
Regarding the paragraph
I find this very unclear. What does "external" mean here? How about "undefined"? If we complete the sentence as "Quite often, this external option is used as the benchmark upon which the comparison [is based]", what is the antecedent of "the comparison"? And what does the sentence "However, it is possible to fix the attribute value of one of the fully characterized options instead." mean? For one thing, the term "attribute values" has not yet been defined in the article. Does it mean regressors?
The passage
is unclearly worded. What is the semantic relationship between "an outcome occurs" and "an outcome is realized"?--are they intended to be the same, or somehow related? And the term "outcome j" pops up at the end without previously being referred to as such.
The sentence "The model is used in several applications such as marketing and machine learning." makes it look very much narrower in its areas of use than is true -- the lede is better in that regard.
Also, the section heading "Introduction" is not a good idea since the lede at the top of the page is supposed to be the introduction.
In your reason for the citation-needed tag you say
I don't understand -- if you use it to predict the several possible outcomes based on "attributes", which I interpret to mean explanatory variables, then this is a regression. Can you explain it here?
Thanks for contributing to Wikipedia -- I see that it is your first edit -- and I hope my comments will help you revise your edit. Duoduoduo ( talk) 21:05, 28 November 2011 (UTC)
An IP recently deleted the "assumptions" and "estimation of the intercept" sections. I strongly disagree that this makes the article better, so I undid this. Any objections? 018 ( talk) 14:45, 27 December 2011 (UTC)
Some equations in the section Multinomial logistic regression#As a set of independent binary regressions have some parameters which as far as I can see are not defined, and it appears to me that they ought to be just . Any objection to my removing the prime signs? Duoduoduo ( talk) 17:54, 5 February 2013 (UTC)
Perhaps I'm missing something obvious, but the section "As a set of independent binary regressions" starts by talking about doing K-1 regressions and then presents formulas for the log probability ratios such as Yi=1 / Yi=K. If we were to run this as, say, a logistic regression, what would the dependent variable be? 1 if the outcome is 1 and 0 if the outcome is K, but what about Yi=2 to K-1? — Preceding unsigned comment added by Jetopal ( talk • contribs) 01:10, 26 January 2014 (UTC)
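One possible reading of that section, offered as an assumption rather than as what the article intends: each of the K-1 binary regressions uses only the observations whose outcome is either k or the pivot K, so outcomes 2 to K-1 are simply left out of the regression for outcome 1. A sketch of how the data would be split under that reading (the helper `pivot_subsets` and the toy data are hypothetical):

```python
import numpy as np

def pivot_subsets(X, y, pivot):
    """Yield (k, X_sub, d_sub) for each non-pivot outcome k.

    Rows are kept only if the outcome is k or the pivot;
    d_sub is 1 where the outcome is k and 0 where it is the pivot.
    """
    for k in np.unique(y):
        if k == pivot:
            continue
        mask = (y == k) | (y == pivot)
        yield k, X[mask], (y[mask] == k).astype(int)

# Toy data with K = 3 outcomes; the single regressor is just the row index
y = np.array([1, 2, 3, 3, 1, 2, 3])
X = np.arange(len(y), dtype=float).reshape(-1, 1)

subsets = {int(k): (Xs, d) for k, Xs, d in pivot_subsets(X, y, pivot=3)}
```

Each entry of `subsets` could then be passed to an ordinary binary logistic regression. Fitting the pieces separately this way is generally not the same as fitting the full multinomial model jointly.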
The stated function is not actually the softmax – it's missing x_i in the numerator – and so it approximates the indicator function 1(x_j = max_i x_i) rather than max_i x_i, as is stated in the article. — Preceding unsigned comment added by 162.129.251.86 ( talk • contribs)
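The distinction drawn in this comment can be checked numerically. In the sketch below the function names are chosen for illustration only, to avoid taking a side on which function deserves the name "softmax": the normalized exponential concentrates its weight on the largest entry, while the exponentially weighted average of the entries approaches the maximum value itself.

```python
import numpy as np

def normalized_exponential(x):
    """exp(x_j) / sum_i exp(x_i): weights that approximate the arg-max indicator."""
    x = np.asarray(x, dtype=float)
    w = np.exp(x - x.max())  # subtract the max for numerical stability
    return w / w.sum()

def weighted_average_max(x):
    """sum_i x_i exp(x_i) / sum_i exp(x_i): a smooth approximation of max_i x_i."""
    x = np.asarray(x, dtype=float)
    return float(normalized_exponential(x) @ x)

x = np.array([1.0, 2.0, 10.0])
weights = normalized_exponential(x)  # nearly all weight on the last entry
approx_max = weighted_average_max(x)  # slightly below 10
```

The first function returns a vector close to an indicator of the largest entry; the second returns a single number that is close to, and never above, the largest entry.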
@Loraof: I saw you just added this page to Category:Categorical data, but I'm not sure if that's appropriate. The only things necessarily categorical in LR are the class labels (the outcomes/dependent variables), but that is true for any classification model. The inputs (independent variables/predictors/features) are real values. Perhaps Category:Classification algorithms should be a subcat of Category:Categorical data? QVVERTYVS ( hm?) 18:00, 26 May 2015 (UTC)
Why is there no worked example?
Mark W. Miller ( talk) 11:36, 28 November 2015 (UTC)
In the introduction, it says that multinomial logistic regression is a solution to a classification problem. This is correct, but not complete. As a regression method, it is also used to find out how the independent variables are related to the dependent variable, in this case by getting odds ratios. Sometimes classification is not a goal at all.
Where should that go? — Preceding unsigned comment added by PeterLFlomPhD ( talk • contribs) 12:51, 24 August 2017 (UTC)
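As an illustration of the odds-ratio use described above, here is a sketch with invented coefficients, not output from any fitted model. With a base outcome whose coefficients are zero, exponentiating a coefficient gives the factor by which the odds of that outcome relative to the base change when the regressor rises by one unit.

```python
import numpy as np

def mnl_probs(x, coef):
    """Multinomial logit probabilities; coef has one row per outcome."""
    w = np.exp(coef @ x)
    return w / w.sum()

# Hypothetical coefficients: row 0 is the base outcome (all zeros)
coef = np.array([[0.0, 0.0],
                 [0.4, -1.2],
                 [-0.3, 0.7]])
odds_ratios = np.exp(coef)  # entry [k, j]: odds ratio for outcome k, regressor j

x0 = np.array([0.5, 1.0])
x1 = x0 + np.array([1.0, 0.0])  # raise the first regressor by one unit

p0, p1 = mnl_probs(x0, coef), mnl_probs(x1, coef)
odds0 = p0[1] / p0[0]  # odds of outcome 1 versus the base at x0
odds1 = p1[1] / p1[0]  # the same odds at x1
observed_ratio = odds1 / odds0
```

Here `observed_ratio` equals `odds_ratios[1, 0]`, that is exp(0.4), whatever starting point `x0` is used. No classification step is involved.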
This page is in desperate need of citations. There are a few in the introduction, but most sections present the material with no references at all. 66.71.17.135 ( talk) 17:18, 7 September 2018 (UTC)