This article is rated C-class on Wikipedia's content assessment scale. It is of interest to the following WikiProjects:
It seems that this page deals primarily with two related topics:
1 - statistical methods for performing classification (e.g. linear regression)
2 - statistical methods for training machine learning algorithms to perform classification.
This is not nearly the entirety of the involvement of statistics in classification. For example, neither of these concerns is relevant to a pregnancy test, and yet a pregnancy test is developed with statistical methods (e.g. sensitivity and specificity).
So it seems to me a better title and focus for the page would be "Algorithmic classification" or "Computer classification".
Now, there is a nod to the more general role of statistics in the evaluation section. But this generality is not reflected in the rest of the page, except in the title and the very first line (which I therefore find misleading). So I would suggest moving the evaluation section to the Classification page, or starting a new page that links all the stuff scattered around Wikipedia on this topic. The Classification page is in very poor shape at the moment but it does address the generality of classification and is the appropriate place for the basic concept.
Willbown ( talk) 10:50, 11 June 2024 (UTC)
Hi, I recently added some new information regarding the comparison of various classification techniques, with a reference to a peer-reviewed article. There seems to be some controversy on this subject; the link has been removed several times. I am currently doing my PhD on this topic and I know the information is very relevant.
Are the reference and the external link http://www.pattenrecognition.co.za suitable for this site? If not, what can I do so that this information is not repeatedly removed?
cvdwalt —The preceding unsigned comment was added by 155.232.128.10 ( talk) 07:03, 9 March 2007 (UTC).
Wikipedia is not a platform for shameless self-promotion. Adding your own paper on this topic is clearly a conflict of interest (see this: http://en.wikipedia.org/wiki/Wikipedia:Conflict_of_interest).
Your paper has been removed from the Pattern recognition article (again, because of self-promotion). See here: http://en.wikipedia.org/wiki/Talk:Pattern_recognition#Link_to_peer_reviewed_paper . So you just added it to this article.
Furthermore, the journal that the paper was published in is fairly obscure and non-notable. If you really want to cite a paper on this subject, cite one from the top journals (such as JMLR). - Preceding unsigned comment added by 97.107.142.93 ( talk) 13:07, 2 October 2010 (UTC)
Regarding Statistical classification/temp, I'm puzzled how to integrate it into existing articles. Check out Classification: there are two types of classification, Taxonomic classification and Statistical classification. I think you may be talking about taxonomic classification, but I'm not sure. We've made a distinction: taxonomic classification is based on human decision-making, while statistical classification is based on algorithmic decision-making.
-- hike395 July 1, 2005 17:55 (UTC)
I checked out the existing pages again. I'm talking about algorithmic decision making in the temp article--the classification of items into groups based on numerical/statistical analysis using some algorithm. My issue is that under the category of algorithmic decision making, the topic is discussed only in terms of pattern recognition/machine learning, and there is no general explanation of what statistical classification is and does. Algorithmic (or computational, or numerical, or statistical; I use them synonymously) classification can be, and is, applied to all kinds of things. So I think an overall summary of what statistical classification is and does (how it works, its underlying ideas, types of approaches, specific algorithms, etc.) is needed. Particular applications can then be discussed after that. Jumping to a pattern recognition application immediately gets too specific too fast on just one of many applications.
Also, the applications listed under taxonomic classification are not necessarily based on human decision making. Some of them can be algorithmic as well, such as phenetics-based classifications of organisms.
I don't disagree with your idea to break the topic into human vs algorithmic-based procedures, but I think some work needs to be done to make everything clearer. What I wrote needs to be expanded on for sure, but is a basic intro which can be built on I hope.
Jeeb 2 July 2005 00:18 (UTC)
The opening sentence is:
"Statistical classification is a supervised machine learning procedure in which individual items are placed into groups based on quantitative information on one or more characteristics inherent in the items (referred to as traits, variables, characters, etc) and based on a training set of previously labeled items."
To my way of thinking, this fails to distinguish between two kinds of properties of statistical classification. First and most important is what the goal of statistical classification is. Second, and less important, is by what means this goal is achieved.
By mixing together a) the goal and b) how it is achieved, this definition of the article's subject is more confusing than helpful to anyone trying to learn what statistical classification is. Daqu ( talk) 09:06, 16 February 2010 (UTC)
This article has serious problems. The biggest issue is that it's confusing statistical classification with classification in general. There are plenty of non-statistical classification techniques, such as decision trees and support vector machines, some of which are described or referenced in this article. I'm getting this article renamed to classification (machine learning) and when this is done I will rewrite the article to fix these problems. Benwing ( talk) 03:24, 5 October 2010 (UTC)
In view of the attempted hijacking of the previously existing topic of this article, I have replaced the content with something more appropriate, with the intention that it should be the "main article" for the existing category Category:Statistical classification. The major part of the version created by Benwing is at classification in machine learning. This restores a situation similar to the one that existed before Benwing's undiscussed renaming. The opinions added by Benwing, and now in classification in machine learning, still have no sources backing them up. The slightly changed name is to allow automatic procedures for replacing multiple redirects to work. Melcombe ( talk) 12:48, 23 December 2010 (UTC)
How can Fisher's linear discriminant be a frequentist method if he assumes that the data have a multivariate normal distribution? Doesn't assuming a distribution make the method Bayesian? Mrdthree ( talk) 00:11, 24 May 2011 (UTC)
The result of the move request was: page moved per request. - GTBacchus( talk) 14:39, 23 June 2011 (UTC)
Statistical classification (machine learning) → Statistical classification – There is an existing article Classification in machine learning that covers the topic of application in machine learning. The name statistical classification is used for the main article of Category:Statistical classification, and this is the role this article had before its undiscussed name change. Meanwhile, the wikilink statistical classification redirects to a poor and inappropriate article. The proposal is that this article should replace the redirect, restoring the previous status. Melcombe ( talk) 10:58, 15 June 2011 (UTC)
Chaosdruid ( talk) 12:23, 28 February 2012 (UTC)
As I see it, this topic has already been brought up in the past, but no action was ever taken. The fact is that the title of this article is problematic as well as inconsistent with mainstream usage in the literature. Statistical classification can only refer to statistical learning, which is only a subfield of classification theory in general, as there are many classification techniques (e.g. KNN) that have little or nothing to do with statistics. In addition, there is already an article on Statistical learning theory, so that topic is fully covered. When the vote against the move was made, there used to be an article called Classification in machine learning, but it appears that the article was later deleted and redirected here. I strongly believe that the current article should be renamed to something like Data classification, Classification (data analysis), Classification (machine learning), or something similar. Delafé ( talk) 10:36, 13 January 2013 (UTC)
A search [2] on Google Books demonstrates there is no consensus on this terminology. Therefore, I intend to rename the two articles to Supervised classification and clustering and Unsupervised classification and clustering. Fgnievinski ( talk) 23:37, 3 May 2014 (UTC)
The hatnote should be maintained, because "(unsupervised) classification" is actually an old synonym among statisticians for what is now more often called clustering (at least in the ML community). QVVERTYVS ( hm?) 14:54, 5 May 2014 (UTC)
Also, I have trouble with "statistical classification", as it implies a distinction with "non-statistical classification", which is nonsensical; I think it should be called "automatic classification". Fgnievinski ( talk) 15:17, 5 May 2014 (UTC)
And the following section needs to be brought in alignment with the agreements above: Pattern_recognition#Which_classifier_to_choose_for_a_classification_task.3F; e.g., it starts with "This article contains an extensive list of statistical classifiers for supervised and unsupervised classification tasks, clustering and general regression prediction." (shivers) Fgnievinski ( talk) 15:17, 5 May 2014 (UTC)
A plethora of classification performance indicators have been proposed in the scientific literature. Nobody can say that one is better than another; that depends on what people do with the indicators. In this article, there is only a discussion of the uncertainty coefficient and its advantage over simple accuracy. Why is this particular indicator singled out?
Also, the advantage that is given is clearly a mistake, even if it is written in the cited paper. The uncertainty coefficient cannot be insensitive to the relative sizes of the different classes. If we follow the link to the page devoted to the uncertainty coefficient, we see that it depends on H(x) which depends on P(x). But P(x) is the prior probabilities, or in other words, the relative sizes of the different classes. — Preceding unsigned comment added by 149.154.235.63 ( talk) 01:30, 24 December 2017 (UTC)
Is there a distinction from machine learning in that SC is about classification into a small, or at least finite, number of categories/classes, whereas ML might output a set (or vector?) of reals? - Rod57 ( talk) 19:08, 21 March 2018 (UTC)
This article has been targeted by an (apparent) campaign to insert "Decision Stream" into various Wikipedia pages about machine learning. "Decision Stream" refers to a recently published paper that currently has zero academic citations. [1] The number of articles that have been specifically edited to include "Decision Stream" within the last couple of months suggests conflict-of-interest editing by someone who wants to advertise this paper. They are monitoring these pages and quickly reverting any edits to remove this content.
Known articles targeted:
BustYourMyth ( talk) 19:16, 26 July 2018 (UTC)
Dear BustYourMyth,
Your activity is quite suspicious: registering a user account just to delete mentions of one popular article. People from different countries with a positive history of improving Wikipedia are taking part in removing your commits, as well as in providing information about "Decision Stream".
Kind regards, Dave — Preceding unsigned comment added by 62.119.167.36 ( talk) 13:35, 27 July 2018 (UTC)
I asked for partial protection at WP:ANI North8000 ( talk) 17:06, 27 July 2018 (UTC)
References
The short description
{{short description|Term in statistical classification}}
suffers from "infinite recursion"; perhaps we can choose a better description. Thatsme314 ( talk) 18:08, 7 June 2023 (UTC)
I propose merging into Pattern recognition. There is nothing distinct about the two topics; they are just synonyms. DMH43 ( talk) 01:38, 12 December 2023 (UTC)