This is an essay on Wikipedia categorization. It contains the advice or opinions of one or more Wikipedia contributors. This page is not an encyclopedia article, nor is it one of Wikipedia's policies or guidelines, as it has not been thoroughly vetted by the community. Some essays represent widespread norms; others only represent minority viewpoints.
This page in a nutshell: Categorization is good if used sparingly and consistently, but can be bad if overused and inconsistent.
Categorization, if used correctly and consistently, may provide a useful facility for some Wikipedia editors and for some readers. However, placing articles in too many categories (overcategorization) can have negative consequences.
Ideally, the amount of categorization (i.e. the number of categories that each article is placed in) should be just enough for the benefits of categorization to be realised while the costs are kept to a minimum.
For most readers, and for many editors, categorization is an irrelevance. [1]
Uses of categorization:
Costs of categorization include:
A Wikipedia article can contain hundreds, even thousands, of pieces of information; for example, an article about a city may mention the city's opera house, football team, etc. In theory, each of these could be a characteristic to categorize by (e.g. we could have a category for articles about "Cities that have had an openly gay mayor" [14]). [15] However, that sort of categorization could put articles in hundreds of categories and require a huge amount of maintenance, both on the articles and on the huge category trees that would result. [16] Instead, Wikipedia categorization [17] is based on categorizing articles only by the most important [18] characteristics of the article's topic (plus a few categories required for administrative reasons). In Wikipedia these are called "defining characteristics". The exact meaning of that term will probably never be agreed on by all editors, but the principle is generally accepted. So, for example, the article about a city is normally in a category like "Cities in <country>" and a few other categories for important long-term characteristics (such as being a capital city or being on the coast); of the hundreds of facts in the article, only a small number are used for categorization.
Editors interested in a particular topic tend (perhaps inevitably) to view characteristics related to that topic as particularly important. For example, an editor interested in time zones [19] may think that a city's time zone is an important characteristic of the city. Other editors might be more interested in the types of public transport a city has, the ethnicity of its inhabitants, sporting events held in the city, etc. (these are all examples of categories that have been deleted). [20] Some editors even place articles in a category although the articles do not mention the characteristic that the category is about. [21] [22] [23] If all these editors got their way, a link to "their" category would appear at the bottom of many articles, but it would be hidden amongst hundreds of other categories and thus unlikely to be used to navigate from the article. [24]
Sometimes editors start from an off-wiki list and try to add all the corresponding Wikipedia articles to a category, regardless of whether each article's content shows that its subject meets the category's inclusion criteria. Some examples:
Similarly, editors adding articles to a category based on an off-wiki list may miss articles that meet the inclusion criteria but are not on their list. An example:
Some examples of overcategorization:
Consider, for example, an article about a singer. An editor interested in awards might look at the part of the article listing the awards the person has received, create categories such as "Winners of <prize>" and place the article in those categories. Someone interested in festivals might place the article in categories for "People who performed at <festival>"; someone interested in personal lives might add category tags for "People who have dated <person>"; and so on. The list of categories would then be as long as the article. In fact it could be much longer, as there can be categories for combinations of characteristics, so the article might be in categories for both "Singers who performed at <festival>" and "People from <city> who performed at <festival>".
A common problem in Wikipedia is that wherever a list lacks a precise definition of what is eligible for inclusion, (well-meaning) editors keep adding "just one more" item to it. [31] This happens both with new categories [32] and with category tags on articles. [33]
Categories [34] are for grouping articles about similar topics; that is not quite the same thing as grouping articles about related topics. For example, an article about a soldier who was awarded a medal for his actions in a particular battle should be linked to articles about related topics (e.g. the articles about the battle, his regiment, the weapons used, etc.), but in categorization his article should be grouped with articles about similar topics (i.e. other soldiers decorated for valour), even though there are few direct links between such articles.
Another example: Charles Darwin and HMS Beagle are related topics, and the articles are linked to each other, but in categorization one belongs under people categories (e.g. Category:English naturalists) and the other under ship categories (e.g. Category:Ships of the Royal Navy). In a case like this, categorizing articles because they are related can lead to a category loop (Category:Charles Darwin and Category:HMS Beagle). [35]
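A category loop is a cycle in the directed graph formed by categories and the parent categories they are placed in. As an illustrative sketch only, the Python below assumes a hypothetical parent map in which Category:Charles Darwin and Category:HMS Beagle have each been placed in the other; the function name `find_category_loop` and the data are invented for illustration and do not reflect Wikipedia's actual category data or any MediaWiki tool. It uses an ordinary depth-first search and reports the first cycle it finds.

```python
from typing import Dict, List, Optional

# Hypothetical data: each key is a category, each value lists the categories
# it has been placed in (its parents). The two eponymous categories are
# assumed to have been placed in each other because their topics are related.
parents: Dict[str, List[str]] = {
    "Category:Charles Darwin": ["Category:English naturalists", "Category:HMS Beagle"],
    "Category:HMS Beagle": ["Category:Ships of the Royal Navy", "Category:Charles Darwin"],
    "Category:English naturalists": [],
    "Category:Ships of the Royal Navy": [],
}


def find_category_loop(parents: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one loop as a list of categories (first == last), or None."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    colour = {cat: WHITE for cat in parents}
    path: List[str] = []

    def visit(cat: str) -> Optional[List[str]]:
        colour[cat] = GREY
        path.append(cat)
        for parent in parents.get(cat, []):
            state = colour.get(parent, WHITE)
            if state == GREY:
                # The parent is already on the current path, so we have looped back.
                start = path.index(parent)
                return path[start:] + [parent]
            if state == WHITE:
                found = visit(parent)
                if found:
                    return found
        path.pop()
        colour[cat] = BLACK
        return None

    for cat in parents:
        if colour[cat] == WHITE:
            found = visit(cat)
            if found:
                return found
    return None


print(find_category_loop(parents))
# ['Category:Charles Darwin', 'Category:HMS Beagle', 'Category:Charles Darwin']
```

Removing either of the two mutual placements (so that only the people and ship parent categories remain) makes the function return `None`.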
Possible solutions to some or all of the problems outlined in this essay include:
(Under construction)
Cost-benefit analysis for (particular types of) categories -