This is an essay. It contains the advice or opinions of one or more Wikipedia contributors. This page is not an encyclopedia article, nor is it one of Wikipedia's policies or guidelines, as it has not been thoroughly vetted by the community. Some essays represent widespread norms; others only represent minority viewpoints.
This is a mini-essay on a problem in MediaWikiland: category policy. It was initially discussed at wikiEN-l.
N.B. I'm aware of previous discussions at Wikipedia talk:Categorization. This essay is a more thorough and defensible treatment of the issue, and it highlights the fallacies of many previous arguments.
Update 4 Jun 2005: The rule proposed here, "categories are sets/graphs, not trees", is now enshrined in policy =).
What is a category? No-one knows. There isn't consensus on what a category is (see Wikipedia talk:Categorization). Is it a hierarchical tree, with all categorizations representing "is a" relationships? Or is it just a set, a group of related articles, which may belong inside one or more other sets?
This is an important question - just look at Wikipedia:Categories for deletion. Changes to categories have more widespread effects than changes to articles, and have a greater possibility of annoying editors.
I believe that categories are, and should be, sets, not hierarchies.
What was the original purpose of the categorization system? Development of a taxonomy of worldly knowledge? I don't think the developers are really that stupid (I'll expand on this below). AFAIK it was intended as a kind of automatic list-generator for related articles. Lists are sets, not hierarchies. Lists of "related articles" are sets, not hierarchies.
The way that categories have been developed in software supports the idea that categories are sets. There is implicit support for categories as sets because there is nothing to stop anyone from using them that way. None of the limits of a hierarchical system exist in the category software. Building such limits into the software would be the best way to enforce the idea of hierarchical categories, and would be easy to implement (e.g. don't allow arbitrary parenting of categories).
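To make the "easy to implement" point concrete, here is a minimal sketch of the kind of limit a strictly hierarchical system would need. It is not MediaWiki code: the function name `tree_rule_allows`, the `parent_of` mapping and the category names are all hypothetical, chosen only for illustration.

```python
def tree_rule_allows(parent_of, child, new_parent):
    """Return True if a strict-tree rule would permit filing `child` under `new_parent`.

    `parent_of` maps each category name to its single parent (hypothetical data model).
    """
    # A tree gives every category at most one parent.
    if child in parent_of:
        return False
    # Walk up from the proposed parent; reaching `child` would mean a loop.
    node = new_parent
    seen = set()
    while node is not None and node not in seen:
        if node == child:
            return False
        seen.add(node)
        node = parent_of.get(node)
    return True


# Hypothetical example categories.
parent_of = {"Physicists": "Scientists", "Scientists": "People"}
print(tree_rule_allows(parent_of, "Chemists", "Scientists"))  # a new leaf
print(tree_rule_allows(parent_of, "Physicists", "Physics"))   # a second parent
print(tree_rule_allows(parent_of, "People", "Physicists"))    # a loop
```

Only the first call passes the rule; the second and third are exactly the "arbitrary parenting" that the current software does nothing to prevent.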
Until policy is decided on (and, preferably, software upgraded to support it), categories will continue to be used as sets. Since sets include hierarchies, while hierarchies don't include sets, the current categorization system is one of sets.
A categorization system is a worldview. Therefore it is very hard for categories to be NPOV. The following quote from Clay Shirky expands:
The whole concept of an all-encompassing hierarchical category system is against the spirit of Wikipedia. It is an all-encompassing worldview, or attribution of value, to the marked-up (categorized) articles.
The "categories are hierarchies" idea presumes that it is even possible for a large group of people to agree on an all-encompassing belief-system, a ridiculous notion totally bereft of realism, and one that has been shown wrong in practice in many IT metadata projects.
Categories, especially hierarchical categories, are about the followers of one particular worldview implicitly saying "our way is right, everyone should follow it". Note that the proportion of people who follow one particular worldview in every aspect is very small.
Categorization by set is obviously less POV. An article can belong to as many sets as the community thinks it should belong to, whether directly or via multiple parenthood of the article's category (or ancestors).
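As a rough illustration of membership "directly or via multiple parenthood", the sketch below collects every set an article ends up in when categories may have several parents. It is an assumption-laden toy, not how MediaWiki stores categories: `all_categories`, the `parents` mapping and the category names are hypothetical.

```python
from collections import deque


def all_categories(parents, direct):
    """Return every category an article belongs to, directly or via ancestors.

    `parents` maps a category to a list of its parent categories (hypothetical model).
    `direct` is the list of categories placed on the article itself.
    """
    seen = set()
    queue = deque(direct)
    while queue:
        cat = queue.popleft()
        if cat in seen:
            continue  # already counted; this also makes loops harmless
        seen.add(cat)
        queue.extend(parents.get(cat, ()))
    return seen


# Hypothetical category graph with multiple parents and a deliberate loop.
parents = {
    "Physicists": ["Scientists", "Physics"],
    "Scientists": ["People by occupation"],
    "Physics": ["Natural sciences"],
    "Natural sciences": ["Physics"],
}
print(sorted(all_categories(parents, ["Physicists", "Nobel laureates"])))
```

Because visited categories are remembered, the traversal terminates even when the graph contains a loop, so nothing in the data has to be forced into a single tree.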
The benefits of hierarchical categorization are outweighed by its costs
This essay assumes that sets are taken advantage of fully by allowing multiple inheritance and possibly even inheritance loops, and encouraging articles and categories to be given many categories rather than just one or two.