This is an archive of past discussions. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.
I hit much the same problem as Morwen a week or so ago, and backed off from categorization while waiting for the dust to settle. I saw Category:Football (soccer), and Category:Sportspeople, and thought, well, why not Category:Footballers (soccer) with both as supercategories? I decided against it, because a footballer is a sportsperson, but is not a sport. I've come to the conclusion that Categories are currently broken in one serious (but fixable) way: we've got a fantastic set of directed graphs, but none of the arrows is labelled. In other words, we're saying X is related to Y, but if you ask how, I'm going to have to kill you. This makes any kind of semantic inference across those relations impossible (apologies if I'm abusing terminology here but you get my point): we can't say Anyone in a subcategory of People (recursively) is also implicitly a member of People because that's making a huge semantic assumption about the relations, and one that just isn't true in the current state of the wiki.
The fix is to label the arrows: describe the relations. This is, in my limited understanding, what RDF does. That uses the terms subject, predicate, and object. The subject is the thing you're categorizing. The object is the category you're adding it to. And the predicate describes the relation. Predicates allow you to make semantic inferences programmatically. So far in the wiki I've seen two predicates, which I would summarise as Is an example of (John Lennon is an example of a vocalist) and Is, er, related in some way to (Musical groups are, er, related in some way to Music; 251 Menlove Avenue is, er, related in some way to John Lennon). Unless we encode this distinction in the categorization system, we can't make any inferences over the relations. And once we start encoding this distinction a whole new world of possibilities (and problems!) springs up: arbitrary relations.
I want to relate footballers to football. I should be able to use a more specific relation than "is, er, related in some way to", something like "is a participant in". It's the same relation you'd use between basketball players and basketball, but not the same as you'd use between Musical groups and Music, and especially not between 251 Menlove Avenue and John Lennon. In Category:Footballers (soccer) I want to be able to put something like:
[[Relation:Participates (sport)|Category:Football (soccer)]] [[Relation:Example of|Category:Sportspeople]]
You can see the idea. Wikipedia then encodes much more information programmatically. Incidentally, this could also be used with the tricky problem of who belongs in Category:Terrorists - you could use different relations, such as "Alleged to be" (for example!). It also, of course, means that we have a whole new "Relation" namespace to deal with - with RfD, etc, etc.
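To make the "labelled arrows" idea concrete, here is a minimal sketch in Python. It assumes relations are stored as (subject, predicate, object) triples; the relation names come from the proposal above, while the `Triple` type and the `objects_of` helper are hypothetical illustrations, not anything the wiki software provides.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Relation names follow the proposal above; they are illustrative only.
TRIPLES = [
    Triple("Category:Footballers (soccer)", "Participates (sport)", "Category:Football (soccer)"),
    Triple("Category:Footballers (soccer)", "Example of", "Category:Sportspeople"),
]


def objects_of(subject: str, predicate: str) -> list[str]:
    """Return every object linked to `subject` by exactly this predicate."""
    return [t.obj for t in TRIPLES if t.subject == subject and t.predicate == predicate]


print(objects_of("Category:Footballers (soccer)", "Example of"))
# prints ['Category:Sportspeople']
```

Because the predicate is part of the stored data, a query can ask only for "Example of" links and leave the looser relations out.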
IMHO without an idea like this we can't make any meaningful semantic inferences whatsoever within the category hierarchies, and categories overall become much less useful.
(Oh, in case someone asks: yes, relations should also be related to other relations :-)
-- Avaragado 10:05, 9 Jun 2004 (UTC)
I am really confused by Category:EU countries. I notice that it contains various countries both as articles and as subcategories, i.e. Denmark and Category:Denmark. However, Category:Denmark, which is a subcat of EU countries contains Danish culture. Isn't this improper inheritance? Wouldn't this imply that Danish culture is an EU country, or do I misunderstand the point of categorization? - DropDeadGorgias (talk) 20:41, Jun 4, 2004 (UTC)
(moved from main page)
I'm not sure that I can diagram this, so I'm not going to try. I'm thinking about some of the dog topics. For example, dog is a member of pets; dog is also a member of mammals; both mammals and pets are members of animals but neither is a subcategory of the other. Now, how about dog agility? It needs to go under the dog sports category, which needs to be under the dog category, because it's related to dogs. It also needs to go under the sports category, because it's a sport. It probably also needs to go under the hobby category. But dog and sports do not at any higher point in the hierarchy have a common parent; possibly hobby and sports might fall together again under leisure activities (?), but not all sports are hobbies and not all hobbies are sports. Just wanted to give another example of why something might need to be in multiple categories. Thoughts? Elf | Talk 02:51, 4 Jun 2004 (UTC)
At the beginning or at the end of articles (with interlanguage links)?
It is apparently impossible to rename a Category article once it has been created. I am for example unable to move Category:Australian MHRs (which is an incorrect form) to the correct Category:Australian federal MPs, so now I am going to have to delete them all. (This is User:Chuq's fault - I told him MHR was incorrect but he/she went ahead anyway.) Adam
This problem must be urgently fixed or it will cause endless fights. Adam 15:43, 5 Jun 2004 (UTC)
A most unsatisfactory state of affairs. This whole scheme will cause as many problems as it solves, maybe more. Adam 17:52, 5 Jun 2004 (UTC)
Might it be possible to have a "minor category" (or "organizational category" or "invisible category" or whatever you want to call it)? Perhaps put a small "[see more categories]" link in the current Category box, with the link either revealing a hidden box below the current one, or taking you to separate subpage of the article with the full category list.... This way categories that are important to the reader would be visible in the current box, while those that are useful but redundant (or more useful to the editors/organizers than the readers) could be hidden from casual view, but easily reachable by those who are interested.
Even if the minor category idea is unworkable, I suspect a [see more] functionality will be needed eventually, if certain articles end up placed in more than a small handful of categories.
What do you think? -- Catherine | talk 22:21, 5 Jun 2004 (UTC)
Ah, I understand. I agree - this would be really nice if we could do that. Any developers about? john k 04:06, 10 Jun 2004 (UTC)
I've seen several complaints about the free-form structure of the list of articles belonging to a category. Take a look at Category:Lists_of_battles, for instance. The top half is the neat tree found in the old-style list. Below in the box is an automatically generated unsorted jumble of articles that belong in the category. These are the solutions I see:
1.) Merge lists of articles into category pages, but manually maintain structured copies in the editable sections. This requires more work, but at least there's only one page (to help maintain consistency between the manual and automatic lists).
2.) Put some tags in the editable section of the category page that tell the system how to create a properly sorted nested list. This is possibly easier for editors, since all the changes they need to make are on a single page, but when new articles are added to the category, they will also need to be put into the sort order by futzing with the category page, or else end up in an unsorted section. This option also makes it really easy to re-factor.
3.) Put tags in the article themselves that allow the system to sort them in a nice way. A bit of syntax that might help might be to stick sort strings after the category name with a / as a separator. So for example:
The sort order would have to be defined somehow - it could be automatic, or specified in the category page. But it would be especially annoying to distribute the sort order among the article pages - that would make renumbering really annoying.
(Note from later on - / would be a bad choice of character, since this conflicts with the literal / in URLs.)
4.) A hybrid method, in which editors have a list-making interface similar to what is envisioned in option 2, but when the information is stored, the system goes back and writes tags as in option 3. Easily editable, easily renumbered, but all the category information is kept in one place. Harder to implement in code, though, and more resource-intensive.
-- Beland 10:12, 5 Jun 2004 (UTC)
5) For really long lists of stuff, it would be nice to simply break it up into alphabetized sections with letter headers. The software could be told to do this with a TOC tag of some sort, or it could do this automatically whenever a certain size threshold is reached. Another simple formatting option could include telling the software to put the links in a bulleted list instead of a comma-delimited string. Bryan 02:04, 6 Jun 2004 (UTC)
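A rough sketch of option 5 in Python, assuming the listing is just a list of title strings. The `render_listing` function and its `threshold` default are hypothetical; the actual cut-off would be a design decision.

```python
from itertools import groupby


def render_listing(titles, threshold=30):
    """Return display lines; letter headers appear only for long lists."""
    # Assumes every title is a non-empty string.
    ordered = sorted(titles, key=str.casefold)
    if len(ordered) < threshold:
        return [f"* {t}" for t in ordered]
    lines = []
    for letter, group in groupby(ordered, key=lambda t: t[0].upper()):
        lines.append(letter)  # one header per initial letter
        lines.extend(f"* {t}" for t in group)
    return lines


print(render_listing(["Beta", "alpha", "Bravo"], threshold=2))
# prints ['A', '* alpha', 'B', '* Beta', '* Bravo']
```

Short lists come back as a plain bulleted list, so the headers only cost space where they help navigation.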
The rendering on category pages has changed to make subcategory and article listings multi-column. That's nice. But it has also added big, bold letters at the start of each letter in the alphabetical listings. This is a big waste of space and looks quite ugly when there is a small number of items. See for instance: Category:Main_page or [[:Category:Wikipedia]]. I think it should be obvious when a list is alphabetical, and navigation in an alphabetical list is fairly trivial - you just go up or down as appropriate (even if it's wrapped into multiple columns).
I would urge disabling the letters, at least for lists with a small number of items (perhaps less than 30 or so), if not all auto-generated lists.
I like the idea of generic metadata tags. Embedding a character into Category tags is a big kludge (and / is a bad choice of special character, too), and the capability would be very useful in many other ways, as described. It also makes creating multiple structured lists inside categories a lot easier.
Given metadata in biography articles that specified birth year, name, and subfield, you could automatically create three different lists (perhaps in different subcategories) that were sorted in three different ways.
I wonder if embedded XML would be a good choice for this. There are WWW-wide XML standards being developed for just this sort of situation, and perhaps it would be good to interoperate with them. If that's too complicated, we could do something like:
[[Metadata:Person.birth_year=1906]]
...and make a list of valid key names and what they should be used for.
We would also need metatags for "Show an auto-generated list of articles/categories with the following matches in their metadata and the following sort order".-- Beland 02:14, 13 Jun 2004 (UTC)
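As a sketch of how such tags might be consumed, the Python below parses the `[[Metadata:key=value]]` form suggested above and builds a list sorted by one key. The tag syntax is only the proposal from this thread, and `parse_metadata`, `sorted_titles` and the sample articles are hypothetical.

```python
import re

# Matches the proposed [[Metadata:key=value]] form; not existing wiki syntax.
METADATA_TAG = re.compile(r"\[\[Metadata:([\w.]+)=([^\]]*)\]\]")


def parse_metadata(wikitext):
    """Collect key/value pairs from [[Metadata:key=value]] tags."""
    return {key: value.strip() for key, value in METADATA_TAG.findall(wikitext)}


def sorted_titles(articles, key):
    """Sort article titles by one metadata key; articles lacking it are skipped."""
    tagged = [(parse_metadata(text).get(key), title) for title, text in articles.items()]
    # Values are compared as strings, which is adequate for four-digit years.
    return [title for value, title in sorted(t for t in tagged if t[0] is not None)]


# Placeholder articles, for illustration only.
articles = {
    "Person A": "Text... [[Metadata:Person.birth_year=1906]]",
    "Person B": "Text... [[Metadata:Person.birth_year=1899]]",
    "Person C": "No metadata here.",
}
print(sorted_titles(articles, "Person.birth_year"))
# prints ['Person B', 'Person A']
```

Calling `sorted_titles` with a different key would give a differently ordered list from the same articles, which is the "three different lists" idea in miniature.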
The rendering should say "1 article/subcategory in this category" not articles/subcategories.-- Beland 04:59, 13 Jun 2004 (UTC)
[[Type:Music album]] [[Title:The White album]] [[Artist:The Beatles]] [[Released:1968]]
What kind of things should be in Category:John Lennon? Right now it contains the actual individual John Lennon, several of his solo albums, and the bands of which he has been a member. Also, Category:Beatles contains this category as a sub-category, and the John Lennon article is a member of both the John Lennon and Beatles categories. Basically, it's a total mess. Do you think that there is content that should exist in this category, or should it be removed? - DropDeadGorgias (talk) 13:32, Jun 8, 2004 (UTC)
Category:The Beatles members? Do we really need the article? That's just ungainly. john k 04:38, 9 Jun 2004 (UTC)
Actually, I wasn't referring to the article/category Category:The Beatles members, but to the grammatical article "The". Why not Category:Beatles members? At any rate, as I've repeatedly mentioned before, since all four of the Beatles produced music on their own as not part of the Beatles, they should all be in Category:British musicians separately anyway. john k 08:00, 9 Jun 2004 (UTC)
I had an idea and attempted to configure the category:1975 albums tag at Another Green World so that it would display on the category page as Brian Eno's Another Green World. The text of the link on the category page has not changed (still says Another Green World) but it is now alphabetized under "Brian" instead of "Another". Is this something I did wrong, a bug or a feature? I thought you could use piped links on the article page to change the link text on the category page... Tuf-Kat 20:16, Jun 8, 2004 (UTC)
My main question is about changing a category. What I've noticed is that if you edit an article and rename the category ie if the category was incorrect or too general, it will create a link in the new category page, but the link remains also in the previous category page, even though that link does not show on the subject article page. For example Jack Nicholson was originally categorised as Category:Actors. I'd read on the categorisation talk page that Paul McCartney should be British musician, but not musician, because British musician would be a subcategory of musician, so I applied the same logic and changed Jack to Category:U.S. actors and actresses where he now appears. But in the Category:Actors page he still appears even though there should be nothing to link him there, and in the Jack Nicholson article page, the only category now visible is Category:U.S. actors and actresses. Does anyone know why that would be? Am I doing something wrong or is there a problem with the database or what?
thanks for the info, and I will wait and see. Although some of the ones that I changed were about 4 days ago and the change still doesn't show. Not a problem though. Yes I agree there's a lot needing to be recategorised, and a lot that haven't been categorised at all. I'm sure it will all happen soon. Rossrs 13:58, 9 Jun 2004 (UTC)
Also another question which is less important but I'm trying to get my head around categories and subcategories. So .. Category:Vocalists and Category:Pop singers. To me, all pop singers are by definition vocalists, so along that line of thinking every person categorised as a pop singer should also be categorised as a vocalist. But is that the intention? Should vocalist just be for a band's vocalist? ie Robert Plant vocalist, but not pop singer. Along the same line I would categorise Belinda Carlisle vocalist (Go Gos) and pop singers, (solo). Britney Spears pop singer, but not vocalist? Would be interested to hear how anyone would interpret this. Thanks
That's exactly the point I think - categorising is sure to create examples that are POV. As for Britney, I'd call her a vocalist only insofar as the sounds she makes, do seem to be coming from her mouth, which is not to say that I don't like her or think she's without value, just without talent. Will be looking forward to reading your Category:Record company whores when you have it up and running. I've actually been considering one of my own, which came about when I started wondering where to place Paris Hilton. So stay tuned for Category:People:slutty bimbos :-D Rossrs 13:58, 9 Jun 2004 (UTC)
And now I've just discovered that the category pages can't be linked from here. Which is why I've italicised them instead. As if I wasn't confused enough! :-) Rossrs 10:20, 9 Jun 2004 (UTC)
Would Hilton be Category:Socialites? Category:Heiresses? john k 23:50, 9 Jun 2004 (UTC)
There's been a lot of discussion about what it means for an article to be in a category or in the subcategory of a category, such as whether articles in subcategories "inherit" the parent categories as well. I've been running into this problem myself as I try to figure out what should go where. Perhaps to try bringing a little order to the chaos we could work out some sort of standardized way of indicating that a category's children should or should not inherit it? A set of templates, perhaps, giving guidelines to editors about how the category is to be treated that could be inserted on category pages. Bryan 03:14, 10 Jun 2004 (UTC)
Here's the text that's currently in them:
Subcategories inherit: The subcategories of this category contain articles which are also valid members of this category but which have been divided up into more specific groupings.
Subcategories don't inherit: This category's subcategories are related to this topic, but the articles they contain are not necessarily valid members of this category directly.
My understanding is that the purpose of the category system is to allow automatic cross-checking of lists. Wouldn't the best way to do this be to make each category give one fact? Category:American musicians would be automatically generated by cross-checking Category:Musicians and Category:American people. Tuf-Kat 05:14, Jun 10, 2004 (UTC)
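In code terms this cross-check is a set intersection. A minimal Python sketch, assuming each single-fact category is available as a set of article titles (the membership data and the `intersect_categories` helper are hypothetical):

```python
def intersect_categories(*member_sets):
    """Articles that belong to every one of the given single-fact categories."""
    if not member_sets:
        return set()
    return set.intersection(*(set(m) for m in member_sets))


# Hypothetical membership data, for illustration only.
musicians = {"Article A", "Article B", "Article C"}
american_people = {"Article B", "Article C", "Article D"}

american_musicians = intersect_categories(musicians, american_people)
print(sorted(american_musicians))
# prints ['Article B', 'Article C']
```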
So, having discovered, much to my dismay, that piped links can not be used to change an article's display on the category page (merely its alphabetic classification for ordering purposes), I decided to do so to organize Category:Albums by artist so that Category:A Tribe Called Quest albums would be located after Category:Toto albums and before Category:Triumph albums. Unfortunately, piped links apparently do not actually cause the computer to treat the link as though it were the piped text (in this case "Tribe Called Quest albums") but actually organize it under a separate letter. This leads to the unique circumstance of there being two separate sections for the letter "T", one with all the bands whose category page begins with that letter, and one for those bands whose category page is piped so that it begins with that letter. Is this also a feature I was unaware of, or is it a bug? Tuf-Kat 22:35, Jun 11, 2004 (UTC)
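For comparison, here is a small Python sketch of the behaviour that was expected: entries are ordered and grouped by their piped sort key, while the page title is what gets displayed. This is only an illustration of the desired result, not a description of how the software works; `sections` is a hypothetical helper.

```python
from itertools import groupby

# (page title, sort key) pairs; the sort key is the text after the pipe.
entries = [
    ("Category:Toto albums", "Toto albums"),
    ("Category:A Tribe Called Quest albums", "Tribe Called Quest albums"),
    ("Category:Triumph albums", "Triumph albums"),
]


def sections(entries):
    """Group entries under one heading per letter, ordered by sort key."""
    ordered = sorted(entries, key=lambda e: e[1].casefold())
    return [
        (letter, [title for title, _ in group])
        for letter, group in groupby(ordered, key=lambda e: e[1][0].upper())
    ]


for letter, titles in sections(entries):
    print(letter, titles)
```

Sorting on the key before grouping is what yields a single "T" section with the piped entry between its neighbours.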
This talk page's project page reads more like a talk page! There appears to be no simple user guide, answering the question "How do I start (or suggest) a category". Andy Mabbett 22:41, 11 Jun 2004 (UTC)
I think that trying to use the category functionality for precise classification, searches like "Poets AND German" and whatnot is futile. It was neither designed for it nor is it suitable for it. It didn't work with the old subpage system and it won't work with something as simple as "ParentCategory=X". To do that in a useful way, we would at least need name-value pairs, if not something even more sophisticated. I don't think that our hardware and software will be up to that any time soon.
OTOH, categories are perfectly useful for bottom-up construction of TOCs: just add [[category:European Countries]], and Luxembourg appears in the list of European countries. Except, the way we do it now, it doesn't, because we put it into "EU Countries", which is a subcategory of "European Countries". So, if I want to find something, I more or less have to know where it is.
The way it really should work, is that an article should be a member of all categories where we want it to show up in the list. So, London should be a member of "Cities of England" and "Cities of the UK" and "Cities of Europe" and "Cities of the World", while Leibnitz should be just a member of "Cities of Austria". Accordingly, Johann Wolfgang Goethe should be a member of "German Poets", "European Poets", "Poets" and "German Scientists".
This leads to every article being put into several categories: the more important the article, the more categories it will belong to.
If and when a full classification and search system is implemented, more categories per article will provide more data to be pumped. Zocky 13:46, 12 Jun 2004 (UTC)
What I would like to see is something automatic, where if you go to the page on Category A, you see a list of all articles, not only in A, but in all its subcategories, subcategories of subcategories, etc. Is anyone working on this? -- BRG 15:47, Jun 18, 2004 (UTC)
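A sketch of what that recursive listing could look like in Python, assuming the category graph is available as two plain dictionaries (both hypothetical structures). It treats every subcategory link as "is a subset of", which, as discussed elsewhere on this page, is not always true of the current categories.

```python
def all_articles(category, subcategories, articles, _seen=None):
    """Collect articles in `category` and every category below it."""
    seen = set() if _seen is None else _seen
    if category in seen:  # guard against cycles in the category graph
        return set()
    seen.add(category)
    found = set(articles.get(category, ()))
    for child in subcategories.get(category, ()):
        found |= all_articles(child, subcategories, articles, seen)
    return found


# The Luxembourg example from earlier in this thread.
subcategories = {"European Countries": ["EU Countries"]}
articles = {"EU Countries": ["Luxembourg"]}
print(all_articles("European Countries", subcategories, articles))
# prints {'Luxembourg'}
```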
Moved this here from the main page (but left a summary behind). -- Beland 09:40, 13 Jun 2004 (UTC)
I recently built "Category:Commercial item transport and distribution" (CITD), which culls articles related to all aspects of this particular field. Another user was concerned that it might be too scattershot and non-specific, and proposed breaking it up. I think, however, that this new category points up an emerging difference in how categories may be used. Although I didn’t explicitly set out to do so, I created in CITD an example of what could be called a "functional" category, as opposed to many of the examples used on this page, which set forth a fairly straightforward taxonomic approach to pulling articles together.
In categorizing these categories as taxonomies, I mean that it takes very little external context to pull together the articles in the category, just common, low-context knowledge like the alphabet and geography. You could probably send in a bot to check keywords and pull together categories like "Companies that begin with the letter H" or "Companies in Germany."
By contrast, the CITD category itself contains substantial pieces of contextual information within it, namely, that there is such a thing as the commercial transportation and distribution industry and which particular things pertain to it. You wouldn’t pull together disparate articles such as on the companies FedEx and Hapag-Lloyd, the items pier and containerlift, and the concepts materiel and logistics, unless you already knew that they all happen to be pertinent to this particular industry. In other words, the very category itself informs the user somewhat. It would be much harder to use a bot to build a category like this, or at least its search algorithm would have to be more complex, containing a dose of contextual knowledge about what it was looking for.
A functional category like CITD is well-suited to the user who is drilling down from the main page in generalized exploration. It answers the challenge of supplying information to the user who may not even know what they are looking for. By contrast, a user more knowledgeable on a topic, say, ornithology, might be more likely to go straight to a taxonomic category like "All bird names that begin with the letter G."
I see a great use for functional categories for big human nexus events, like "Category:World War II." That category (though big enough that it's through subcategories) can eclectically collect everything from the brownshirts and the Holocaust to the Norden bombsight to Glenn Miller and Rosie the Riveter.
There is a limit to everything, of course, so I can see a functional category could be taken too far, or even used as a form of disguised original research: If you hold the hypothesis that parrots are controlling the minds of chiropractors, you might build a category that includes everything about parrots, mind control, and chiropractors. That would be an abuse of the encyclopedic genre. By contrast, I have been defending CITD on the ground that it is indeed a real and natural grouping. While it is not as clear-cut as "Companies that begin with the letter H", it is sufficiently cohesive, unitary, and "real in the world" that grouping its elements together is not an abuse of the encyclopedic genre. I’m not defending CITD here, and whether CITD itself happens to be a good functional category isn’t really the point here, but rather that functional categories do have a place beside taxonomic categories in Wikipedia.
If there were only one categorization possible for each article and thus we were working on the One Table of Contents, the dispute between functional and taxonomic categories might be more pitched, but fortunately we are not so constrained. The two types of categories may exist side by side, and users will benefit from both; users who are more focused on specific information retrieval might find Hapag-Lloyd under "Companies that begin with the letter H" or "German companies," while those who are just delving into the notion of commercial transportation might come across it while exploring the CITD category. Fortunately, then, it isn’t really a matter of one type of category "versus" the other, because there is room and use for both of them. -- Gary D 23:05, 9 Jun 2004 (UTC)
Hmm...I think there's something to be said for the idea that categories are not the most logical way to deal with what you call "functional categories." List pages, the actual article on commercial item transport and distribution, and so forth, seem to me to be a better way of dealing with these kind of things. Otherwise, categories will quickly become completely out of control. I support restricting categories to what you call "taxonomic." john k 16:06, 13 Jun 2004 (UTC)
I'm not sure I have an opinion on the issue directly at hand, but I have a suggestion.
List pages of various kinds and the growing number of categories all attempt to organize information in an easy-to-use-and-edit fashion that provides relevant links with appropriate context to articles. I think there's pretty widespread agreement that whether categories or lists, or both, are used, the goal should be as above. Both list pages and category pages could be better at achieving this goal. Let's brainstorm what we need and want in such a system, then pester the developers until it happens.
What kind of information should be categorizable? The facts that a person is a musician, a Canadian, a trumpeter and a jazz musician are clearly relevant. Should our system be able to accommodate less relevant tidbits (that he's left-handed, male, more than 6.5 feet tall and a 1986 Grammy Award winner for Best Jazz Album)? Should it be possible to combine categories using the MediaWiki software in a way that could create a list of Jamaican-British MPs? What about Jamaican-British MPs who voted against joining the EU? If albums are classified by both year and genre, could we automatically see what the earliest hip hop album with an article is? Could we tag more information, and see what the earliest hip hop album by a white man to go gold in Australia is? What about list-ordering? Do we have to classify something like Popes in either alphabetical or chronological order, or could we tag info to generate a list in either way, or by nationality or some other criterion? Could we take tagging a bit farther and automatically generate something like Timeline of trends in music (1970-1979) listing only events relevant to French music, or psychedelic rock? Should it be possible to contain multiple methods of categorizing things? (i.e. a purple background when using the Fladdershnit Method of classifying amphibians, and a yellow when using the Yamm Method?) Do we want to be able to include captions or other explanatory text on category pages? Could we include a caption on the article page and place it on some or all category pages (i.e. at John Lennon, format category link thusly: [[Category:British musicians|Lennon, John: (1940-1980) British lead singer and frontman for popular rock band The Beatles]])?
Anyway, these are just some questions to get started... Seems to me that a very simple system was put into place, and nobody's really satisfied with it, but since the software is constantly evolving, we have the ability to make the category system even more useful than anybody reading this discussion now is likely imagining. Tuf-Kat 19:32, Jun 14, 2004 (UTC)
Interesting questions.
So it appears that categories are currently being used for both semantic/taxonomic classification (A is a type of B) and navigational/functional linkages (article C is related to topic D; subcategory E is a subset of topic D, etc.).
This means that direct and indirect membership may not carry a clean semantic meaning. For instance, the articles that are members of Category:Units_of_measure or its subcategories are mostly actual units of measure (foot, volt, kilogram, etc.). But other articles include a list (Scientific units named after people), general articles (Historical weights and measures), and related articles (Dimensional analysis). This messes up queries like "show me all units of measure that begin with the letter H". It may also cause the following query to include unwanted results: "show me all article-descendants of Category:Units_of_measure whose titles share at least one word with the title of an article-descendant of Category:Scientists".
Even worse, navigational linkages make the recursive lists of subcategories potentially uselessly large. For example, consider the following chains:
Systems of Government -> Monarchy -> Royalty -> Royals
People -> Royals -> Royalty of England -> Queen Elizabeth II
This unfortunately may lead an automated search to conclude that (among other things) the concept "Royals" and the person "Queen Elizabeth II" are examples of governmental systems.
To solve this problem, linkages could be assigned types. For example (using an XML syntax):
<link type="is-an-instance-of" source="Queen Elizabeth II" target="Category:Royals of England" />
<link type="is-a-subset-of" source="Category:Royals of England" target="Category:Royals" />
<link type="is-a-type-of" source="Monarchy" target="Systems of government" />
<link type="is-on-topic" source="Category:Royalty" target="Category:Monarchy" />
That would seem to solve all the problems so far, at the cost of making the category system more complicated.
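Here is a hedged Python sketch of how typed links would stop the bad inference. Several things are my assumptions, not part of the proposal: the `<links>` wrapper element, the uniform "Category:" prefixes, the "is-on-topic" type on the Royals -> Royalty link, and the `ancestors` helper itself.

```python
import xml.etree.ElementTree as ET

# Link types follow the proposal above; names are normalised for this example.
LINKS_XML = """
<links>
  <link type="is-an-instance-of" source="Queen Elizabeth II" target="Category:Royals of England"/>
  <link type="is-a-subset-of" source="Category:Royals of England" target="Category:Royals"/>
  <link type="is-on-topic" source="Category:Royals" target="Category:Royalty"/>
  <link type="is-on-topic" source="Category:Royalty" target="Category:Monarchy"/>
  <link type="is-a-type-of" source="Category:Monarchy" target="Category:Systems of government"/>
</links>
"""

TAXONOMIC = {"is-an-instance-of", "is-a-subset-of", "is-a-type-of"}


def ancestors(node, links, allowed=None):
    """Everything reachable upward from `node`, optionally only via allowed link types."""
    reached, stack = set(), [node]
    while stack:
        current = stack.pop()
        for kind, source, target in links:
            if source == current and target not in reached:
                if allowed is None or kind in allowed:
                    reached.add(target)
                    stack.append(target)
    return reached


links = [(e.get("type"), e.get("source"), e.get("target")) for e in ET.fromstring(LINKS_XML)]

untyped = ancestors("Queen Elizabeth II", links)
typed = ancestors("Queen Elizabeth II", links, allowed=TAXONOMIC)
print("Category:Systems of government" in untyped)  # prints True
print("Category:Systems of government" in typed)    # prints False
```

Ignoring the link types reproduces the faulty conclusion; following only the taxonomic types stops at Category:Royals.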
But another problem is that not all the information we might like to capture is encoded in the category system. There is information inside articles and inside lists and tables which we would also like to be machine-readable. For instance, to answer the query, "show me all Jamaican-British MPs who voted against joining the EU", we might proceed through the following steps:
1.) Extract a list of British MPs from direct membership in Category:Current_British_Members_of_Parliament. But there might be no such category. Instead, there might be a list of members of the House of Lords and a table of members of the House of Commons that shows name, party affiliation, geographical constituency, and term of office. In order to get the information we want, the entire House of Commons table needs to be machine-readable (so names, party, etc., would have to be embedded in an XML-formatted list or the functional equivalent). This is assuming that the querying mechanism knows that by "British Members of Parliament", we don't mean "people who were born in Britain who are now members of some Parliament" but "people who are members of the British Parliament" and that the British Parliament has two subparts, of which, the sub-members of type "person" are of interest. Given the difficulty of this problem, the interface would likely rely on the human inquisitor to make most of these inferences.
2.) Identify the ethnicity of each British MP. It's rather unwieldy to have a list of all people on the planet who have Jamaican ancestry (or who are left-handed, or who are between 6'4" and 6'4.99" tall). And it's unlikely that there's a table which assigns each MP an ethnicity. Such a feature might be mentioned in an individual biography, in which case all biographies would have to XML-encode or whatever the ethnicity (and lots of other properties) of their subjects in a standardized way. Assuming that ethnicity is important enough to record, which it might not be in all cases. For that matter, all British MPs might not have biographies in the database.
3.) Extract a list of people who voted against joining the EU, or a list of British MPs who voted against joining the EU. (Finding a table of the latter would allow us to skip step 1.) Same problems of making lists and tables machine-readable, and you'd probably have to rely on a human to find this table for you.
4.) Unify the lists we've made so far, hope that the names of all MPs are in the exact-same form in all places they are listed (e.g. Tony Blair, vs. Mr. Anthony Charles Lynton "Tony" Blair) or come up with some clever way to cope with them not being so. Not to mention hoping that we haven't somewhere along the line confused John Smith the former MP with John A. Smith the current MP, no relation.
It would take a lot of work to start making article text machine-readable in any way, and before doing that you'd want to establish various markup schemes.
Making lists machine-readable might be a little easier with a few clever tricks. For example, only looking at article/category links, not the explanatory text that may also be in a list item - that would give you a list of canonicalized nodes (that you could unify with lists of category members) rather than a list of flat strings. You'd still have to rely on a human to say, "Unify list A with Category:B". But that could still be quite powerful.
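A rough sketch of the "links only" trick, assuming list items start with `*` and that the first `[[...]]` link in each item is the entry itself (both are assumptions about list layout, and the function name is made up for the example):

```python
import re

# Matches [[Target]] or [[Target|display text]] and captures the target.
LINK = re.compile(r"\[\[([^\[\]|]+)(?:\|[^\[\]]*)?\]\]")

def link_targets(wikitext):
    """Return the link target of the first link in each bulleted list item."""
    targets = set()
    for line in wikitext.splitlines():
        if line.startswith("*"):
            match = LINK.search(line)
            if match:
                targets.add(match.group(1).strip())
    return targets

def unify_with_category(list_wikitext, category_members):
    """Intersect a list page's entries with a set of category member titles."""
    return link_targets(list_wikitext) & set(category_members)
```

Because the result is a set of article titles rather than free text, it can be intersected directly with category membership; a human still has to say which list goes with which category.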
Regularized lists could answer some of the sample queries mentioned, like "What is the earliest hip hop album with an article?" though a sophisticated SQL-like query would have to be constructed.
I wonder how much brute-force data entry Wikipedians would be willing to do, and whether or not existing open-content databases could be used as content sources. I'm thinking of FreeDB for music, for example. There are some non-open but public sources for things like movies (e.g. the IMDb) and books (e.g. Amazon). But perhaps this is beyond the scope of the Wikipedia and deserves its own project, WikiDB, or something.
-- Beland 04:34, 15 Jun 2004 (UTC)
I must confess my bias throughout the category discussions has always been to think of the categories as a tool for a human user browsing through the category list, as a sort of de facto table of contents. My boosterism of functional categories has been in support of that skimming, browsing user. Reading the above description, although intriguing, I must confess I have never considered the category tree as supporting that sort of precision data-mining search. Wikipedia strikes me as more of an imprecise, people-to-people exercise in information transfer, like any traditional encyclopedia in that sense. Anarchic editing army that we are, I wouldn't think our willy-nilly articles are sufficiently structured to support data mining once you drilled in and located them, anyway. Am I missing something? Is there more to search power than is meeting my eye? -- Gary D 06:53, 15 Jun 2004 (UTC)
Some of the ideas of Beland above I've also been thinking about: see Describing the relations right at the top of this page. Since I wrote that I've done some more investigations on this topic. I think I must have been a librarian in a previous life :-)
There's a lot of effort right now in the W3C to create a semantic web: effectively, to build knowledge into web pages systematically so that computer programs (such as search engines) can make semantic judgements about the content - to distinguish between Queen the musical group and Queen the title, for instance. One outcome of this effort is OWL, Web Ontology Language, an XML application which also uses concepts from RDF.
OWL lets you create an ontology for a domain: usually a hierarchical data structure describing the actual things (people, places, pizzas, camera parts, wines, whatever you want to describe - known as individuals in OWL-speak), the types of things (classes and subclasses, with a strict "is a" relationship from class to superclass), and the properties of those things (for a person, this might include their birth date, gender, place of work, and so on - and crucially, properties can link individuals to other individuals). The properties themselves can also be described with a strict class hierarchy (for wines, "has characteristic" would be the superclass of "has colour" and all those other things wine people talk about).
In OWL, each individual has one or more classes: a pizza might be a member of the "vegetarian pizza" class and the "spicy pizza" class. Part of the power of OWL lies in the ability to describe what a "vegetarian pizza" actually is: it's a pizza with no meat toppings and no fish toppings. You can also make statements about properties such as "this property is transitive" (if A and B are related along a property, and B and C are related along the same property, then with a transitive property you can infer that A and C are also related along the property - for example, "ancestor" is transitive). (This is necessarily a heavily simplified description!)
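The transitive-property inference can be illustrated in plain Python. This is not OWL and uses no OWL library; it is a toy closure over pairs, where the pair (A, B) is assumed to mean "A is an ancestor of B" and the individuals are placeholders.

```python
def transitive_closure(pairs):
    """Return all pairs implied by treating the relation as transitive."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                # If a->b and b->d are known, a->d follows.
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

# Placeholder individuals: A is an ancestor of B, B is an ancestor of C.
ancestor = {("A", "B"), ("B", "C")}
inferred = transitive_closure(ancestor)
```

Here `inferred` gains the pair ("A", "C") even though nobody stated it, which is the kind of judgement an unlabelled category link cannot support.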
Anyone who's interested in building a true ontology (that is, something that a program can make semantic judgements about) for Wikipedia content should look at OWL. (If I get a spare few days I may beef up the Wikipedia article on the subject. If not, then read the W3C documents, though they're not for XML newbies.)
My gut tells me that OWLifying Wikipedia would be technically possible but a huge pain to introduce. Each article is an individual; there could be a "meta" tab alongside "talk" for an article's OWL data or its wikitax equivalent. A wiki category is an OWL class (but a strict "is a" hierarchy would need to be enforced here). Properties would live in a separate namespace equivalent to category/class. A category/class would also have a "meta" tab that describes the class (restricting its members semantically, etc).
Of course, there is the big overriding question: does this solve anyone's problem? Maybe not today, maybe not tomorrow...
-- Avaragado 09:24, 15 Jun 2004 (UTC)
I have no knowledge of how Robots work (or are "commissioned") on Wikipedia, but it would be useful to have one to convert list pages, such as List of ornithologists, to categories, and insert the relevant category link in each of the pages listed. Andy Mabbett 11:47, 14 Jun 2004 (UTC)
Who decides how to name a category? Is it just the first person to come along? Why is one category [[Category:Israeli people]], but another [[Category:People from Luxembourg]]? Why not [[Category:People from Israel]] or [[Category:Luxembourgeois people]]? Or just plain [[Category:Israelis]] or [[Category:Luxembourgeois]]? Why is it [[Category:Israeli actors and actresses]] but, [[Category:Cinema actors]]? What about [[Category:Cinema actresses]]? Or for that matter, who's to stop there being a [[Category:Filmstars]]? Who decided it's [[Category:Children's writers]] as opposed to [[Category:Children's literature writers]] or [[Category:Children's authors]] or [[Category:Young Adult writers]] or any other of a zillion variations?
Are there any stated conventions anywhere? And, how in the name of Dewey does one navigate the special page of Categories, to see which Categories are already in use? -- Woggly 09:39, 15 Jun 2004 (UTC)
I've looked for stated conventions but haven't been able to find anything that particularly addresses these points. Navigating the special pages of Categories is a nightmare. I wanted to see how something was categorised for Scotland, and by the time I'd slowly worked my way through the alphabetic list to "S", I'd pretty much lost interest. It was then I decided I didn't care how Zimbabwe had been categorised.
As for who decides how to name a category - it does seem to be that the first into the fray makes their own decision which isn't the most practical way of doing it. I started some of the categories you mentioned - my methodology (right or wrong) was that I read through the existing list of categories and followed the format that was already being used. Of course if the very first person to name a category made a poor choice, I've perpetuated it and I'm not exactly happy with the result. For example the actors by nationality - those few that existed were "such and such actors and actresses" so I've continued with that. The generic terms for actors, seemed to group actors and actresses as "actors" (ie Television actors) so with the ones I created (ie Cinema actors) I followed that pattern. Not a good system. I'm also seeing duplicated categories - American actors and U.S. actors and actresses, for example, which no matter how I look at it, is a single category. US actors and actresses had more names in it than American actors, so I renamed everyone in American actors just to get them into the same category. Not a scientific way of doing it. I noticed a week ago there were 2500 categories. Two days ago there were 3500 categories. I can't imagine too many people wading through that mega-list to make sure they are not duplicating categories by slightly rewording the title, and we're going to have (and already have) a whole bunch of categories that should be deleted because they are better categorised/defined elsewhere. Rossrs 10:31, 16 Jun 2004 (UTC)
Good objections. Let me add Category:Science fiction authors vs. Category:Fantasy writers. Poorly picked, too late to fix 'em all without a robot. *sigh* (see above on robots). It would be nice if there was a good category search system. I have an automated tool that collects categories, and I can grep the list I've collected so far. More helpful, though, is to use the category system itself, browsing similar categories for a common super-category that might contain the category you are looking for... This would be even more useful if there were not so many orphaned categories. -- ssd 12:31, 15 Jun 2004 (UTC)
I want to create a very broad category, which would contain all articles related to Cyrillic alphabet, for example Cyrillic alphabet, Saint Cyril, Russian language, etc. Should I name it simply "Category:Cyrillic" or should I go for "Category:Cyrillic topics"? Are there similar categories and how are they named? Nikola 05:21, 16 Jun 2004 (UTC)
The articles in Category:Harry Potter movies have recently been adjusted so that their sort keys (rather than being Movie 1, Movie 2, etc) are now merely 1, 2, etc. The user who did it didn't like the way all the movies were sorted under M. Whilst this might look reasonable given the current system for rendering Category articles, I am worried that this might set a bad precedent. Category:Wheel of Time books has a similar set of articles, but there are 10 of them (soon to be 11 when the next book is published); they are sorted as Book 01, Book 02, etc: all therefore appear under B. I know I prefer this system, and not just because I did it (and the original Category:Harry Potter movies sorting also). I am not certain what needs to be done, but I am certain that there needs to be some discussion about it. Possibly one suggestion, following on from earlier discussions, is to suppress the large letters on a Category page if there will be only a single letter; in other words, since all the articles in Category:Wheel of Time books sort under B, don't bother showing the B. -- Phil | Talk 11:48, Jun 15, 2004 (UTC)
Not precisely.
-- Phil | Talk 13:58, Jun 15, 2004 (UTC)
Sorry, but to me it would appear obvious that the very point of including an optional Sort Key in the Category system is to allow articles to be sorted by something other than just their title. It would also appear reasonable to sort articles about a connected series of Books/Movies in the natural order in which they are supposed to be read/viewed. What would be the point of sorting them alphabetically? This would introduce no new helpful information to the reader. Which IMHO is what Wikipedia is all about. -- Phil | Talk 16:18, Jun 15, 2004 (UTC)
I was the one who changed the category sorting from "Book 1" to "1". Perhaps the best solution would be to sort under "#01", "#02", etc.
This avoids having the clutter of having a separate category heading "1", "2", "3", etc. for every single book, and it also avoids the disconcerting listing of items under "B" when there is no "B" in their name (or "M" for the movies). The items would then appear under a neutral "#" category, at the head of the category list.
-- Curps 18:40, 15 Jun 2004 (UTC)
An even better solution: sort them under " 01", " 02" (with leading blank). This causes them to appear under the heading " " (space), which means it appears to be under no heading at all. I've gone ahead and done this so that people can evaluate the effect and see if it's acceptable:
Category:Harry Potter books
Category:Harry Potter movies
-- Curps 18:57, 15 Jun 2004 (UTC)
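The effect of the different sort keys can be sketched as follows, assuming (as the discussion above suggests, though it is not confirmed here) that the category page sorts keys as plain strings and prints a heading for each distinct first character. The function name is made up for the example.

```python
from itertools import groupby

def headings(sort_keys):
    """Group sort keys under their first character, in plain string order."""
    ordered = sorted(sort_keys)
    return {head: list(group) for head, group in groupby(ordered, key=lambda k: k[0])}

# Eleven items, keyed three different ways.
bare = [str(n) for n in range(1, 12)]            # "1", "2", ..., "11"
book = [f"Book {n:02d}" for n in range(1, 12)]   # "Book 01", ...
spaced = [f" {n:02d}" for n in range(1, 12)]     # " 01", ... (leading blank)
```

With bare numbers, "10" and "11" sort straight after "1" and nearly every item gets its own heading; the "Book 01" keys all fall under "B"; the leading-blank keys share a single blank heading and stay in reading order.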
Maybe a modifier to the categories themselves would allow for each specific article/category to state, in its [[Category:]] tags, whether or not it's an example/subset of that category or merely related to it. For example, Category:Egyptian cities would say [[CategoryIs:Egypt]] because Egyptian cities are a part of the nation and geography of Egypt, whereas Egypt-related topics like Category:Egyptian mythology or Category:Egyptian people would say [[CategoryLike:Egypt]]. Both would display as regular categories, but a user could opt to only display articles in a category that are (or are not) examples of the category. Non-specified categories would have to display on searches for both "is" and "like". - Sean Curtin 22:10, 15 Jun 2004 (UTC)
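A small sketch of how such a filter might behave, assuming each tag is stored with a kind of "is", "like", or nothing at all. The `Tag` record and `members` function are invented for the illustration; only the CategoryIs/CategoryLike idea comes from the proposal above.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Tag:
    member: str           # the article or subcategory carrying the tag
    category: str         # the category it points at
    kind: Optional[str]   # "is", "like", or None for a plain [[Category:]] tag

TAGS = [
    Tag("Category:Egyptian cities", "Egypt", "is"),
    Tag("Category:Egyptian mythology", "Egypt", "like"),
    Tag("Category:Egyptian people", "Egypt", "like"),
    Tag("Untagged example", "Egypt", None),  # hypothetical unspecified entry
]

def members(tags, category, kind):
    """Members of a category for an "is" or "like" view; unspecified tags show in both."""
    return sorted(t.member for t in tags if t.category == category and t.kind in (kind, None))
```

Asking for the "is" view of Egypt returns the cities plus the untagged entry, while the "like" view returns mythology, people, and the same untagged entry.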
I am trying to work out whether there is a convention/standard, and if not, work one out for these cases, because there need to be clear guidelines.
For example, consider the article cell biology and the Category:Cell biology. Should cell biology be a member of the category? I have seen several solutions:
Related to this is the question of whether the article Foo, which corresponds to the subcategory Category:Foo, should also be included in the (say) parent category Category:Bar. If it is included and convention #1 (above) is also followed, it means that Foo appears in two places in the hierarchy (e.g. cell biology appearing in both Category:Biology and Category:Cell biology, which seems unsatisfactory, and clashes with the guideline about filing in the most specific category).
I'm conflicted about the best way to proceed here, but I think clear guidelines would help everybody. Any suggestions as to a convention, and rationales? -- Lexor| Talk 12:46, 19 Jun 2004 (UTC)
How about these for guidelines...adjust as you see fit. The final version of these should eventually be put in the policy section of this page. -- ssd 16:08, 19 Jun 2004 (UTC)
I would also throw in the following, but it doesn't directly speak to the question of categories with the same name of articles, but just generally to the question of relevant categorization:
-- TreyHarris 18:46, 20 Jun 2004 (UTC)
I am being told I am unable to move Category:Australian MHRs to the correct form, Category: Australian federal MPs. Is there a rule against moving Category pages? If so, what is one supposed to do with a wrongly-titled page? If not, what is the problem? Adam 13:45, 4 Jun 2004 (UTC)
If you look at the history for Wikipedia:Categories for deletion, you will see that the link for Category:Jewish mythology appears red and links to the "edit" page, as if it didn't exist. However, even when you click on that link, there is data there. Is this a mediawiki bug? - DropDeadGorgias (talk) 20:05, Jun 9, 2004 (UTC)
I can't work out why some categories don't appear to be displaying properly. Take a look at the foot of Avignon and the category Category:Cities, towns and villages of France. Even though it's a populated category, it's displaying as if it was an empty article. Can anyone explain what's going on here? -- ChrisO 15:38, 10 Jun 2004 (UTC)
It's important to consider that there are two different audiences: readers and editors. Editors can use Category:Orphaned_categories to find categories that need parenting. And that category can be populated in a semi-automated fashion (or fully automated, if someone implements that). There's no need to pollute the readers' experience with either red links or a "this category doesn't exist except that it clearly does" moment. So I think both of these phenomena should be eliminated. -- Beland 23:21, 23 Jun 2004 (UTC)
OK, I have boldly created Category:Lists of fictional animals and redirected Lists of fictional animals there. To do this, I moved some helpful text (See Alsos and External Links) from the original Lists... page to the Category:Lists... page. With this specific page, I see the need for 3 new category features, two of which people have already stated:
Elf | Talk 16:13, 21 Jun 2004 (UTC)
I hit much the same problem as Morwen a week or so ago, and backed off from categorization while waiting for the dust to settle. I saw Category:Football (soccer), and Category:Sportspeople, and thought, well, why not Category:Footballers (soccer) with both as supercategories? I decided against it, because a footballer is a sportsperson, but is not a sport. I've come to the conclusion that Categories are currently broken in one serious (but fixable) way: we've got a fantastic set of directed graphs, but none of the arrows is labelled. In other words, we're saying X is related to Y, but if you ask how, I'm going to have to kill you. This makes any kind of semantic inference across those relations impossible (apologies if I'm abusing terminology here but you get my point): we can't say Anyone in a subcategory of People (recursively) is also implicitly a member of People because that's making a huge semantic assumption about the relations, and one that just isn't true in the current state of the wiki.
The fix is to label the arrows: describe the relations. This is, in my limited understanding, what RDF does. That uses the terms subject, predicate, and object. The subject is the thing you're categorizing. The object is the category you're adding it to. And the predicate describes the relation. Predicates allow you to make semantic inferences programmatically. So far in the wiki I've seen two predicates, which I would summarise as Is an example of (John Lennon is an example of a vocalist) and Is, er, related in some way to (Musical groups are, er, related in some way to Music; 251 Menlove Avenue is, er, related in some way to John Lennon). Unless we encode this distinction in the categorization system, we can't make any inferences over the relations. And once we start encoding this distinction a whole new world of possibilities (and problems!) springs up: arbitrary relations.
I want to relate footballers to football. I should be able to use a more specific relation than "is, er, related in some way to", something like "is a participant in". It's the same relation you'd use between basketball players and basketball, but not the same as you'd use between Musical groups and Music, and especially not between 251 Menlove Avenue and John Lennon. In Category:Footballers (soccer) I want to be able to put something like:
[[Relation:Participates (sport)|Category:Football (soccer)]] [[Relation:Example of|Category:Sportspeople]]
You can see the idea. Wikipedia then encodes much more information programmatically. Incidentally, this could also be used with the tricky problem of who belongs in Category:Terrorists - you could use different relations, such as "Alleged to be" (for example!). It also, of course, means that we have a whole new "Relation" namespace to deal with - with RfD, etc, etc.
IMHO without an idea like this we can't make any meaningful semantic inferences whatsoever within the category hierarchies, and categories overall become much less useful.
(Oh, in case someone asks: yes, relations should also be related to other relations :-)
-- Avaragado 10:05, 9 Jun 2004 (UTC)
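The labelled-arrow idea can be sketched with subject/predicate/object triples. The two category relations are taken from the example above; the article "Example footballer" and the function name are hypothetical, and this is a toy graph walk rather than RDF.

```python
# (subject, predicate, object) triples.
EDGES = [
    ("Example footballer", "Example of", "Category:Footballers (soccer)"),  # hypothetical article
    ("Category:Footballers (soccer)", "Example of", "Category:Sportspeople"),
    ("Category:Footballers (soccer)", "Participates (sport)", "Category:Football (soccer)"),
]

def implied_categories(subject, edges, predicate="Example of"):
    """Follow only edges with the given predicate, recursively, from the subject."""
    found = set()
    stack = [subject]
    while stack:
        node = stack.pop()
        for s, p, o in edges:
            if s == node and p == predicate and o not in found:
                found.add(o)
                stack.append(o)
    return found
```

Walking only "Example of" edges makes the footballer a sportsperson but never a member of Category:Football (soccer), because that arrow carries a different label.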
I am really confused by Category:EU countries. I notice that it contains various countries both as articles and as subcategories, i.e. Denmark and Category:Denmark. However, Category:Denmark, which is a subcat of EU countries, contains Danish culture. Isn't this improper inheritance? Wouldn't this imply that Danish culture is an EU country, or do I misunderstand the point of categorization? - DropDeadGorgias (talk) 20:41, Jun 4, 2004 (UTC)
(moved from main page)
I'm not sure that I can diagram this, so I'm not going to try. I'm thinking about some of the dog topics. For example, dog is a member of pets; dog is also a member of mammals; both mammals and pets are members of animals but neither is a subcategory of the other. Now, how about dog agility? It needs to go under the dog sports category, which needs to be under the dog category, because it's related to dogs. It also needs to go under the sports category, because it's a sport. It probably also needs to go under the hobby category. But dog and sports do not at any higher point in the hierarchy have a common parent; possibly hobby and sports might fall together again under leisure activities (?), but not all sports are hobbies and not all hobbies are sports. Just wanted to give another example of why something might need to be in multiple categories. Thoughts? Elf | Talk 02:51, 4 Jun 2004 (UTC)
At the beginning or at the end of articles (with interlanguage links) ?
It is apparently impossible to rename a Category article once it has been created. I am for example unable to move Category:Australian MHRs (which is an incorrect form) to the correct Category:Australian federal MPs, so now I am going to have to delete them all. (This is User:Chuq's fault - I told him MHR was incorrect but he/she went ahead anyway.) Adam
This problem must be urgently fixed or it will cause endless fights. Adam 15:43, 5 Jun 2004 (UTC)
A most unsatisfactory state of affairs. This whole scheme will cause as many problems as it solves, maybe more. Adam 17:52, 5 Jun 2004 (UTC)
Might it be possible to have a "minor category" (or "organizational category" or "invisible category" or whatever you want to call it)? Perhaps put a small "[see more categories]" link in the current Category box, with the link either revealing a hidden box below the current one, or taking you to separate subpage of the article with the full category list.... This way categories that are important to the reader would be visible in the current box, while those that are useful but redundant (or more useful to the editors/organizers than the readers) could be hidden from casual view, but easily reachable by those who are interested.
Even if the minor category idea is unworkable, I suspect a [see more] functionality will be needed eventually, if certain articles end up placed in more than a small handful of categories.
What do you think? -- Catherine | talk 22:21, 5 Jun 2004 (UTC)
Ah, I understand. I agree - this would be really nice if we could do that. Any developers about? john k 04:06, 10 Jun 2004 (UTC)
I've seen several complaints about the free-form structure of the list of articles belonging to a category. Take a look at Category:Lists_of_battles, for instance. The top half is the neat tree found in the old-style list. Below in the box is an automatically generated unsorted jumble of articles that belong in the category. These are the solutions I see:
1.) Merge lists of articles into category pages, but manually maintain structured copies in the editable sections. This requires more work, but at least there's only one page (to help maintain consistency between the manual and automatic lists).
2.) Put some tags in the editable section of the category page that tell the system how to create a properly sorted nested list. This is possibly easier for editors, since all the changes they need to make are on a single page, but when new articles are added to the category, they will also need to be put into the sort order by futzing with the category page, or else end up in an unsorted section. This option also makes it really easy to re-factor.
3.) Put tags in the article themselves that allow the system to sort them in a nice way. A bit of syntax that might help might be to stick sort strings after the category name with a / as a separator. So for example:
The sort order would have to be defined somehow - it could be automatic, or specified in the category page. But it would be especially annoying to distribute the sort order among the article pages - that would make renumbering really annoying.
(Note from later on - / would be a bad choice of character, since this conflicts with the literal / in URLs.)
4.) A hybrid method, in which editors have a list-making interface similar to what is envisioned in option 2, but when the information is stored, the system goes back and writes tags as in option 3. Easily editable, easily renumbered, but all the category information is kept in one place. Harder to implement in code, though, and more resource-intensive.
-- Beland 10:12, 5 Jun 2004 (UTC)
5) For really long lists of stuff, it would be nice to simply break it up into alphabetized sections with letter headers. The software could be told to do this with a TOC tag of some sort, or it could do this automatically whenever a certain size threshold is reached. Another simple formatting option could include telling the software to put the links in a bulleted list instead of a comma-delimited string. Bryan 02:04, 6 Jun 2004 (UTC)
The rendering on category pages has changed to make subcategory and article listings multi-column. That's nice. But it has also added big, bold letters at the start of each letter in the alphabetical listings. This is a big waste of space and looks quite ugly when there is a small number of items. See for instance: Category:Main_page or [[:Category:Wikipedia]]. I think it should be obvious when a list is alphabetical, and navigation in an alphabetical list is fairly trivial - you just go up or down as appropriate (even if it's wrapped into multiple columns).
I would urge disabling the letters, at least for lists with a small number of items (perhaps less than 30 or so), if not all auto-generated lists.
I like the idea of generic metadata tags. Embedding a character into Category tags is a big kludge (and / is a bad choice of special character, too), and the capability would be very useful in many other ways, as described. It also makes creating multiple structured lists inside categories a lot easier.
Given metadata in biography articles that specified birth year, name, and subfield, you could automatically create three different lists (perhaps in different subcategories) that were sorted in three different ways.
I wonder if embedded XML would be a good choice for this. There are WWW-wide XML standards being developed for just this sort of situation, and perhaps it would be good to interoperate with them. If that's too complicated, we could do something like:
[[Metadata:Person.birth_year=1906]]
...and make a list of valid key names and what they should be used for.
We would also need metatags for "Show an auto-generated list of articles/categories with the following matches in their metadata and the following sort order".-- Beland 02:14, 13 Jun 2004 (UTC)
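A sketch of how such tags might be read and used to build a sorted list, assuming the `[[Metadata:key=value]]` syntax proposed above. The function names and the placeholder article titles are invented for the example; only the `Person.birth_year=1906` tag comes from the proposal.

```python
import re

# Captures the key and value from [[Metadata:key=value]].
META = re.compile(r"\[\[Metadata:([\w.]+)=([^\]]*)\]\]")

def parse_metadata(wikitext):
    """Return a dict of metadata key/value pairs found in an article's text."""
    return {key: value for key, value in META.findall(wikitext)}

def titles_sorted_by(articles, key):
    """Titles of articles that carry the key, ordered by its value (compared as strings)."""
    rows = []
    for title, text in articles.items():
        meta = parse_metadata(text)
        if key in meta:
            rows.append((meta[key], title))
    return [title for _, title in sorted(rows)]

# Placeholder articles.
ARTICLES = {
    "Person A": "Some text. [[Metadata:Person.birth_year=1906]]",
    "Person B": "Other text. [[Metadata:Person.birth_year=1890]]",
    "Person C": "No metadata here.",
}
```

Values are compared as strings here, which happens to work for four-digit years; a real scheme would need the list of valid key names to say which values are numbers or dates.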
The rendering should say "1 article/subcategory in this category" not articles/subcategories.-- Beland 04:59, 13 Jun 2004 (UTC)
[[Type:Music album]] [[Title:The White album]] [[Artist:The Beatles]] [[Released:1968]]
What kind of things should be in Category:John Lennon? Right now it contains the actual individual John Lennon, several of his solo albums, and the bands of which he has been a member. Also the Category:Beatles article contains both this as a sub-category and John Lennon is a member of both the John Lennon and Beatles category. Basically, it's a total mess. Do you think that there is content that should exist in this category, or should it be removed? - DropDeadGorgias (talk) 13:32, Jun 8, 2004 (UTC)
Category:The Beatles members? Do we really need the article? That's just ungainly. john k 04:38, 9 Jun 2004 (UTC)
Actually, I wasn't referring to the article/category Category:The Beatles members, but to the grammatical article "The". Why not Category:Beatles members? At any rate, as I've repeatedly mentioned before, since all four of the Beatles produced music on their own as not part of the Beatles, they should all be in Category:British musicians separately anyway. john k 08:00, 9 Jun 2004 (UTC)
I had an idea and attempted to configure the category:1975 albums tag at Another Green World so that it would display on the category page as Brian Eno's Another Green World. The text of the link on the category page has not changed (still says Another Green World) but it is now alphabetized under "Brian" instead of "Another". Is this something I did wrong, a bug or a feature? I thought you could use piped links on the article page to change the link text on the category page... Tuf-Kat 20:16, Jun 8, 2004 (UTC)
My main question is about changing a category. What I've noticed is that if you edit an article and rename the category ie if the category was incorrect or too general, it will create a link in the new category page, but the link remains also in the previous category page, even though that link does not show on the subject article page. For example Jack Nicholson was originally categorised as Category:Actors. I'd read on the categorisation talk page that Paul McCartney should be British musician, but not musician, because British musician would be a subcategory of musician, so I applied the same logic and changed Jack to Category:U.S. actors and actresses where he now appears. But in the Category:Actors page he still appears even though there should be nothing to link him there, and in the Jack Nicholson article page, the only category now visible is Category:U.S. actors and actresses. Does anyone know why that would be? Am I doing something wrong or is there a problem with the database or what?
thanks for the info, and I will wait and see. Although some of the ones that I changed were about 4 days ago and the change still doesn't show. Not a problem though. Yes I agree there's a lot needing to be recategorised, and a lot that haven't been categorised at all. I'm sure it will all happen soon. Rossrs 13:58, 9 Jun 2004 (UTC)
Also another question which is less important but I'm trying to get my head around categories and subcategories. So .. Category:Vocalists and Category:Pop singers. To me, all pop singers are by definition vocalists, so along that line of thinking every person categorised as a pop singer should also be categorised as a vocalist. But is that the intention? Should vocalist just be for a band's vocalist? ie Robert Plant vocalist, but not pop singer. Along the same line I would categorise Belinda Carlisle vocalist (Go Gos) and pop singers, (solo). Britney Spears pop singer, but not vocalist? Would be interested to hear how anyone would interpret this. Thanks
That's exactly the point I think - categorising is sure to create examples that are POV. As for Britney, I'd call her a vocalist only insofar as the sounds she makes, do seem to be coming from her mouth, which is not to say that I don't like her or think she's without value, just without talent. Will be looking forward to reading your Category:Record company whores when you have it up and running. I've actually been considering one of my own, which came about when I started wondering where to place Paris Hilton. So stay tuned for Category:People:slutty bimbos :-D Rossrs 13:58, 9 Jun 2004 (UTC)
And now I've just discovered that the category pages can't be linked from here. Which is why I've italicised them instead. As if I wasn't confused enough! :-) Rossrs 10:20, 9 Jun 2004 (UTC)
Would Hilton be Category:Socialites? Category:Heiresses? john k 23:50, 9 Jun 2004 (UTC)
There's been a lot of discussion about what it means for an article to be in a category or in the subcategory of a category, such as whether articles in subcategories "inherit" the parent categories as well. I've been running into this problem myself as I try to figure out what should go where. Perhaps to try bringing a little order to the chaos we could work out some sort of standardized way of indicating that a category's children should or should not inherit it? A set of templates, perhaps, giving guidelines to editors about how the category is to be treated that could be inserted on category pages. Bryan 03:14, 10 Jun 2004 (UTC)
Here's the text that's currently in them:
Subcategories inherit: The subcategories of this category contain articles which are also valid members of this category but which have been divided up into more specific groupings.
Subcategories don't inherit: This category's subcategories are related to this topic, but the articles they contain are not necessarily valid members of this category directly.
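How the two templates would change what counts as a member can be sketched like this, assuming each category records an "inherit" flag. The data layout and function name are made up; the examples reuse EU countries/Denmark and Musicians/British musicians from the discussions on this page.

```python
CATEGORIES = {
    "Musicians": {"inherit": True, "articles": set(), "subcats": ["British musicians"]},
    "British musicians": {"inherit": True, "articles": {"Paul McCartney"}, "subcats": []},
    "EU countries": {"inherit": False, "articles": {"Denmark"}, "subcats": ["Denmark"]},
    "Denmark": {"inherit": True, "articles": {"Danish culture"}, "subcats": []},
}

def valid_members(name, categories, seen=None):
    """Articles that are valid members of a category, honouring its inherit flag."""
    seen = set() if seen is None else seen
    if name in seen:  # guard against category loops
        return set()
    seen.add(name)
    cat = categories[name]
    found = set(cat["articles"])
    if cat["inherit"]:
        # Only descend when the category says its subcategories inherit.
        for sub in cat["subcats"]:
            found |= valid_members(sub, categories, seen)
    return found
```

Under these flags Paul McCartney counts as a member of Musicians, while Danish culture does not count as an EU country.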
My understanding is that the purpose of the category system is to allow automatic cross-checking of lists. Wouldn't the best way to do this be to make each category give one fact? Category:American musicians would be automatically generated by cross-checking Category:Musicians and Category:American people. Tuf-Kat 05:14, Jun 10, 2004 (UTC)
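The cross-checking idea above amounts to a set intersection. A minimal sketch, assuming each single-fact category is available as a plain set of article titles (the sample memberships and the `intersect` helper are illustrative assumptions, not an existing feature):

```python
# Hypothetical single-fact categories, each a set of article titles.
musicians = {"Louis Armstrong", "John Lennon", "Robert Plant"}
american_people = {"Louis Armstrong", "Paris Hilton"}


def intersect(*categories):
    """Return the sorted titles that appear in every given category."""
    sets = [set(c) for c in categories]
    return sorted(set.intersection(*sets)) if sets else []


# "American musicians" is derived rather than maintained by hand.
american_musicians = intersect(musicians, american_people)
```

The derived list only stays correct if every article carries all of its single-fact categories, which is the cost of this approach.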
So, having discovered, much to my dismay, that piped links can not be used to change an article's display on the category page (merely its alphabetic classification for ordering purposes), I decided to do so to organize Category:Albums by artist so that Category:A Tribe Called Quest albums would be located after Category:Toto albums and before Category:Triumph albums. Unfortunately, piped links apparently do not actually cause the computer to treat the link as though it were the piped text (in this case "Tribe Called Quest albums") but actually organize it under a separate letter. This leads to the unique circumstance of there being two separate sections for the letter "T", one with all the bands whose category page begins with that letter, and one for those bands whose category page is piped so that it begins with that letter. Is this also a feature I was unaware of, or is it a bug? Tuf-Kat 22:35, Jun 11, 2004 (UTC)
This talk page's project page reads more like a talk page! There appears to be no simple user guide, answering the question "How do I start (or suggest) a category?". Andy Mabbett 22:41, 11 Jun 2004 (UTC)
I think that trying to use the category functionality for precise classification, searches like "Poets AND German" and whatnot is futile. It was not designed for that, nor is it suitable for it. It didn't work with the old subpage system and it won't work with something as simple as "ParentCategory=X". To do that in a useful way, we would at least need name-value pairs, if not something even more sophisticated. I don't think that our hardware and software will be up to that any time soon.
OTOH, categories are perfectly useful for bottom-up construction of TOCs: just add [[category:European Countries]], and Luxembourg appears in the list of European countries. Except, the way we do it now, it doesn't, because we put it into "EU Countries", which is a subcategory of "European Countries". So, if I want to find something, I more or less have to know where it is.
The way it really should work, is that an article should be a member of all categories where we want it to show up in the list. So, London should be a member of "Cities of England" and "Cities of the UK" and "Cities of Europe" and "Cities of the World", while Leibnitz should be just a member of "Cities of Austria". Accordingly, Johann Wolfgang Goethe should be a member of "German Poets", "European Poets", "Poets" and "German Scientists".
This leads to every article being put into several categories: the more important the article, the more categories it will belong to.
If and when a full classification and search system is implemented, more categories per article will provide more data to be pumped. Zocky 13:46, 12 Jun 2004 (UTC)
What I would like to see is something automatic, where if you go to the page on Category A, you see a list of all articles, not only in A, but in all its subcategories, subcategories of subcategories, etc. Is anyone working on this? -- BRG 15:47, Jun 18, 2004 (UTC)
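A listing like the one asked for above could be produced by walking the subcategory graph. This is only a sketch under assumed data structures: the `SUBCATS` and `ARTICLES` dictionaries and the `list_recursive` function are hypothetical, and the sample entries are for illustration.

```python
from collections import deque

# Hypothetical sample data: category -> subcategories / direct articles.
SUBCATS = {"European Countries": ["EU Countries"], "EU Countries": []}
ARTICLES = {
    "European Countries": ["Switzerland"],
    "EU Countries": ["Luxembourg"],
}


def list_recursive(root):
    """Breadth-first walk from `root`.

    Returns (article, category it is filed in) pairs for the root and
    every category reachable below it.
    """
    seen = {root}           # categories already queued, so loops terminate
    queue = deque([root])
    found = []
    while queue:
        cat = queue.popleft()
        for article in ARTICLES.get(cat, []):
            found.append((article, cat))
        for sub in SUBCATS.get(cat, []):
            if sub not in seen:
                seen.add(sub)
                queue.append(sub)
    return found
```

Here Luxembourg shows up under "European Countries" even though it is filed only in "EU Countries". Whether that is the right result depends on every link in the chain meaning "is a subset of", which the current categories do not guarantee.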
Moved this here from the main page (but left a summary behind). -- Beland 09:40, 13 Jun 2004 (UTC)
I recently built "Category:Commercial item transport and distribution" (CITD), which culls articles related to all aspects of this particular field. Another user was concerned that it might be too scattershot and non-specific, and proposed breaking it up. I think, however, that this new category points up an emerging difference in how categories may be used. Although I didn’t explicitly set out to do so, I created in CITD an example of what could be called a "functional" category, as opposed to many of the examples used on this page, which set forth a fairly straightforward taxonomic approach to pulling articles together.
In categorizing these categories as taxonomies, I mean that it takes very little external context to pull together the articles in the category, just common, low-context knowledge like the alphabet and geography. You could probably send in a bot to check keywords and pull together categories like "Companies that begin with the letter H" or "Companies in Germany."
By contrast, the CITD category itself contains substantial pieces of contextual information within it, namely, that there is such a thing as the commercial transportation and distribution industry and which particular things pertain to it. You wouldn’t pull together disparate articles such as on the companies FedEx and Hapag-Lloyd, the items pier and containerlift, and the concepts materiel and logistics, unless you already knew that they all happen to be pertinent to this particular industry. In other words, the very category itself informs the user somewhat. It would be much harder to use a bot to build a category like this, or at least its search algorithm would have to be more complex, containing a dose of contextual knowledge about what it was looking for.
A functional category like CITD is well-suited to the user who is drilling down from the main page in generalized exploration. It answers the challenge of supplying information to the user who may not even know what they are looking for. By contrast, a user more knowledgeable on a topic, say, ornithology, might be more likely to go straight to a taxonomic category like “All bird names that begin with the letter G."
I see a great use for functional categories for big human nexus events, like "Category:World War II." That category (though big enough that it's through subcategories) can eclectically collect everything from the brownshirts and the Holocaust to the Norden bombsight to Glenn Miller and Rosie the Riveter.
There is a limit to everything, of course, so I can see a functional category could be taken too far, or even used as a form of disguised original research: If you hold the hypothesis that parrots are controlling the minds of chiropractors, you might build a category that includes everything about parrots, mind control, and chiropractors. That would be an abuse of the encyclopedic genre. By contrast, I have been defending CITD on the ground that it is indeed a real and natural grouping. While it is not as clear-cut as "Companies that begin with the letter H", it is sufficiently cohesive, unitary, and "real in the world" that grouping its elements together is not an abuse of the encyclopedic genre. I’m not defending CITD here, and whether CITD itself happens to be a good functional category isn’t really the point here, but rather that functional categories do have a place beside taxonomic categories in Wikipedia.
If there were only one categorization possible for each article and thus we were working on the One Table of Contents, the dispute between functional and taxonomic categories might be more pitched, but fortunately we are not so constrained. The two types of categories may exist side by side, and users will benefit from both; users who are more focused on specific information retrieval might find Hapag-Lloyd under "Companies that begin with the letter H" or "German companies," while those who are just delving into the notion of commercial transportation might come across it while exploring the CITD category. Fortunately, then, it isn’t really a matter of one type of category "versus" the other, because there is room and use for both of them. -- Gary D 23:05, 9 Jun 2004 (UTC)
Hmm...I think there's something to be said for the idea that categories are not the most logical way to deal with what you call "functional categories." List pages, the actual article on commercial item transport and distribution, and so forth, seem to me to be a better way of dealing with these kind of things. Otherwise, categories will quickly become completely out of control. I support restricting categories to what you call "taxonomic." john k 16:06, 13 Jun 2004 (UTC)
I'm not sure I have an opinion on the issue directly at hand, but I have a suggestion.
List pages of various kinds and the growing number of categories all attempt to organize information in an easy-to-use-and-edit fashion that provides relevant links with appropriate context to articles. I think there's pretty widespread agreement that whether categories or lists, or both, are used, the goal should be as above. Both list pages and category pages could be better at achieving this goal. Let's brainstorm what we need and want in such a system, then pester the developers until it happens.
What kind of information should be categorizable? The fact that a person is a musician, a Canadian, a trumpeter and a jazz musician are clearly relevant. Should our system be able to accommodate less relevant tidbits (that he's left-handed, male, more than 6.5 feet tall and a 1986 Grammy Award winner for Best Jazz Album)? Should it be possible to combine categories using the MediaWiki software in a way that could create a list of Jamaican-British MPs? What about Jamaican-British MPs who voted against joining the EU? If albums are classified by both year and genre, could we automatically see what the earliest hip hop album with an article is? Could we tag more information, and see what the earliest hip hop album by a white man to go gold in Australia is? What about list-ordering? Do we have to classify something like Popes in either alphabetical or chronological order, or could we tag info to generate a list in either way, or by nationality or some other criterion? Could we take tagging a bit farther and automatically generate something like Timeline of trends in music (1970-1979) listing only events relevant to French music, or psychedelic rock? Should it be possible to contain multiple methods of categorizing things? (i.e. a purple background when using the Fladdershnit Method of classifying amphibians, and a yellow when using the Yamm Method?) Do we want to be able to include captions or other explanatory text on category pages? Could we include a caption on the article page and place it on some or all category pages (i.e. at John Lennon, format category link thusly: [[Category:British musicians|Lennon, John: (1940-1980) British lead singer and frontman for popular rock band The Beatles]])?
Anyway, these are just some questions to get started... Seems to me that a very simple system was put into place, and nobody's really satisfied with it, but since the software is constantly evolving, we have the ability to make the category system even more useful than anybody reading this discussion now is likely imagining. Tuf-Kat 19:32, Jun 14, 2004 (UTC)
Interesting questions.
So it appears that categories are currently being used for both semantic/taxonomic classification (A is a type of B) and navigational/functional linkages (article C is related to topic D; subcategory E is a subset of topic D, etc.).
This means that direct and indirect membership may not carry a clean semantic meaning. For instance, the articles that are members of Category:Units_of_measure or its subcategories are mostly actual units of measure (foot, volt, kilogram, etc.). But other articles include a list (Scientific units named after people), general articles (Historical weights and measures), and related articles (Dimensional analysis). This messes up queries like "show me all units of measure that begin with the letter H". It may also cause the following query to include unwanted results: "show me all article-descendants of Category:Units_of_measure whose titles share at least one word with the title of an article-descendant of Category:Scientists".
Even worse, navigational linkages make the recursive lists of subcategories potentially uselessly large. For example, consider the following chains:
Systems of Government -> Monarchy -> Royalty -> Royals
People -> Royals -> Royalty of England -> Queen Elizabeth II
This unfortunately may lead an automated search to conclude that (among other things) the concept "Royals" and the person "Queen Elizabeth II" are examples of governmental systems.
To solve this problem, linkages could be assigned types. For example (using an XML syntax):
<link type="is-an-instance-of" source="Queen Elizabeth II" target="Category:Royals of England" />
<link type="is-a-subset-of" source="Category:Royals of England" target="Category:Royals" />
<link type="is-a-type-of" source="Monarchy" target="Systems of government" />
<link type="is-on-topic" source="Category:Royalty" target="Category:Monarchy" />
That would seem to solve all the problems so far, at the cost of making the category system more complicated.
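To make the benefit concrete, here is a small sketch of how a program might use typed links. It encodes the four example links as tuples and follows only the link types that carry a membership or subset meaning. The `SEMANTIC` set and the `semantic_ancestors` function are assumptions made for this example, and treating all three "is-a" types as safe to chain is itself a simplification.

```python
# The four example links, as (type, source, target) tuples.
LINKS = [
    ("is-an-instance-of", "Queen Elizabeth II", "Category:Royals of England"),
    ("is-a-subset-of", "Category:Royals of England", "Category:Royals"),
    ("is-a-type-of", "Monarchy", "Systems of government"),
    ("is-on-topic", "Category:Royalty", "Category:Monarchy"),
]

# Assumption: only these link types justify inferring membership.
SEMANTIC = {"is-an-instance-of", "is-a-subset-of", "is-a-type-of"}


def semantic_ancestors(node):
    """Return everything `node` can be inferred to belong to.

    "is-on-topic" links are ignored, so purely navigational links never
    leak into the inference.
    """
    found = set()
    frontier = [node]
    while frontier:
        current = frontier.pop()
        for kind, source, target in LINKS:
            if source == current and kind in SEMANTIC and target not in found:
                found.add(target)
                frontier.append(target)
    return found
```

With these links, Queen Elizabeth II is inferred to be a member of Category:Royals of England and Category:Royals, and nothing is inferred from the on-topic link between Royalty and Monarchy, so the "person is a governmental system" conclusion no longer arises.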
But another problem is that not all the information we might like to capture is encoded in the category system. There is information inside articles and inside lists and tables which we would also like to be machine-readable. For instance, to answer the query, "show me all Jamaican-British MPs who voted against joining the EU", we might proceed through the following steps:
1.) Extract a list of British MPs from direct membership in Category:Current_British_Members_of_Parliament. But there might be no such category. Instead, there might be a list of members of the House of Lords and a table of members of the House of Commons that shows name, party affiliation, geographical constituency, and term of office. In order to get the information we want, the entire House of Commons table needs to be machine-readable (so names, party, etc., would have to be embedded in an XML-formatted list or the functional equivalent). This is assuming that the querying mechanism knows that by "British Members of Parliament", we don't mean "people who were born in Britain who are now members of some Parliament" but "people who are members of the British Parliament" and that the British Parliament has two subparts, of which, the sub-members of type "person" are of interest. Given the difficulty of this problem, the interface would likely rely on the human inquisitor to make most of these inferences.
2.) Identify the ethnicity of each British MP. It's rather unwieldy to have a list of all people on the planet who have Jamaican ancestry (or who are left-handed, or who are between 6'4" and 6'4.99" tall). And it's unlikely that there's a table which assigns each MP an ethnicity. Such a feature might be mentioned in an individual biography, in which case all biographies would have to XML-encode or whatever the ethnicity (and lots of other properties) of their subjects in a standardized way. Assuming that ethnicity is important enough to record, which it might not be in all cases. For that matter, all British MPs might not have biographies in the database.
3.) Extract a list of people who voted against joining the EU, or a list of British MPs who voted against joining the EU. (Finding a table of the latter would allow us to skip step 1.) Same problems of making lists and tables machine-readable, and you'd probably have to rely on a human to find this table for you.
4.) Unify the lists we've made so far, hope that the names of all MPs are in the exact-same form in all places they are listed (e.g. Tony Blair, vs. Mr. Anthony Charles Lynton "Tony" Blair) or come up with some clever way to cope with them not being so. Not to mention hoping that we haven't somewhere along the line confused John Smith the former MP with John A. Smith the current MP, no relation.
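Step 4 is where most of the fragility lies. The following sketch uses a deliberately crude normalisation (a quoted nickname or the first given name, plus the last word as surname) to show both a successful match and the kind of collision described above. The `HONORIFICS` set, `name_key` and `unify` are hypothetical helpers, not a recommended matching method.

```python
import re

# Assumption: a small, incomplete list of titles to ignore.
HONORIFICS = {"mr", "mrs", "ms", "dr", "sir"}


def name_key(name):
    """Reduce a name to (preferred first name, surname), lower-cased."""
    nick = re.search(r'"([^"]+)"', name)
    # Drop the quoted nickname, then punctuation and honorifics.
    words = [w.strip(".,") for w in re.sub(r'"[^"]+"', " ", name).split()]
    words = [w for w in words if w.lower() not in HONORIFICS]
    first = nick.group(1) if nick else words[0]
    return (first.lower(), words[-1].lower())


def unify(list_a, list_b):
    """Match names in list_a against list_b.

    Returns (matches, ambiguous): unique matches as pairs, and names
    with more than one candidate, which a human has to resolve.
    """
    index = {}
    for name in list_b:
        index.setdefault(name_key(name), []).append(name)
    matches, ambiguous = [], []
    for name in list_a:
        candidates = index.get(name_key(name), [])
        if len(candidates) == 1:
            matches.append((name, candidates[0]))
        elif candidates:
            ambiguous.append((name, candidates))
    return matches, ambiguous
```

"Tony Blair" matches the long form of the name, but "John Smith" comes back as ambiguous between "John Smith" and "John A. Smith", which is exactly the confusion the step warns about.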
It would take a lot of work to start making article text machine-readable in any way, and before doing that you'd want to establish various markup schemes.
Making lists machine-readable might be a little easier with a few clever tricks. For example, only looking at article/category links, not the explanatory text that may also be in a list item - that would give you a list of canonicalized nodes (that you could unify with lists of category members) rather than a list of flat strings. You'd still have to rely on a human to say, "Unify list A with Category:B". But that could still be quite powerful.
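The "only look at the links" trick could be sketched as follows. The regular expression and the `first_links` function are assumptions for this example; the canonicalisation step relies on the fact that wiki titles treat underscores as spaces and ignore the case of the first letter.

```python
import re

# Matches [[Target]], [[Target|label]] and [[Target#section|label]],
# capturing only the target page.
LINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def first_links(wikitext):
    """Return the canonical target of the first link in each list item.

    Lines that are not list items (not starting with "*") are skipped,
    and any explanatory text after the link is ignored.
    """
    nodes = []
    for line in wikitext.splitlines():
        if not line.startswith("*"):
            continue
        match = LINK.search(line)
        if match:
            title = match.group(1).strip().replace("_", " ")
            nodes.append(title[:1].upper() + title[1:])
    return nodes
```

The result is a list of page titles rather than free text, so it can be intersected with a category's member list directly. Deciding which list to unify with which category is still left to a human.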
Regularized lists could answer some of the sample queries mentioned, like "What is the earliest hip hop album with an article?" though a sophisticated SQL-like query would have to be constructed.
I wonder how much brute-force data entry Wikipedians would be willing to do, and whether or not existing open-content databases could be used as content sources. I'm thinking of FreeDB for music, for example. There are some non-open but public sources for things like movies (e.g. the IMDb) and books (e.g. Amazon). But perhaps this is beyond the scope of the Wikipedia and deserves its own project, WikiDB, or something.
-- Beland 04:34, 15 Jun 2004 (UTC)
I must confess my bias throughout the category discussions has always been to think of the categories as a tool for a human user browsing through the category list, as a sort of de facto table of contents. My boosterism of functional categories has been in support of that skimming, browsing user. Reading the above description, although intriguing, I must confess I have never considered the category tree as supporting that sort of precision data-mining search. Wikipedia strikes me as more of an imprecise, people-to-people exercise in information transfer, like any traditional encyclopedia in that sense. Anarchic editing army that we are, I wouldn't think our willy-nilly articles are sufficiently structured to support data mining once you drilled in and located them, anyway. Am I missing something? Is there more to search power than is meeting my eye? -- Gary D 06:53, 15 Jun 2004 (UTC)
Some of the ideas of Beland above I've also been thinking about: see Describing the relations right at the top of this page. Since I wrote that I've done some more investigations on this topic. I think I must have been a librarian in a previous life :-)
There's a lot of effort right now in the W3C to create a semantic web: effectively, to build knowledge into web pages systematically so that computer programs (such as search engines) can make semantic judgements about the content - to distinguish between Queen the musical group and Queen the title, for instance. One outcome of this effort is OWL, Web Ontology Language, an XML application which also uses concepts from RDF.
OWL lets you create an ontology for a domain: usually a hierarchical data structure describing the actual things (people, places, pizzas, camera parts, wines, whatever you want to describe - known as individuals in OWL-speak), the types of things (classes and subclasses, with a strict "is a" relationship from class to superclass), and the properties of those things (for a person, this might include their birth date, gender, place of work, and so on - and crucially, properties can link individuals to other individuals). The properties themselves can also be described with a strict class hierarchy (for wines, "has characteristic" would be the superclass of "has colour" and all those other things wine people talk about).
In OWL, each individual has one or more classes: a pizza might be a member of the "vegetarian pizza" class and the "spicy pizza" class. Part of the power of OWL lies in the ability to describe what a "vegetarian pizza" actually is: it's a pizza with no meat toppings and no fish toppings. You can also make statements about properties such as "this property is transitive" (if A and B are related along a property, and B and C are related along the same property, then with a transitive property you can infer that A and C are also related along the property - for example, "ancestor" is transitive). (This is necessarily a heavily simplified description!)
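The transitive-property inference can be illustrated without any OWL tooling. This is a plain sketch of the idea, not OWL syntax or a reasoner; the `transitive_closure` function and the three names are hypothetical.

```python
def transitive_closure(pairs):
    """Return all (a, c) pairs implied by treating the relation as transitive."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                # a relates to b and b relates to d, so a relates to d.
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


# Hypothetical "ancestor" facts stated explicitly.
stated = {("Alice", "Bob"), ("Bob", "Carol")}
inferred = transitive_closure(stated) - stated
```

Only the two stated facts are entered by hand; the relation between Alice and Carol is derived because "ancestor" was declared transitive.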
Anyone who's interested in building a true ontology (that is, something that a program can make semantic judgements about) for Wikipedia content should look at OWL. (If I get a spare few days I may beef up the Wikipedia article on the subject. If not, then read the W3C documents, though they're not for XML newbies.)
My gut tells me that OWLifying Wikipedia would be technically possible but a huge pain to introduce. Each article is an individual; there could be a "meta" tab alongside "talk" for an article's OWL data or its wikitax equivalent. A wiki category is an OWL class (but a strict "is a" hierarchy would need to be enforced here). Properties would live in a separate namespace equivalent to category/class. A category/class would also have a "meta" tab that describes the class (restricting its members semantically, etc).
Of course, there is the big overriding question: does this solve anyone's problem? Maybe not today, maybe not tomorrow...
-- Avaragado 09:24, 15 Jun 2004 (UTC)
I have no knowledge of how Robots work (or are "commissioned") on Wikipedia, but it would be useful to have one to convert list pages, such as List of ornithologists, to categories, and insert the relevant category link in each of the pages listed. Andy Mabbett 11:47, 14 Jun 2004 (UTC)
Who decides how to name a category? Is it just the first person to come along? Why is one category [[Category:Israeli people]], but another [[Category:People from Luxembourg]]? Why not [[Category:People from Israel]] or [[Category:Luxembourgeois people]]? Or just plain [[Category:Israelis]] or [[Category:Luxembourgeois]]? Why is it [[Category:Israeli actors and actresses]] but, [[Category:Cinema actors]]? What about [[Category:Cinema actresses]]? Or for that matter, who's to stop there being a [[Category:Filmstars]]? Who decided it's [[Category:Children's writers]] as opposed to [[Category:Children's literature writers]] or [[Category:Children's authors]] or [[Category:Young Adult writers]] or any other of a zillion variations?
Are there any stated conventions anywhere? And, how in the name of Dewey does one navigate the special page of Categories, to see which Categories are already in use? -- Woggly 09:39, 15 Jun 2004 (UTC)
I've looked for stated conventions but haven't been able to find anything that particularly addresses these points. Navigating the special pages of Categories is a nightmare. I wanted to see how something was categorised for Scotland, and by the time I'd slowly worked my way through the alphabetic list to "S", I'd pretty much lost interest. It was then I decided I didn't care how Zimbabwe had been categorised.
As for who decides how to name a category - it does seem to be that the first into the fray makes their own decision which isn't the most practical way of doing it. I started some of the categories you mentioned - my methodology (right or wrong) was that I read through the existing list of categories and followed the format that was already being used. Of course if the very first person to name a category made a poor choice, I've perpetuated it and I'm not exactly happy with the result. For example the actors by nationality - those few that existed were "such and such actors and actresses" so I've continued with that. The generic terms for actors, seemed to group actors and actresses as "actors" (ie Television actors) so with the ones I created (ie Cinema actors) I followed that pattern. Not a good system. I'm also seeing duplicated categories - American actors and U.S. actors and actresses, for example, which no matter how I look at it, is a single category. US actors and actresses had more names in it than American actors, so I renamed everyone in American actors just to get them into the same category. Not a scientific way of doing it. I noticed a week ago there were 2500 categories. Two days ago there were 3500 categories. I can't imagine too many people wading through that mega-list to make sure they are not duplicating categories by slightly rewording the title, and we're going to have (and already have) a whole bunch of categories that should be deleted because they are better categorised/defined elsewhere. Rossrs 10:31, 16 Jun 2004 (UTC)
Good objections. Let me add Category:Science fiction authors vs. Category:Fantasy writers. Poorly picked, too late to fix 'em all without a robot. *sigh* (see above on robots). It would be nice if there was a good category search system. I have an automated tool that collects categories, and I can grep the list I've collected so far. More helpful, though, is to use the category system itself, browsing similar categories for a common super-category that might contain the category you are looking for... This would be even more useful if there were not so many orphaned categories. -- ssd 12:31, 15 Jun 2004 (UTC)
I want to create a very broad category, which would contain all articles related to Cyrillic alphabet, for example Cyrillic alphabet, Saint Cyril, Russian language, etc. Should I name it simply "Category:Cyrillic" or should I go for "Category:Cyrillic topics"? Are there similar categories and how are they named? Nikola 05:21, 16 Jun 2004 (UTC)
The articles in Category:Harry Potter movies have recently been adjusted so that their sort keys, rather than being Movie 1, Movie 2, etc., are now merely 1, 2, etc. The user who did it didn't like the way all the movies were sorted under M. Whilst this might look reasonable given the current system for rendering Category articles, I am worried that this might set a bad precedent. Category:Wheel of Time books has a similar set of articles, but there are 10 of them (soon to be 11 when the next book is published); they are sorted as Book 01, Book 02, etc: all therefore appear under B. I know I prefer this system, and not just because I did it (and the original Category:Harry Potter movies sorting also). I am not certain what needs to be done, but I am certain that there needs to be some discussion about it. Possibly one suggestion, following on from earlier discussions, is to suppress the large letters on a Category page if there will be only a single letter; in other words, since all the articles in Category:Wheel of Time books sort under B, don't bother showing the B. -- Phil | Talk 11:48, Jun 15, 2004 (UTC)
Not precisely.
-- Phil | Talk 13:58, Jun 15, 2004 (UTC)
Sorry, but to me it would appear obvious that the very point of including an optional Sort Key in the Category system is to allow articles to be sorted by something other than just their title. It would also appear reasonable to sort articles about a connected series of Books/Movies in the natural order in which they are supposed to be read/viewed. What would be the point of sorting them alphabetically? This would introduce no new helpful information to the reader. Which IMHO is what Wikipedia is all about. -- Phil | Talk 16:18, Jun 15, 2004 (UTC)
I was the one who changed the category sorting from "Book 1" to "1". Perhaps the best solution would be to sort under "#01", "#02", etc.
This avoids having the clutter of having a separate category heading "1", "2", "3", etc. for every single book, and it also avoids the disconcerting listing of items under "B" when there is no "B" in their name (or "M" for the movies). The items would then appear under a neutral "#" category, at the head of the category list.
-- Curps 18:40, 15 Jun 2004 (UTC)
An even better solution: sort them under " 01", " 02" (with leading blank). This causes them to appear under the heading " " (space), which means it appears to be under no heading at all. I've gone ahead and done this so that people can evaluate the effect and see if it's acceptable:
Category:Harry Potter books
Category:Harry Potter movies
-- Curps 18:57, 15 Jun 2004 (UTC)
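For anyone comparing the sort-key schemes in this thread, here is a small model of how a category page groups its entries. It assumes, as the comments above describe, that the heading is simply the first character of the sort key; `category_listing` and `series_key` are hypothetical names, not part of the software.

```python
from itertools import groupby


def category_listing(entries):
    """Group (sort_key, title) pairs the way a category page would.

    Returns [(heading, [titles])], where the heading is the first
    character of the sort key.
    """
    ordered = sorted(entries, key=lambda e: e[0])
    return [
        (heading, [title for _, title in group])
        for heading, group in groupby(ordered, key=lambda e: e[0][:1].upper())
    ]


def series_key(number, prefix=" ", width=2):
    """Build a zero-padded sort key such as " 01" or "#01"."""
    return f"{prefix}{number:0{width}d}"


books = [(series_key(n), f"Book {n}") for n in (10, 2, 1)]
listing = category_listing(books)
```

With the leading-blank keys everything lands under a single blank heading in reading order. The zero padding matters once a series passes nine entries, because unpadded keys sort "10" before "2".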
Maybe a modifier to the categories themselves would allow for each specific article/category to state, in its [[Category:]] tags, whether or not it's an example/subset of that category or merely related to it. For example, Category:Egyptian cities would say [[CategoryIs:Egypt]] because Egyptian cities are a part of the nation and geography of Egypt, whereas Egypt-related topics like Category:Egyptian mythology or Category:Egyptian people would say [[CategoryLike:Egypt]]. Both would display as regular categories, but a user could opt to only display articles in a category that are (or are not) examples of the category. Non-specified categories would have to display on searches for both "is" and "like". - Sean Curtin 22:10, 15 Jun 2004 (UTC)
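A sketch of how the proposed tags might be read and filtered. The `CategoryIs` and `CategoryLike` prefixes come from the proposal above and are not existing syntax; the `PAGES` sample data, `parse_tags` and `listing` are made up for this example.

```python
import re

TAG = re.compile(
    r"\[\[(Category|CategoryIs|CategoryLike):([^\]|]+)(?:\|[^\]]*)?\]\]"
)
KINDS = {"CategoryIs": "is", "CategoryLike": "like", "Category": "unspecified"}

# Hypothetical pages and the category tags found in their wikitext.
PAGES = {
    "Category:Egyptian cities": "[[CategoryIs:Egypt]]",
    "Category:Egyptian mythology": "[[CategoryLike:Egypt]]",
    "Nile": "[[Category:Egypt]]",
}


def parse_tags(wikitext):
    """Return (category name, kind) for every category tag in the text."""
    return [
        (m.group(2).strip(), KINDS[m.group(1)]) for m in TAG.finditer(wikitext)
    ]


def listing(category, mode):
    """List pages in `category` for mode "is" or "like".

    Pages tagged with a plain [[Category:...]] appear in both modes.
    """
    return sorted(
        page
        for page, text in PAGES.items()
        for name, kind in parse_tags(text)
        if name == category and kind in (mode, "unspecified")
    )
```

Showing unspecified tags in both modes keeps existing categories working unchanged, at the price of the filtered lists staying noisy until pages are retagged.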
I am trying to work out whether there is a convention/standard and, if not, to work one out for these cases, because there need to be clear guidelines.
For example, consider the article cell biology and the Category:Cell biology. Should cell biology be a member of the category? I have seen several solutions:
Related to this is the question of whether the article Foo, which corresponds to the subcategory Category:Foo, should also be included in the parent category (say, Category:Bar). If it is included and convention #1 (above) is also followed, this means that Foo appears in two places in the hierarchy (e.g. cell biology appearing in both Category:Biology and Category:Cell biology), which seems unsatisfactory and clashes with the guideline about filing in the most specific category.
I'm conflicted about the best way to proceed here, but I think clear guidelines would help everybody. Any suggestions as to a convention, and rationales? -- Lexor| Talk 12:46, 19 Jun 2004 (UTC)
How about these for guidelines...adjust as you see fit. The final version of these should eventually be put in the policy section of this page. -- ssd 16:08, 19 Jun 2004 (UTC)
I would also throw in the following, but it doesn't directly speak to the question of categories with the same name of articles, but just generally to the question of relevant categorization:
-- TreyHarris 18:46, 20 Jun 2004 (UTC)
I am being told I am unable to move Category:Australian MHRs to the correct form, Category: Australian federal MPs. Is there a rule against moving Category pages? If so, what is one supposed to do with a wrongly-titled page? If not, what is the problem? Adam 13:45, 4 Jun 2004 (UTC)
If you look at the history for Wikipedia:Categories for deletion, you will see that the link for Category:Jewish mythology appears red and links to the "edit" page, as if it didn't exist. However, even when you click on that link, there is data there. Is this a mediawiki bug? - DropDeadGorgias (talk) 20:05, Jun 9, 2004 (UTC)
I can't work out why some categories don't appear to be displaying properly. Take a look at the foot of Avignon and the category Category:Cities, towns and villages of France. Even though it's a populated category, it's displaying as if it was an empty article. Can anyone explain what's going on here? -- ChrisO 15:38, 10 Jun 2004 (UTC)
It's important to consider that there are two different audiences: readers and editors. Editors can use Category:Orphaned_categories to find categories that need parenting. And that category can be populated in a semi-automated fashion (or fully automated, if someone implements that). There's no need to pollute the readers' experience with either red links or a "this category doesn't exist except that it clearly does" moment. So I think both of these phenomena should be eliminated. -- Beland 23:21, 23 Jun 2004 (UTC)
OK, I have boldly created Category:Lists of fictional animals and redirected Lists of fictional animals there. To do this, I moved some helpful text (See Alsos and External Links) from the original Lists... page to the Category:Lists... page. With this specific page, I see the need for 3 new category features, two of which people have already stated:
Elf | Talk 16:13, 21 Jun 2004 (UTC)