This page is currently inactive and is retained for historical reference. Either the page is no longer relevant or consensus on its purpose has become unclear. To revive discussion, seek broader input via a forum such as the village pump.
There are now a large number of clones of Wikipedia's content on the World Wide Web, with various degrees of licence compliance. This is fine if they are in compliance with the GFDL; indeed, it was one of the original goals of the project. Some of them are high-quality mirrors, whilst others are poor-quality bags of spamdexing search-engine fodder.
What seems clear is that many of these clones are using search engine optimization techniques to achieve higher rankings on search engines than the original Wikipedia pages.
The question is: should we start to try to compete with these sites in terms of search engine rankings?
Wikipedia needs to keep its traffic up to maintain its momentum, to continue growing (both in breadth and depth) its userbase, editorbase, and content. Most mirrors are significantly out of date. The majority fail to properly comply with the GFDL. Some wrap Wikipedia's content with questionable ads or associate it with content that doesn't match our high NPOV standards. Many selectively include articles that support their agenda and omit those that don't. More traffic brings us more editors - we're still very short of editors with knowledge outside the western world, of medicine, fine arts, the "soft" sciences and in many other areas. Many current editors came to Wikipedia from internet searches - if Wikipedia continues to slide down the search result rankings, our growth may slow. Our userbase and editorship may even decline. It won't matter how good an encyclopedia we've written when all the casual internet user ever finds is some old, incomplete mirror. The longer we wait to fix things, the more deeply entrenched the mirrors become.
Our purpose is to write the best encyclopedia we can, for distribution by ourselves and others, electronically and otherwise. Our purpose isn't to get good Google rankings, high Alexa ratings, or lots of traffic. Compliant mirrors help us in our goal to educate and inform; non-compliant mirrors should be encouraged, pressured, and cajoled into becoming compliant.
If we do directly compete, we shouldn't use their questionable tactics (e.g. putting all the subheadings in the page title as http://encyclopedia.thefreedictionary.com does). We should just do "good" optimisation; if search engines don't report our pages, we should take that as a cue to improve the navigability of our site.
We agree that most internet users are best served by getting Wikipedia content "from the horse's mouth", and Wikipedia certainly needs more visitors and editors. In practice, however, we're constantly limited by our infrastructure. We should therefore try to improve the accessibility and search-engine friendliness of our site without "heroic measures". Instead, improve it in a moderate, controlled way, so that our infrastructure growth can keep pace with the demands placed upon it by our success.
Clearly we should only use legitimate tactics to try to increase our ratings with search engines. These generally mean making sure that our information architecture is clean, our design is simple, well laid out and easy to navigate. All of these are good goals we should be working on in any case.
Currently, the meta tags added to Wikipedia articles list ten arbitrarily chosen internal links from their article. For example, this article currently includes:
The meta tags for Wikipedia include:
The keywords are usually alphabetized, but, as with this article, there are some exceptions. The keywords generally exclude most of the alphabet. No effort has been made to ensure links containing commas are treated as multiple keywords (as they are recognized by browsers); "Rex, North Carolina" has been inserted as one keyword, but will be interpreted as the two keywords "Rex" and "North Carolina" (this is probably beneficial, however). This leads to poor optimization if the components of a list are also linked separately:
Edits which neither add nor remove links often change an article's keywords, with unpredictable results.
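The comma problem described above can be sketched in a few lines of Python. This is only an illustration, assuming the keywords value is built by joining link titles with commas and is later split on commas by whatever reads the tag; the function names and the sample titles are hypothetical and are not taken from the MediaWiki code.

```python
import html


def build_keywords_tag(link_titles):
    """Join link titles into a keywords meta tag with no special comma handling."""
    content = ",".join(link_titles)
    return f'<meta name="keywords" content="{html.escape(content, quote=True)}">'


def split_keywords(content):
    """Split a keywords value on commas, the way a reader of the tag would."""
    return [part.strip() for part in content.split(",") if part.strip()]


# Hypothetical article that links to a place name and, separately, to its state.
titles = ["Rex, North Carolina", "North Carolina"]
print(build_keywords_tag(titles))
print(split_keywords(",".join(titles)))
# The second line printed is: ['Rex', 'North Carolina', 'North Carolina']
```

One of the ten keyword slots ends up spent on a duplicate, which is the kind of poor optimization the paragraph above points at.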
The first paragraph (or perhaps first sentence) of an article could be included as the value of a "description" meta tag.
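As a rough sketch of how that might look, the following Python takes plain article text, picks out the first sentence of the first paragraph, and wraps it in a "description" meta tag. The function names, the 200-character limit, and the sample text are assumptions made for illustration; the sentence splitting is deliberately naive and would be fooled by abbreviations such as "e.g.".

```python
import html
import re


def first_sentence(text, max_len=200):
    """Return the first sentence of the first paragraph, cut to max_len characters."""
    first_paragraph = text.strip().split("\n\n", 1)[0]
    collapsed = " ".join(first_paragraph.split())  # normalise whitespace
    match = re.match(r"(.+?[.!?])(\s|$)", collapsed)
    sentence = match.group(1) if match else collapsed
    return sentence[:max_len]


def description_tag(text):
    """Build a description meta tag from the opening sentence of the text."""
    content = html.escape(first_sentence(text), quote=True)
    return f'<meta name="description" content="{content}">'


# Hypothetical article text, not a real Wikipedia article.
sample = "An encyclopedia is a reference work. It covers many topics.\n\nHistory follows."
print(description_tag(sample))
```

Escaping the value matters because article text routinely contains quotation marks and ampersands, which would otherwise break the attribute.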
For example, http://www.geourl.org headers could be included where appropriate.
Is there a list of standards for meta tags anywhere, or is this pretty much ad hoc? anthony (see warning) 14:48, 11 Aug 2004 (UTC)
Note, however, that Google is supposed to ignore meta tags completely. -- The Anome 14:58, 9 Aug 2004 (UTC)
How about this: it's legal to mirror WP, but we do not have to make it easy for the 'non-compliant' ones. There could be a blacklist of the IPs of the bots they use. Rather than blocking these bots (which would be pointless), feed them an additional statement, "the original page is here", with a link to WP. Now I know that some of these mirrors filter out anything even containing the string "wiki". So that the message is not simply filtered out, the statement would have to be rephrased or replaced at irregular intervals. Also, the link to WP could be a link to a naked IP, or to another domain with a friendly sysop who is prepared to place a redirect (so the links will not be recognized by the mirrors as WP-related). This would keep them on their toes if they really don't want to acknowledge that they are just mirroring. dab 13:45, 27 Nov 2004 (UTC)
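To make the proposal concrete, here is a minimal Python sketch, not a description of anything Wikipedia actually runs. The blacklist entries (taken from address ranges reserved for documentation), the notice wordings, the redirect URL, and the function name are all hypothetical.

```python
import random

# Hypothetical blacklist of scraper IPs (documentation-only address ranges).
BLACKLISTED_IPS = {"192.0.2.10", "198.51.100.7"}

# Hypothetical wordings; rotating them makes a fixed-string filter less effective.
NOTICE_TEMPLATES = [
    'The original page is <a href="{url}">here</a>.',
    'This text was copied from <a href="{url}">its source</a>.',
    'Read the current version at <a href="{url}">the original site</a>.',
]


def render_for_client(page_html, client_ip, source_url, rng=random):
    """Append a source notice only when the requesting IP is on the blacklist."""
    if client_ip not in BLACKLISTED_IPS:
        return page_html  # ordinary readers get the page unchanged
    notice = rng.choice(NOTICE_TEMPLATES).format(url=source_url)
    return f"{page_html}\n<p>{notice}</p>"


# source_url stands in for the neutral redirect domain suggested above.
print(render_for_client("<p>Article text</p>", "192.0.2.10",
                        "http://redirect.example/Some_page"))
```

Picking the wording at random per request is only one way to vary it; changing the template list on a schedule would serve the same purpose.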
Currently, if someone links from, say, the English Wikipedia to the Japanese Wikipedia's version of the same article, all you'll get is a link that says: 日本語. What we should do is have the link say 日本語 - フィクション (language - title of the article in that language). This is more contextual, and helps boost rank for that title via context-sensitive links.
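A small Python sketch of the proposed link text follows. It assumes the usual `<language code>.wikipedia.org/wiki/<title>` URL layout; the function name and the HTML it emits are illustrative and are not how MediaWiki generates its interlanguage links.

```python
import html
from urllib.parse import quote


def interlanguage_link(lang_code, language_name, target_title):
    """Build a link whose text is 'language - title in that language'."""
    path = quote(target_title.replace(" ", "_"))  # percent-encode non-ASCII titles
    href = f"https://{lang_code}.wikipedia.org/wiki/{path}"
    label = html.escape(f"{language_name} - {target_title}")
    return f'<a href="{href}" lang="{lang_code}">{label}</a>'


print(interlanguage_link("ja", "日本語", "フィクション"))
```

The anchor text now carries the article title in the target language, which is the contextual signal the suggestion is after.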
It's at least enlightening, if not useful, to note some of the methods clones have used to increase their search rankings that are either technically unfeasible for Wikipedia, or undesirable for other reasons.
Please make sure these are facts
We don't yet know how to go about option 1. For instance, I am not sure that becoming a spamdexer ourselves is a great idea, but using contacts that Jimbo, for example, has with Google might be extremely helpful if we want to change the results from that search engine.
I've tried to do my own small part for this. I hit "random page" a few times and googled the title of the result. I've charted the results below:
I think it's safe to say that fact one is largely true. Meelar (talk) 19:34, 2004 Aug 3 (UTC)
Another example: Quantum circuit. I couldn't find the Wikipedia article anywhere on Google, but the WordIQ clone was number 3. It sucked. None of the TeX was rendered. CSTAR 19:02, 9 Aug 2004 (UTC)
How about we let Google handle this: each person can go and verify the listing details for a handful of pages, and use the complaint link for each one where Wikipedia ranks lower than the clones. If a lot of people did this, then perhaps Google might sit up and take notice of this. Dysprosia 10:00, 4 Aug 2004 (UTC)
Look at these search results: [1]. Wikipedia comes up first in the results, but there is no blurb or summary... If I were a user, I would think that that was a paid search result. The wikipedia clones come up with normal blurbs. What is causing this? - DropDeadGorgias (talk) 15:45, Aug 7, 2004 (UTC)
I entirely agree about the annoyance of the clones. Perhaps Wikipedia should introduce a formal notice to wikipedians advising them to link to Wikipedia (and to their own articles within) from their own personal websites. I believe this is one of the best ways for a site to improve its Google rankings. -- mervyn 09:02, 9 Aug 2004 (UTC)
I agree that almost all clones don't give anything back, but there are interesting counterexamples. One is this mirror of the German Wikipedia. The site operators don't use MediaWiki, but wrote their own software, including a search engine (in Free Pascal) which they claim is much faster and more powerful than WP's MySQL search. And they are offering to release it under the GPL for Wikipedia to use.
Others at least add useful functionality which might inspire WP back. For example encyclopedia.thefreedictionary.com uses the nice idea of displaying a part of a linked article as a tool-tip when the mouse cursor hovers over the link. regards, High on a tree 01:30, 25 Aug 2004 (UTC)
Many of the other licensees don't give much back but I wouldn't be surprised if some have provided donations or good ideas which we can learn from. Their producers may also be contributors here. They do all provide some load offloading from us and a useful resource for the times when we're unavailable. Jamesday 21:38, 8 Sep 2004 (UTC)
I feel that I've given something back with the 6,556 edits I've made to the main namespace, many of which were based on scripts I've run on my local copy of the database (see also User:Anthony DiPierro/Broken categories for something I've created based on that local copy). I think there's a lot of room for the mirrors and Wikipedia to work hand in hand. It bothers me that so many people see the mirrors as competition. I have no intention of competing with Wikipedia. In fact, I think Wikipedia should run a mirror of itself. anthony 警告 23:24, 23 Nov 2004 (UTC)
Why are clones bad? One reason: they reduce the number of potential editors per page view of Wikipedia material. We could solve this if we persuaded the clones to include links to the Wikipedia editing page. It would seem to be in their interest, because it would lead to improved content. -- erauch 04:53, 28 Aug 2004 (UTC)
Not that I know anything but... is it possible that the mirrors linking to Wikipedia causes the reverse of a Google bomb, since it is exactly the same content? Someone stated before that because the Wikipedia article is exactly the same, it turned up as a similar page. Is it possible that because it turns up as a similar page so often, it is pushed down in the rankings? I'm only exploring an idea... don't know if this is true. -- AllyUnion (talk) 06:43, 1 Dec 2004 (UTC)
As Wikipedia content proliferates, Google users are going to get more and more annoyed when they do a search and find 15 URLs of cloned material in the top 30 results. As a result, Google will have no choice but to fix this problem eventually (and I doubt that their fix could be anything except to push WP clones far down in the rankings). Moreover, in the long run external sites are going to link preferentially to Wikipedia, which will push its rankings up. Until then, relax and remember that imitation is the sincerest form of flattery. —Steven G. Johnson 21:56, Dec 3, 2004 (UTC)