Should any prose that smells of copyvio be presumptively removed? I have already found one definite and three possibles in a fairly small sample size and I think that with the potential scale of the problem presumptive removal would speed things up a little bit. Boissière ( talk) 21:56, 4 September 2010 (UTC)
I just saw this report on ANI and thought I'd see if you'd like some help. I've never gotten involved here so I'm unsure as to how this works, procedurally-speaking. Should I claim an article in the list somehow? I'm guessing the x graphics means no copyright issues found. What happens if I do find something plagiarized? How does it get tagged, and is there somewhere else that would be reported? Sorry for so many questions, but I want to make sure I'm going about it properly before I jump right in, so I don't end up creating even more work for someone. — e. ripley\ talk 04:36, 5 September 2010 (UTC)
In Cleanup instructions you note that All contributors with no history of copyright problems are welcome to contribute to clean up. I had in the past an issue related to copyright problems mainly due to misunderstandings, which was finally cleared. Would I be allowed to help here, or not? Rentzepopoulos ( talk) 13:01, 20 September 2010 (UTC)
This evening I have been trying to develop an API program which would take the wikitext of a suspect article and try to count up the amount of prose in it. It does this by dividing the article into sections and counting the words in each section. A section is principally either a normal section between two headings or a cell in a table. The program then reports the largest section. This way an article consisting mainly of tables should return a low value. Here is what it produces for Articles 61 through 80 (I chose this because this has a reported but not yet cleaned copyvio in Athletics at the 1980 Summer Olympics – Men's 3000 metre steeplechase).
The program needs refinement - in 2009 Vuelta a Colombia it is being fooled by the list of teams near the end - I need to work out how to spot that. You can see that the copyvio article mentioned has a word count of 212. Is this an approach worth pursuing further? Boissière ( talk) 22:51, 5 September 2010 (UTC)
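The approach described above (split the wikitext into heading-delimited sections and table cells, count the words in each, report the largest) could be sketched roughly as follows. This is not the program under discussion, only a minimal illustration: the function names `strip_markup`, `split_sections` and `largest_section_words` are hypothetical, the regular expressions cover only common wikitext, and it takes the wikitext as a string rather than fetching it through the API.

```python
import re

# Assumed, simplified patterns; real wikitext has many more edge cases.
TABLE = re.compile(r"^\{\|.*?^\|\}", re.DOTALL | re.MULTILINE)
HEADING = re.compile(r"^=+.*?=+[ \t]*$", re.MULTILINE)
CELL = re.compile(r"^[|!][-+]?|\|\||!!", re.MULTILINE)


def strip_markup(text):
    """Drop references, HTML tags and templates; reduce links to their label."""
    text = re.sub(r"<ref[^>/]*>.*?</ref>", " ", text,
                  flags=re.DOTALL | re.IGNORECASE)
    text = re.sub(r"<[^>]+>", " ", text)
    # Remove templates innermost-first until nothing changes.
    previous = None
    while previous != text:
        previous = text
        text = re.sub(r"\{\{[^{}]*\}\}", " ", text)
    # [[Target|label]] -> label, [[Target]] -> Target
    return re.sub(r"\[\[(?:[^\[\]|]*\|)?([^\[\]]*)\]\]", r"\1", text)


def split_sections(wikitext):
    """Return the text of every table cell and every heading-delimited section."""
    text = strip_markup(wikitext)
    sections = []
    for table in TABLE.findall(text):
        # Skip the opening "{| ..." line, which holds only table attributes.
        body = table.split("\n", 1)[1]
        sections.extend(CELL.split(body))
    # Everything outside tables is split at heading lines.
    sections.extend(HEADING.split(TABLE.sub("\n", text)))
    return sections


def largest_section_words(wikitext):
    """Word count of the largest single section or table cell."""
    return max((len(re.findall(r"\w+", s)) for s in split_sections(wikitext)),
               default=0)
```

An article made up mostly of results tables should come out with a low number, since each cell is counted separately. A long plain list, such as the list of teams mentioned above, would still be counted as one section by this sketch.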
I know that results and statistics themselves aren't copyrightable, but is there anything copyrightable in the specific format and wording in which they are presented? I ask in relation to this comment I made on the user's Talk page. -- Boing! said Zebedee ( talk) 17:35, 11 September 2010 (UTC)
A certain amount of this copyvio may have spilled over to other language versions through translations. Will a list of confirmed copyvio be kept here, and are there any ideas about how this particular problem could be checked and handled? -- Sir48 ( talk) 13:58, 14 September 2010 (UTC)
This revision has been marked by a user [1] as copyvio on here [2]. Need explanation on why this is so. 121.120.214.122 ( talk) 15:42, 14 September 2010 (UTC)
I've been focusing my efforts on Darius's biographies and I've come to believe that DD was using a bot of some sort to create articles. I think this for several reasons. One is that many of them have this "fill in the blank" quality. They almost always include the athlete's gender and the phrase his [or her] native country. The articles' spelling tends to be consistent with that of their sources. If the source uses the British way of spelling things (e.g. metres instead of meters) then so does Darius. If it's an American source then he uses the American spelling.-- *Kat* ( talk) 01:26, 21 September 2010 (UTC)
On Wikipedia:Contributor copyright investigations/Darius Dhlomo 24, I went to the first article listed, which was Vasily Rudenkov. It lists one edit, but checking the Revision history of Vasily Rudenkov shows two. Neither was an issue, but I wanted to make sure people know they have to examine the article history to make sure... dm ( talk) 02:01, 21 September 2010 (UTC)
Once the bot has blanked an article, and the article is subsequently sorted out, is it still helpful to update pages such as Wikipedia:Contributor copyright investigations/Darius Dhlomo 8? My guess is that once articles have been tagged it's no longer necessary, but I just wanted to be sure in case not doing so creates extra work for someone else down the line. Regards -- W F C-- 10:00, 23 September 2010 (UTC)
This user assisted with the largest CCI cleanup in the history of Wikipedia.
Current statistics. Just counting the number of {{ y}} and {{ t}} templates, so exact numbers will be slightly off.
Based on this, I would say 20% of the checked articles are copyright violations. Probably in the whole set the number is lower.
I've read calculations like this before, but I wanted to know the current status, so I calculated it and decided to share it.-- EdgeNavidad ( Talk · Contribs) 20:52, 13 November 2010 (UTC)
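A count of this kind can be reproduced with a few lines of code. The sketch below is only an illustration and not the calculation used above: `count_templates` and `share` are hypothetical helper names, the wikitext of the listing pages is assumed to be available as a string, and which template marks a confirmed violation is left to the caller because the comment above does not spell it out.

```python
import re


def count_templates(wikitext, names):
    """Count transclusions such as {{y}}, {{ y}} or {{y|note}} for each name."""
    counts = {}
    for name in names:
        pattern = re.compile(
            r"\{\{\s*" + re.escape(name) + r"\s*(?:\|[^{}]*)?\}\}",
            re.IGNORECASE,
        )
        counts[name] = len(pattern.findall(wikitext))
    return counts


def share(counts, name):
    """Fraction of all counted templates that carry the given name."""
    total = sum(counts.values())
    return counts[name] / total if total else 0.0
```

As noted above, the result is only approximate: a template that appears in a discussion or note rather than against a checked article is counted as well.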
I just stumbled across this. If the cleanup is ongoing (and it looks like it is), I'd be happy to help out. I don't have any experience in a WP project like this, so I'll need someone to show me where the (metaphorical) mops are kept and how to use one. - Ornithopter ( talk) 07:06, 4 August 2012 (UTC)
This article was created by DD. In the infobox, it is stated (info entered by original creator DD) that she is 6 feet 7 inches tall, and weighs 200 pounds. This is absurd and wrong. If you don't believe me, Google her name and check under images. You will find photos of her on podiums and she is obviously not 6' 7" tall, etc. How much other misinformation of this type is there in articles created/edited by DD? — Preceding unsigned comment added by 71.227.178.154 ( talk) 03:06, 20 October 2012 (UTC)