From Wikipedia, the free encyclopedia
  • An interesting project. One immediate concern is that articles might be rated on how individual readers feel about the subject of the article, rather than the quality of its content. Jezhotwells ( talk) 19:54, 13 September 2010 (UTC) reply
  • While certainly better than the current rating system of... peer review, I personally don't think it can stand the test of time, in the sense that articles can change quickly as new information becomes available or, say, if a single Wikipedian decides to take matters into his own hands. Most of all, I'm not sure how well an automated system would work for this task. But this will be a Wikipedia-wide (yet optional, although this wasn't clarified much) switchover, and we will all stumble upon it at some point. My biggest question is: Is this update for the US only? -- Γιάννης Α. | 20:03, 13 September 2010 (UTC) reply
If you're referring to the Reader Assessment Tool, it's not automated; it presents aggregate scores based on all the ratings that individual users have done. As for the test of time, that's definitely a big challenge, figuring out how to deal with stale ratings. For the time being, I think the developers are considering things like a time-based half-life for ratings, or a half-life based on the number of intervening edits. It might be possible down the line to do better with a tool that compares how much of the current text is present in earlier rated versions to determine how much weight old ratings get. This will not be Wikipedia-wide at this point; it will only be available for articles in WikiProject United States Public Policy. This is basically a technology test and a conversation starter, at this point. But it will definitely be available to whichever wikis want it once the technology stabilizes.-- Sage Ross - Online Facilitator, Wikimedia Foundation ( talk) 21:08, 13 September 2010 (UTC) reply
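For illustration only, here is a rough Python sketch of how such a half-life weighting could work; the parameter values are made up, and this is not how the tool is actually implemented:

    # Rough sketch: the weight of an old rating decays by half every
    # half_life_days of age and every half_life_edits intervening edits.
    # All numbers are illustrative, not the real tool's behaviour.
    def rating_weight(age_days, intervening_edits,
                      half_life_days=180, half_life_edits=50):
        time_decay = 0.5 ** (age_days / half_life_days)
        edit_decay = 0.5 ** (intervening_edits / half_life_edits)
        return time_decay * edit_decay

    def aggregate_score(ratings):
        # ratings: list of (score, age_days, intervening_edits) tuples
        weights = [rating_weight(a, e) for _, a, e in ratings]
        total = sum(w * s for w, (s, _, _) in zip(weights, ratings))
        return total / sum(weights) if ratings else None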
  • On controversial articles someone will try to game this system, but better they waste their time trying to game the article assessment than trying to game the article content. When 4Chan gets Goatse rated as the best article on Wikipedia we can all have a chuckle, but it won't have any other effect. This tool looks like it might be useful for other sorts of automated polling of readers. We could add questions on age, income, education level, and language fluency to the quiz and get some correlation going. Different readers could be asked different questions so the questionnaire doesn't get too long. When you start asking that level of detailed question, however, the question of anonymity comes up. Does this tool record the IP of each respondent? filceolaire ( talk) 21:25, 13 September 2010 (UTC) reply
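To illustrate the question-rotation idea, a tiny sketch with a purely hypothetical question list and sample size (nothing like this exists in the current tool):

    import random

    # Hypothetical extra survey questions; each reader would be shown
    # only a small random subset so the questionnaire stays short.
    EXTRA_QUESTIONS = ["age bracket", "education level",
                       "language fluency", "income bracket"]

    def questions_for_reader(per_reader=2):
        return random.sample(EXTRA_QUESTIONS, per_reader)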
  • I think the rating scale is indeed better in the aspects mentioned in the text, but I would suggest simplifying it to a 5-point scale for each measure. That would make it more intuitive (worst-bad-average-good-best), it would relate directly to the star rating from the Article Feedback Tool, and the template could still calculate the overall score by giving different weights to each measure. As it is right now, it involves the same kind of learning curve as the 1.0 system (note how both can be partially solved with the use of HTML comments, but fail in their absence). -- Waldir talk 11:37, 14 September 2010 (UTC) reply
Yeah, it's a tough needle to thread. A 5-point scale for each factor has two downsides: there aren't enough points to differentiate between every class for comprehensiveness (Stub, Start, C, B, GA, and A/FA all have different requirements for that), and it risks implying that every factor is equally important. But I agree with the advantages you point out. Whether they outweigh the downsides, I'm not sure. Personally, I hope the Article Feedback Tool will evolve in a more reader-oriented direction, because I don't think readers think about article quality in the same terms as editors. For the Article Feedback Tool, I'd like to see something like a single 5-star rating for the whole article, then a question like "Did you find the information you were looking for? [yes, some, no]" and an input box to leave comments.-- Sage Ross - Online Facilitator, Wikimedia Foundation ( talk) 12:46, 14 September 2010 (UTC) reply
I thought the 1.0 correspondence was given by a weighted average of all the components, not only by the comprehensiveness scale. In that sense, I don't see why a 5-point scale invalidates the inference of the 1.0 rating, but I may be interpreting the conversion the wrong way. As for the implication that all factors are equally important, that imo doesn't sound like much of a problem. And even if it did, I don't see how it would affect the assessment. -- Waldir talk 19:38, 14 September 2010 (UTC) reply
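For what it's worth, here is roughly what the weighted-average conversion under discussion could look like with a 5-point scale per measure; the weights and class cut-offs are invented for illustration and are not the actual WikiProject conversion:

    # Invented weights and cut-offs, just to show the mechanics of
    # turning per-measure 1-5 scores into an overall 1.0-style class.
    WEIGHTS = {"comprehensiveness": 0.4, "sourcing": 0.3,
               "neutrality": 0.2, "readability": 0.1}
    CUTOFFS = [(4.5, "FA/A"), (3.8, "GA"), (3.0, "B"),
               (2.2, "C"), (1.5, "Start"), (0.0, "Stub")]

    def overall(scores):
        return sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)

    def to_class(scores):
        total = overall(scores)
        for cutoff, label in CUTOFFS:
            if total >= cutoff:
                return label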

I don't think it makes sense to have a complicated system of assessment. An editor who is experienced in an area can eyeball an article and tell you whether it is Start, C, or B. As someone said above, every article is a moving target anyway, so why worry so much about assessments? So what if a C-class article gets grade-inflated to B: it still needs careful work to be ready for GA. IMO, editors spending all this time on assessment ought to be researching and writing instead. -- Ssilvers ( talk) 21:15, 14 September 2010 (UTC) reply

I agree; there's no point in devising an assessment system so complicated that it takes significant time away from editing, especially since we don't really know how much article assessment actually contributes to article improvement. Lampman ( talk) 04:43, 15 September 2010 (UTC) reply
Of course, writing is a more important task than assessing. But effort and activity on Wikipedia are not very fungible; different people put their energy into different things, and we can't just transfer that energy from one area to another (for the most part). More detailed assessments (especially optional ones like this, where the simpler version is always an option) provide an opportunity to a) give a more accurate indication of an article's quality, which is important for things like creating offline versions, and b) give editors a more specific indication of how an article can be improved.
In this case, we also need to do measurements of article quality as part of the requirements of the Public Policy Initiative grant, which Amy Roth determined would be impractical without a more quantifiable assessment system.-- Sage Ross - Online Facilitator, Wikimedia Foundation ( talk) 14:48, 15 September 2010 (UTC) reply
What I like to do with assessments is to leave a list of suggestions for improvement on the talk page, like I did today at Talk:Kerry Ellis. -- Ssilvers ( talk) 05:22, 16 September 2010 (UTC) reply
I agree with the comment above that it won't prevent Wikipedia:Gaming the system. But it might ease tension a bit in cases where determined POV pushers repeatedly delete the "NPOV dispute" tag from articles which they are censoring or otherwise distorting.
It will work best on articles which aren't the target of edit wars. -- Uncle Ed ( talk) 20:03, 18 September 2010 (UTC) reply
  • I really like the idea of "group-sourcing" the quality of an article with this mechanism. If I may, there does need to be some sort of counter n = X which shows how many votes have been received to generate the ratings... 20,000 responses means more than 2 responses that way. There also needs to be some protection against "revoting," which will inevitably happen in contentious articles as a sort of plebiscite on whether the reader approves of the content... Still, this is a really good step and I hope there comes a day in the not too distant future when all Wikipedia articles have a sort of "group-sourced" feedback section. Carrite ( talk) 16:27, 20 September 2010 (UTC) reply
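A minimal sketch of those two points, keeping one rating per user (so re-voting replaces rather than stacks) and exposing the sample size n; the storage here is hypothetical:

    # Hypothetical storage: one rating per user, so a re-vote overwrites
    # the earlier one instead of counting twice.
    ratings_by_user = {}

    def record_rating(user_id, score):
        ratings_by_user[user_id] = score

    def summary():
        n = len(ratings_by_user)
        avg = sum(ratings_by_user.values()) / n if n else None
        return {"n": n, "average": avg}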
I just thought of a problem. Wikipedia articles generally start small and weak and get bigger and better over time. Yet an article can accumulate ratings for years in its small, weak state — then be improved — and still be saddled with obsolete "old" ratings. There needs to be some sort of a reset mechanism for massively expanded articles or some sort of automatic elimination of ratings more than, let's say, a year old to keep the ratings more or less as fresh as the article. Carrite ( talk) 16:31, 20 September 2010 (UTC) reply
Yep. That's one of the big challenges that the developers are thinking about, how to deal with stale ratings. Hopefully, once people get some experience with how the ratings work during this pilot, we can come up with some ideas for dealing with that problem effectively.-- Sage Ross - Online Facilitator, Wikimedia Foundation ( talk) 16:36, 20 September 2010 (UTC) reply
Without considering technical matters, it's pretty easy to know when an article has most likely moved out of stub or start status, simply by looking at length and number of footnotes. A 1000+ word article with 5+ footnotes can't possibly be a stub; a 1500+ word article with 10+ footnotes is almost certainly "C" class (or better) rather than "Start" class. I'm not arguing here for machine-grading. Rather, it seems clear that it's easy for a computer to determine that a specific, older rating should be discarded because an article has changed sufficiently since that particular rating was done. -- John Broughton (♫♫) 19:31, 20 September 2010 (UTC) reply
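A sketch of that kind of check, using the word-count and footnote thresholds suggested above; the thresholds and the "changed enough" rule are illustrative, not an agreed standard:

    # Illustrative floor on an article's class based only on size and
    # sourcing, using the thresholds suggested in the comment above.
    def class_floor(word_count, footnote_count):
        if word_count >= 1500 and footnote_count >= 10:
            return "C"
        if word_count >= 1000 and footnote_count >= 5:
            return "Start"
        return "Stub"

    def rating_is_stale(words_when_rated, refs_when_rated,
                        words_now, refs_now):
        # Discard a rating if the article has clearly outgrown the
        # version that was rated: doubled in length or crossed a class floor.
        doubled = words_now >= 2 * max(words_when_rated, 1)
        crossed = (class_floor(words_now, refs_now)
                   != class_floor(words_when_rated, refs_when_rated))
        return doubled or crossed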