Thank you to the folks who put this together. I quickly scanned the analysis, and while I don't agree with a few of its assumptions, it does shine a light on the need to find a balance between thoroughness and speed. I will probably have more comments or questions after I read it more carefully. - Mr X 23:43, 31 May 2017 (UTC)
Thanks so much for this thorough consideration. I look forward to thinking through and discussing together. If I may ask one informational point to begin (forgive me if this should be obvious): the key in the graph says blue indicates "users who are still new (not autoconfirmed) today". Does blue then indicate only pages from editors who were still not autoconfirmed when you generated this graph on May 25, or all pages from editors not autoconfirmed at the time of their entry's creation? I.e. if four months ago, a not-yet-autoconfirmed editor made a page; it's still in the backlog; but that person subsequently made ten edits to WP, becoming autoconfirmed: is their backlogged entry marked green or blue in the graph? I ask because this would, of course, change how much the graph tells us about what percentage of the backlog would be affected by requiring autoconfirm at the time the editor creates the page.
Thanks again for the sustained attention to this important challenge. Innisfree987 ( talk) 23:46, 31 May 2017 (UTC)
First, let me say thank you for taking the time to write this and provide us with some numbers. From my first glance through it appears to me that the report is missing what is one of the more important numbers in the discussion: the number of pages created within the last 90 days that have been deleted, and then breaking that out by pages created by autoconfirmed and pages created by non-autoconfirmed. I'm still digesting the report and intend to read it again before making any substantial comments, but I do think that these numbers are critical to the conversation moving forward. TonyBallioni ( talk) 23:53, 31 May 2017 (UTC)
Thanks for this analysis. Your data runs counter to what I assumed the problem has been. I still support WP:ACTRIAL but I also think most people shouldn't be editing Wikipedia, either. I don't think the best solution to "Time-Consuming Judgment Calls" is to let them age-off the list. I understand the logic presented but NPP is our sorting mechanism for problematic content. To my mind it would make more sense to have the "Time-Consuming Judgment Call" articles nominated by a bot (like WP:G13) for deletion after 60 days so that WP:DELSORT can assemble the subject-specific experts to make a determination. Orphaned, dead-end content (if not addressed by NPP) could remain out of sight for years until finally addressed. To that end, I think it advisable to re-write the lede to not include your conclusion, as reading that up-front almost blew all of my buy-in and I was about to write another dismissive denouncement of all of you in San Francisco. I'm glad I read the piece to the end and your presentation makes some difference as to how we proceed, but I could barely stomach that conclusion. Please bury it if you're going to provide it. Chris Troutman ( talk) 00:41, 1 June 2017 (UTC)
I suspect that number is not insignificant, and it doesn't even have to be a majority to be harmful to the encyclopedia if they fall off to not be touched until someone realizes that it was a cleverly designed attack page or a copyright violation years later. The deletion numbers are critical to this conversation because they shed light on what the community has judged not to be acceptable from the recently created pages. TonyBallioni ( talk) 02:33, 1 June 2017 (UTC)
I guess my question, statistically, is whether there is some "critical mass" of views by reviewers that means the page should probably be "semi-auto-reviewed" because enough people have passed on it, that it is likely to not be nominated for deletion, likely to survive that discussion, and/or likely to survive long enough for someone to find it and improve it? If that is identifiable, it seems preferable to an arbitrary limit like 30 or 60 days. TimothyJosephWood 02:17, 1 June 2017 (UTC)
this user passed on this article, we are likely to have that read in the minds of some,
this user was too damned lazy to do anything about it, rather than
this user did some level of evaluation, and if the article was obviously toxic probably would have taken some action, and at some level, enough "meh" probably amounts to the article not being an "important" part of the backlog, and instead just something that can just as well sit around for a few months before someone starts improving it.
The important thing to note is that adding more reviewers to this system will not make it work better.
While removing pages created by non-autoconfirmed users would reduce the burden on that first wave of reviewers, it would result in the loss of many potential good articles. It would also send a clear message to new Wikipedia editors that their contributions aren't wanted, potentially stunting the growth of the editing community.
The top of the New Pages Feed says, in bold letters, Rather than speed, quality and depth of patrolling and the use of correct CSD criteria are essential to good reviewing.
A reviewer who doesn't spend enough quality time on a given review risks being blocked from reviewing...
The urgent priorities are to:
While I am pleased that the Foundation has finally decided to at least do some preliminary review of what is wrong with our page patrolling system, it's taken a very long time, and I am sure that this progress has been achieved partly, though not entirely, through my constant whinging, lobbying, Skype conferences, and personal meetings with the WMF for a year with staff such as Danny Horn, Ryan Kaldari, MusikAnimal, Jonathan Morgan, Aaron Halfaker, Nick Wilson, and recently with Wes Moran (whose name has recently disappeared from the staff list). I would particularly like to thank Kaldari and MusikAnimal who have demonstrated the greatest understanding of the critical situation and have helped where they could within the limitations of their employment while dividing their time with their volunteer activities. Kudpung กุดผึ้ง ( talk) 04:31, 1 June 2017 (UTC)
I'm very appreciative of the WMF for paying attention to this and engaging with us, but the main suggestion in this proposal (the fall-off) is a very bad one in my opinion. The data also needs a second look over because it seems to not be easily explainable and is missing some key numbers. I hope these observations add some value to the conversation. TonyBallioni ( talk) 16:36, 1 June 2017 (UTC)
In terms of the 7% figure, it makes sense, and I appreciated both your and Kaldari's explanations. I think my confusion is that by being presented alongside the backlog numbers it seemed as if the report was saying that was the proportion of the backlog that would be reduced daily, which it is not. I think the daily backlog percentage is the more important number and would like to see that featured in the report if we can get it. Thank you for responding here. It is very much appreciated. TonyBallioni ( talk) 05:25, 2 June 2017 (UTC)
@ Kudpung: As mentioned, I am getting the impression that your criticism of the backlog graph is based on a misunderstanding about its time axis. The horizontal direction maps the creation date of the unreviewed pages, counting them at a fixed backlog snapshot time (May 25). In contrast, the graph you posted above (and others that track how the size of the backlog has been developing) varies the backlog snapshot time and does not distinguish unreviewed pages by creation date.
Thus, it doesn't make sense to demand that the chart "needs to go back to mid 2016 when the the backlog suddenly began to increase at an alarming rate". In fact, as can be seen from the raw data in the PAWS notebook (see the "queryresult" or "df_pivot" cells), or simply by selecting "unreviewed" and "sort by: oldest" in Special:NewPagesFeed, there are almost no pages in the current backlog dating from before December 2016, which was the pragmatic reason for starting the chart there.
Another way to look at this time axis distinction is that my backlog graph takes a specific value in your graph (for May 25, near its right end) and splits up that total backlog size by article age to get a more detailed understanding of what kind of pages made up the backlog at that point in time. One could even include both dimensions and make a three-dimensional plot: number of unreviewed pages (z-axis) per creation date (x-axis) and backlog snapshot date (y-axis).
Regards, Tbayer (WMF) ( talk) 07:14, 2 June 2017 (UTC)
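The two time axes described above can be made concrete with a small sketch. This is only an illustration and not the query used in the PAWS notebook: the record layout, the dates, and the function names `backlog_by_creation_date` and `backlog_size` are all assumptions made up for the example.

```python
from collections import Counter
from datetime import date

# Hypothetical records: (creation date, review date or None if never reviewed).
pages = [
    (date(2017, 1, 10), None),
    (date(2017, 1, 10), date(2017, 2, 1)),
    (date(2017, 3, 5), None),
    (date(2017, 5, 20), date(2017, 5, 28)),
]

def backlog_by_creation_date(pages, snapshot):
    """Count pages still unreviewed at `snapshot`, grouped by creation date.

    This corresponds to one fixed snapshot split up by article age.
    """
    counts = Counter()
    for created, reviewed in pages:
        exists = created <= snapshot
        unreviewed = reviewed is None or reviewed > snapshot
        if exists and unreviewed:
            counts[created] += 1
    return counts

def backlog_size(pages, snapshot):
    """Total backlog at `snapshot`: one point on a backlog-over-time graph."""
    return sum(backlog_by_creation_date(pages, snapshot).values())

# One snapshot, broken down by creation date.
may25 = backlog_by_creation_date(pages, date(2017, 5, 25))

# Varying the snapshot date instead gives the backlog-over-time view.
sizes = [backlog_size(pages, d) for d in (date(2017, 1, 15), date(2017, 5, 25))]
```

Calling `backlog_by_creation_date` for many snapshot dates would give the two-dimensional table (creation date by snapshot date) behind the three-dimensional plot mentioned above.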
Wikipedia:New_pages_patrol/Analysis_and_proposal#Non-autoconfirmed_contributors appears to say that very few pages are created by non autoconfirmed users, but then it appears that autoconfirmed status is as measured now, not at the time of page creation. A quick look at the most recent creation tells me that non autoconfirmed new page creations are a lot higher than 7%. Autoconfirmation is a very low bar, and will be very easily met with a little fiddling of the first page. Just wondering about the facts. -- SmokeyJoe ( talk) 05:44, 1 June 2017 (UTC)
Some parts of the report are vague. In some places, it is not clear whether the report supports the reviewers/patrollers, or whether it wants the right to be eliminated. Anyway, I mostly disagree with the section "This system is not sustainable". It says reviewers are generalists, but this doesn't mean they don't have an area of expertise at all. If the reviewers are tripled in number, there is a very high chance that at least one reviewer would be familiar with the particular field/category/subject of the backlogged article. Or there would be at least one editor among these 1000-1200 reviewers who would say "doesn't matter how long it would take, I will work on it". And in certain cases there is the {{ expert needed}} tag. I am pretty sure more than 95% of reviewers know about this template, as the reviewer user-right is not granted easily.
"The only sustainable way to manage the backlog is to reinstitute the expiration date, which the system had from 2007 to 2012. An article that survives the gauntlet of reviewers for a reasonable amount of time – say, 30 days or 60 days – is unlikely to be picked up and fixed by a generalist new page reviewer. Pages that survive past that deadline should be improved by subject matter experts, which is the way that Wikipedia works. With a 30 day expiration, the backlog on May 30th, 2017 would have 5,650 pages instead of 21,800. With 60 day expiration, it would have 10,200 pages."
I think it is prohibited for subject matter experts to touch the page if it is unreviewed. Maybe they get blocked for 24 hours for just clicking the "edit" on an unreviewed page. Why not set the expiration date at 10 days? It will be really good for everybody. This will be exactly like changing the definition/upper limit of high blood pressure to be able to say "no! The patient doesn't have high blood pressure." —usernamekiran (talk) 13:50, June 1, 2017 (UTC)
There seems to be a popular misconception that ACTRIAL was only about preventing non confirmed users from creating pages. That perception is completely wrong and discredits those who worked hard to create the ACTRIAL project and the hundreds of users who voted the overwhelming consensus for it. In 2006 by withdrawing the 'right' of IPs to create pages, the Foundation already acknowledged that Wikipedia is organic and that the rules occasionally need to be modified accordingly.
It is erroneous to allow the impression that the current backlog began concomitant with the creation of the New Page Reviewer user right in November. It didn't. This current backlog actually stretches back to mid 2016 (where it was 'only' 5,000). This was already a grave concern and is what gave rise to the talks in Italy and the run up to the creation of the New Page Reviewer group in November. In fact, after the roll out the backlog actually began slowly but surely to diminish for a while, until it suddenly started rising dramatically again in February.
The Foundation has now given us a page full of comment, which is genuinely very welcome and highly encouraging and I'm sure that those commenting here will read it entirely and carefully, but in order to get properly up to speed, the WMF team should probably also be encouraged to do the community team the courtesy of reading the whole of this page: Wikipedia talk:The future of NPP and AfC. Kudpung กุดผึ้ง ( talk) 15:10, 1 June 2017 (UTC)
There is a lot of information here and I am not following this proposal. Suppose that someone makes a submission and it gets no review after 30 days. After that point, will it move into Wikipedia mainspace in the same way as an article which passed review? Blue Rasberry (talk) 20:08, 1 June 2017 (UTC)
@ Kaldari: nope, it would be a disastrous move. Plenty of good articles would get deleted just because a group of limited users couldn't process them. And bots are dumb after all. Isn't that right, DumbBOT? (Just nod your head if you want more electricity.)
It is easy to foresee this solution will never get the consensus. —usernamekiran (talk) 22:08, 1 June 2017 (UTC)
"It is easy to foresee this solution will never get the consensus." That's probably the one thing we can all agree on. TimothyJosephWood 23:27, 1 June 2017 (UTC)
The WMF's solution is really a head-scratcher to me. They think the problem is "The queue is too big" so the solution is "Just throw stuff out of the queue after 30 days". If the only metric we're going by is length of queue, I have an even better solution. Get rid of the queue entirely and the queue length will be zero! The secondary metric (quality of content) is lacking in this report. Further, they fundamentally misunderstand the guidance given to patrollers. They wrote "Following the Article namespace checklist – the minimum effort that a reviewer is supposed to do – this article would probably take days to fix. You'd have to track down references, most of them not in English, and completely rewrite the page from scratch." That's incorrect. The guidelines for patrollers say to fix easy issues and tag more complicated things, which doesn't take long at all. Same goes for notability. If notability is seriously questionable, tag with questionable notability (or bring it to AfD for more opinions, which is also fine). Detecting issues and fixing issues are very different things with very different time commitments, and so the analysis is flawed. If we have a messaging problem where patrollers think we're asking them to make every article GA-quality, let's talk about that. But if every reviewer followed the guidance given to them, there's no reason to believe articles in need of improvement are driving the backlog. ~ Rob13 Talk 00:06, 2 June 2017 (UTC)
"if every reviewer followed the guidance given to them" But they're not. So we can talk about a perfect world all day, but how do we square the de jure with the de facto in a way that in a reasonable logistical sense accomplishes the mission? TimothyJosephWood 00:15, 2 June 2017 (UTC)
"if only every reviewer would review X amount of articles the system would work" so many times it's starting to make me nauseous. They're not...we're not and we have to ignore the idea that it's a problem with the reviewers because that's not a problem we can solve. So what problem can we solve? TimothyJosephWood 00:20, 2 June 2017 (UTC)
It is completely wrong that "The only sustainable way to manage the backlog is to reinstitute the expiration date". The effect of a cut off date will be to keep articles from ever being reviewed. There is no reason why anything at all should totally drop off the end--if there is any purpose to reviewing, everything needs review. All that having an artificial cut off does is prevent us from realizing the extent of the backlog, by deceiving us into not seeing it. Having such a date is the basic recommendation of the report, and it is absolutely counterproductive, an admission of defeat. It is the reaction of bureaucrats who want to pretend they have control of a problem, not editors who know the limitation of Wikipedia. It will indeed make the system look better. Bureaucrats and system professionals care about this; they do not really care if it fulfills the function. They just want it to look professional. (This is not meant as personal--when I was a professional in a complex system, it was most important to me that my library appeared excellent--I was very aware of the things that were grossly imperfect but that I could not affect, and I did very well at making them invisible.)
It is completely wrong that "adding more reviewers to this system will not make it work better." The most important thing to note is the direct opposite: without adding more reviewers to the system, it will never work much better. The only real way of improving anything at WP is more participation in the process. The principal goal of us all should be to increase participation at every step--starting with people reading articles being willing to make obvious improvements, all the way through every step in increased involvement, all the way to regularly writing new articles. If retention at each of the many steps were increased even slightly, the overall effect would be significant. When we look at the details, our goal should mostly be to remove impediments. We do not need to come from outside to design a system. The entire principle underlying the Wikipedia projects is that the system is self-designing and self-correcting. Not everything can be done by such methods; Wikipedia is not the all-encompassing intellectual product of mankind, but has a special role: a general purpose encyclopedia of first resort universally available. What can be done by amateurs working with informal coordination is what we should do. What requires specialists or professionals or centralization is what should not be part of the projects, and was never from the first intended to be.
It does not take expertise to do most reviewing--neither great expertise at WP and certainly not expertise in the subject. That doesn't mean one person can do everything, but as a person does get experience, they can move to the more difficult articles. Reviewing is meant to be a first pass, and part of the problem is that its function and role have become overloaded to the point where it becomes an impediment. It should not be reviewing in detail or definitively; if it were, it would indeed be impossible to keep up. Rather, the point of reviewing is a first pass, to mark the things that must be removed, and to indicate some of the key problems. Articles are further assessed continually as people see and work on them. Each of us does have subject limitations, but again, the basic principle of a project with widespread participation is that among us all, we will cover all the fields. What cannot be done this way is unsuitable for us to attempt. We've seen in the development of WP a wider and wider expansion of what volunteers are able to do and want to do. It has not been centrally developed.
Given the existing or attainable levels and types of participation, we do not primarily need improved technologies of review--I and I think most good reviewers almost never use the reviewing toolbar except for its convenient functionality in scanning. (I do use twinkle--this is an example of a combination of locally developed stopgap methods whose usefulness has been greatly expanded by widespread adoption, rather than something actually planned from the first.) What we do need is for most experienced WPedians regardless of primary interest to look at a small number of new articles each day, as part of their normal participation here. Accepting the figure of 1200 articles a day, it will be better if 200 people each review 6, than if 20 people each review 60.
There is a genuine but limited role for professionals at WP: to devise tools that volunteers seem not to want to deal with. There are two tools we need (undoubtedly others can suggest additional ones): a prescreening for likely copyvio at the time of submission, and a rough system of subject classification at input. The available technology can do these. So can AI, but we need not wait for that. Where AI might be useful is in distinguishing promotional edits, where we cannot yet explicitly specify how we are judging.
Nothing about this is actually broken, in the sense of not working. Many things are not working very well, and if WP is to do what WP can do, that will always be the case. Our role is to work at the frontiers of what volunteers can accomplish. Things will always be rather rough out there. It's supposed to be that way. It's for doing this sort of unpredictable work that we need our sort of project. (I want to emphasise that I know individually about half the people contributing to this report--based on what I know of them, any one of them could have done more realistically by themselves, and all of them do more realistically in their volunteer capacities.) DGG ( talk ) 03:37, 2 June 2017 (UTC)
The backlog is not a self-created problem, Sadads. Page Curation has been doing (more or less) alright during the 5 years of its creation until its inadequacies, (or more accurately, those of the entire NPP system) were exposed by the new trend in 2015 that has shaped the profile of the envelope of what we call the fire-hose of totally inappropriate new 'articles'.
"So the rise of the backlog from around 6,000 to 22,000 in a year means that the current system is not working well. The only way to get the backlog under control is to change the current system." - well, not quite: The only way to get the backlog under control is to improve the current system. And, very importantly, introduce some stricter control over the type of new articles before they get created: prevent the rubbish, while encouraging good faith users to better read the instructions and follow the guidelines when creating content that might just be acceptable.
I didn't misunderstand your meaning of 'reward' for an instant. I used the term 'reward' metaphorically - the volunteers are not looking for handouts from the WMF in return for good work. The volunteer community expects the WMF to provide the tools they need in order to uphold the values the WMF insists the volunteers maintain.
One of the problems is that it is not always easy to understand the anatomy of an organisation that is largely self-managed by a large group of unpaid volunteers. Let’s not be confused with the role of the volunteer community here. It’s comprised, very broadly of three elements:
Stick and carrot tactics are not going to get the community to do better work of building and keeping clean the content of this encyclopedia. What we need right now is for the expert statisticians and data analysts employed by the WMF to explain to us why that simple but very worrying graph of mine has such a strange shape - to prove or disprove once and for all any correlation between its shape and any events on the ground. When we have that, we need the expert code writers employed by the WMF to write the tools the volunteers have identified as being needed now and keep asking for, rather than speculate on the positive effect AI or ORES will certainly have, but only in the more distant future.
To expect the volunteer community to write their own code as well would be asking a bit too much - en.Wiki is as big as all the other WMF encyclopedias rolled together and thus demands some extra attention. What the volunteers might well do however, is find their own solutions which the Foundation may disapprove of, but which the volunteers are fully entitled to roll out by creating their own governance structures by which they maintain quality.
Rescuing or engaging with new content can be a reward in and of itself, yes, but only if the page reviewers can see potential in the articles they police rather than a disheartening flood of utter rubbish that needs constant mopping up. In the older days of Wikipedia, NPP was interesting, new articles arrived and it was a pleasure to read them, cross a few Ts, and dot a few Is and approve them for inclusion. This is absolutely not happening today. The good stuff is now largely (fortunately) autopatrolled. What we are basically doing today at NPP is shovelling s*** and standing up to our waists in it. And rather than being rewarded, we’re sometimes being criticised for doing it as best as some of us can. Kudpung กุดผึ้ง ( talk) 17:45, 5 June 2017 (UTC)
I think this conversation is great, but I also find that a lot of the comments here on the talk page are not approaching one of the biggest conclusions from the WMF team's analysis (note, I work for the WMF, but in a completely different focus/part of the organization): that, at least in their theory, a lot of the problem probably lies in "Time-Consuming Judgment Calls".
I am one of those patrollers who skips pages because of the "Time-Consuming Judgment Calls". I frequently will start at the back of the NPP patrol log, and about 30-50 pages in, and about 5 patrolled items later, my experienced editor brain is a bit exhausted with the backlog because:
If I could do any of the following actions with the queue within the NPP tool, my time looking at the article would be more rewarding:
More rewarded, I would be more motivated to patrol more pages. Also, I think these filtering strategies, would create very simple ways to engage more experienced editors in the backlog: either via "we have a lot of popular pages, which we aren't sure if they are any good: lets make sure that they don't put Wikipedia in a bad light" or "There is a huge backlog of Rugby biographies that need to be reviewed! You edit Rugby Biographies: come help!" or "There are a lot of Good faith people out there that need help! Help us grow the community by interacting with them!" Also, the content that would get neglected with these filters, would be low public-interest, low topical relevance, probably neutral contributions (i.e. no risk to our long term public impact).
I think what we have here is a big number that only looks intimidating and/or bad because we can't prioritize within that number the materials/content that deserve more time to make judgement calls. Even if the backlog grows indefinitely (like many of our other backlogs do), with more filtering folks could hack at the bits that make the most sense to them (I disagree with the proposed auto-expire of the backlog -- backlog ≠ bad thing). Right now the only reasonably good filters (time, no categories, and deletion) are only good for someone motivated by those particular Wikipedia-wide concerns with the content -- which most of our other tools, programs and activities suggest editors aren't. Sadads ( talk) 20:49, 2 June 2017 (UTC)
What if we had a bot that recorded the top reviewer for each week, and gave them a barnstar for it? This is stupid as hell, but if we want people to actually push buttons, everything tells us that these stupid pseudo "rewards" help with that kind of thing, and maybe could boost morale overall, since the only feedback most reviewers probably get is negative feedback when someone notices they've been doing it wrong. TimothyJosephWood 20:20, 3 June 2017 (UTC)
After reading the comment(s) above, I completely support Timothy's suggestion. I hope you are feeling better today. :-) —usernamekiran (talk) 11:10, 4 June 2017 (UTC)
This report is great. It provides perspective on a complex subject.
One of the pieces of information that I found very informative is the number of page views for some example articles mentioned in the report. Page view numbers help to understand how many people may be affected by potential issues in an article, and at the same time, how much demand there is for such content. I was wondering if those numbers can be used to help reviewers in some way. For example, using them to prioritise the backlogs or just surfacing them in reviewing tools to help reviewers in their assessments. This may be an area worth exploring as work is done in the area of review tools.
-- Pginer-WMF ( talk) 08:06, 5 June 2017 (UTC)
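As a rough illustration of the prioritisation idea above, a backlog could be ordered by page views so that the most-read unreviewed pages surface first. This is only a sketch: the titles, view counts, field names and the function `prioritise_by_views` are hypothetical and do not correspond to any existing review tool or API.

```python
# Hypothetical backlog entries; titles and view counts are made up.
backlog = [
    {"title": "Example article A", "views": 12},
    {"title": "Example article B", "views": 950},
    {"title": "Example article C", "views": 140},
]

def prioritise_by_views(backlog):
    """Return the backlog sorted so the most-viewed pages come first."""
    return sorted(backlog, key=lambda page: page["views"], reverse=True)

queue = prioritise_by_views(backlog)
titles = [page["title"] for page in queue]
```

The same view count could instead simply be displayed next to each entry, leaving the ordering to the reviewer.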
I added a section to the report with stats on reviewer participation -- from Jan 2015 to May 2017, we've got the number of unique reviewers working each month, and the number of reviewed pages. There are some noticeable changes at June 2016 and November 2016. In June, User:SwisterTwister slowed down their participation, which brought the number of reviews down. In November, with the creation of the patroller user right, the number of active reviewers dropped from around 950 per month to 350 per month. I'd be interested to know what you all think about the newly-added stats. DannyH (WMF) ( talk) 20:53, 5 June 2017 (UTC)
Thanks for the report. The key points that I think the report tries to demonstrate are:
Other interesting points of discussion here:
Here are some of my own opinions and reaction to the report:
I don't believe Wikipedia editors are going to change their convictions about the project based on decree from WMF. In the current environment, we're not going to get consensus on inclusionist-leaning policy changes and imposing such changes will further polarize the community. New NPP reviewers are not going to come charging over the hill. ACTRIAL has demonstrated that we can get consensus on quality-focused policy. WMF is unlikely to abandon the "Anyone can edit" mission and the deletionist sentiment is not strong enough to challenge this.
It seems to me that the way forward is for everyone to get comfortable with a large and growing backlog. The WMF and inclusionists can console ourselves with the knowledge that backlogged pages lose their NOINDEX status after 90-days. ~ Kvng ( talk) 21:16, 5 June 2017 (UTC)
@ DannyH (WMF) and Kvng: Just found out through Boleyn. It's not a lot, but it's a good sign. Progress is being made. Soon, we can decrease the backlog markedly. —usernamekiran (talk) 18:32, 6 June 2017 (UTC)
So... how does "the graph" look when you take into account articles created by autopatrolled users prior to their gaining the autopatrolled right? There's been a fairly big push lately to culling the herd in this respect, and fairly mass examining, sorting, and funneling those potentially eligible to PERM, at least it seems that way. I don't actually have statistics on whether there's been an overall marked increase in those having the right as measured by the number of new articles created.
What kind of difference would it make in the backlog if we automatically and retroactively reviewed unreviewed articles by editors when granting autopatrolled? I mean, some of these are by design some of the most prolific article creators on the project. Admittedly, this may effectively raise the bar at PERM, since the responsibility for checking off on potentially scores of articles falls to the button pusher. TimothyJosephWood 21:30, 5 June 2017 (UTC)
I've started a thread at Wikipedia_talk:The_future_of_NPP_and_AfC#Moving_forward to discuss how we can practically move forward with any potential reforms to the NPP process because I feel that might be a better location to discuss the broader questions than on this report's talk page. Comments from anyone who is interested in the discussion would be appreciated. TonyBallioni ( talk) 00:46, 7 June 2017 (UTC)
After reading the report and a lot of the responses, I have a modest proposal that seems to have been hinted at but not fleshed out. Why not allow quicker triage - after you read the article, and don't mark it patrolled, move it into one of the sub-buckets described in the report - in a mobile and desktop friendly way.
These three categories can be called "Patrol-pending articles", versus just "unpatrolled". This has the immediate benefit of letting others know that an article has been viewed at least once, and is not an easy article to patrol.
Category 1 goes to a "copy edit required" bucket, for those who like to copy edit but aren't subject matter experts. That gets marked as something like "viewed but not edited" and is no longer part of the new article backlog. Category 2 goes to a "subject matter expert needed" bucket, with corresponding topic categorization, and the talk page is automatically created with a note that experts in that category are being sought. People could sign up for patrolling rights and specify a category they'd like automatic notification on. I am part of the feedback request service, and while it's interesting getting random articles to help with, I'd be better put to use in technology articles. Patrol approved editors can still review the categorized backlog as needed. This would improve engagement. Category 3 is the hardest bucket - those ones require the most time and effort, so the gamification efforts (barnstars, etc.) may work here - the elite of the elite.
Again, with this process, you're crafting the patrolling challenge to match people's unique skill sets and interests, with a goal of fostering engagement.
Another related issue that I haven't seen discussed is articles for deletion. The AfD process is related to this, in that it is another barrier to good information appearing on the encyclopedia. I don't need to name names because many of you will already know people like this, but with the higher standards being applied, and the speed at which some people work, many good articles are being deleted. I've personally seen a bias against articles on Indian subjects, likely because of Western unfamiliarity with Indian culture coupled with the limited English of the writers. I have participated in several AfD discussions started by the same people and successfully voted to keep the articles, but can only wonder how many good articles have been lost. OK - so here's another not-so-modest modest proposal - I lied. I recommend that we track nominations for deletion, and the success rates the nominators have. If an article is voted keep or no consensus (against the nominator's wishes), that's scored a -1, while a successful nomination (deletion consensus) is a 1. Once someone has a negative total score (more than 50% of their nominations are successfully blocked) they would be capped as to how many articles they could nominate, per some period. I don't have system statistics but I'm sure the sysadmins can figure out what would work. This would minimize disruption, and preserve the goal of keeping good info on the site. Having to be right 1/2 of the time seems fair - otherwise count all the time volunteers spend protecting info from unnecessary challenges as time which could be better spent patrolling new articles, and solving the first problem. Timtempleton ( talk) 17:26, 7 June 2017 (UTC)
We need to rid ourselves of the delusion that writing a new article from scratch is easy, and we need to stop telling that lie to newbies. The truth is that it's hard; even veteran Wikipedians don't simply whack out a perfectly formed complete article in one sitting. From my not inconsiderable experience at AFC I've come to the conclusion that the vast majority of wannabe beginner editors are in fact barely functionally literate, even those whose mother tongue is allegedly English.
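A minimal sketch of the scoring rule proposed above, for illustration only. The outcome labels, the function names and the sample data are hypothetical; the proposal itself only says that a deletion consensus scores 1, that keep or no consensus scores -1, and that a negative total triggers a cap.

```python
from collections import defaultdict

# Hypothetical outcome labels. The proposal only distinguishes a deletion
# consensus (+1) from keep / no consensus (-1).
SCORES = {"delete": 1, "keep": -1, "no consensus": -1}


def nominator_scores(nominations):
    """Sum the +1/-1 score of every AfD outcome, per nominator."""
    totals = defaultdict(int)
    for nominator, outcome in nominations:
        totals[nominator] += SCORES[outcome]
    return dict(totals)


def capped_nominators(nominations):
    """Return the nominators whose total score is negative."""
    scores = nominator_scores(nominations)
    return sorted(name for name, score in scores.items() if score < 0)


# Illustrative data only, not real nomination records.
sample = [
    ("A", "delete"), ("A", "delete"), ("A", "keep"),
    ("B", "keep"), ("B", "no consensus"), ("B", "delete"),
    ("C", "delete"), ("C", "keep"),
]

print(nominator_scores(sample))
print(capped_nominators(sample))
```

Note that a nominator who is right exactly half of the time (C in the sample) ends at zero and is not capped under this reading; the cap only applies once more than half of the nominations have failed.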
I suspect that Visual Editor is to blame for the origin of the massive growth in various backlogs. Before VE the requirement to learn at least the elementary basics of wiki-markup, and the consequent necessity to read and understand various guidelines before one could actually write anything, acted as a barrier to entry that fairly effectively kept out the incompetent. Then VE came along and made it dead easy for anyone to post any crap they felt like without first having to learn anything about Wikipedia. VE is malware, plain and simple. Roger (Dodger67) ( talk) 19:03, 7 June 2017 (UTC)
Having read the above discussion, I have a proposal.
Power~enwiki ( talk) 19:24, 7 June 2017 (UTC)
Something I'd like to bring to everyone's attention is how difficult it is to reduce the backlog as a result of a number of design decisions by the developers of the page curation tool.
The analysis appears to show that reviewers leave the hard cases for last, where they require time-consuming judgment calls. That time would be better spent working on those areas where there is a higher ratio of pages that should be speedily deleted, PROD'ed or sent to AfD. I did not really see anything in the analysis that showed me what reviewers are actually spending their time on. So, as a reviewer, I'd like to provide some insight into how I use my time and how my time can be used more productively by making some changes to the tools and processes that I use.
I can only work at the beginning or the end of a log, and I try to do both; the beginning to catch the really blatant cases that need to be deleted, and at the back of the log because it's interesting and challenging and is a last look to see if we are going to let something sneak by, often those are contributions by paid editors or editors with a COI. The problem is that the back of the backlog is so massive that it is impossible to reach. If I look at page curation now, the oldest pages are from 2007, old redirects that haven't spent all that time in the queue. The cases that have been sitting there all the time are from 29 December, 163 days ago. To access a page, using the page curation tool, that is actually at risk of becoming indexed, rewarding the spammers, I would need to scroll to May 12 or thereabout and it would take me about half an hour to do so, if I didn't know to go to Special:NewPages and set the offset manually to 20170512000000 where I can find those pages that are about to get indexed. Pages are only listed at NewPages for 30 days though, so to get at any of the unreviewed pages between 30 and 160 days old, you would have to resort to other tricks.
I can't say what other reviewers have figured out, but I am, for all intents and purposes, completely unable to work on the "middle" of the backlog. In my experience, the pages that are 150 days old are no more difficult to review than pages that are 30 days old. They have not been skipped endlessly. They have only been skipped for the first 30 days. After that, they cannot be accessed, and so they are just sitting there until they reappear at the end of the backlog.
Another problem that I face is the varying notability standards that the different projects have developed. I'm pretty familiar with art-related stuff, but know nothing about sports. So I much prefer to review articles that cover a subject that I care about, and where I have access to sources as well as a good sense of which sources are reliable. So I do skip a lot of wrestling articles because I can't tell all the different wrestling organizations apart. A lot of my time goes to skipping stuff I don't care about. To work around that problem, I look at suggestions from InceptionBot, which gives me project-related topics that I do care about. I work faster, I'm happier, and I contribute more reviews when I can work in areas where I have expertise.
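The date arithmetic and the manual offset described above can be sketched as follows. This is only an illustration: the helper names are made up, the year of "29 December" is assumed to be 2016 (the comment is dated 10 June 2017), and passing the timestamp as an `offset` query parameter is an assumption about how the offset is set.

```python
from datetime import date, timedelta
from urllib.parse import urlencode


def newpages_offset(day):
    """Format a date as a 14-digit YYYYMMDDHHMMSS offset at midnight."""
    return day.strftime("%Y%m%d") + "000000"


def newpages_url(day):
    # Assumption: the offset is passed as an `offset` query parameter.
    query = urlencode({"title": "Special:NewPages", "offset": newpages_offset(day)})
    return "https://en.wikipedia.org/w/index.php?" + query


today = date(2017, 6, 10)    # date of the comment above
oldest = date(2016, 12, 29)  # assumed year for "29 December"

print((today - oldest).days)                     # 163
print(newpages_offset(date(2017, 5, 12)))        # 20170512000000
print(newpages_url(today - timedelta(days=30)))  # link for pages about 30 days old
```

A configurable offset date in the page curation tool, as requested in the summary, would amount to exposing the `day` argument of such a helper to the reviewer.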
Another time-sink for me is performing manual tasks that could be automated. Checking what links here to see if something is an orphan when the page curation tool already shows me that an article is an orphan seems unnecessary. I should not have to tag an article as missing references when the page curation tool already has that information. It would be nice to preload the uncategorised, orphan, unreferenced tags if an article is already listed in the New pages feed with No categories, Orphan and No citations.
Checking recreations of deleted articles when dealing with spammers is another manual task that is taking a lot of time. Maybe there is a better way to do this, but bringing up the deletion log is time-consuming. Automated flagging of recreations would be helpful. For checking copyvios, I have added stuff to my toolbar that loads Earwig's tool with the page I'm reviewing to make that a bit faster, but it would be nice if that too had been done already.
In summary: The page curation tool would already serve my needs (and possibly others) much better if only two improvements could be made: a filter for pages that I have already skipped, and a configurable offset date.
Thanks for listening, and thanks for all your hard work Mduvekot ( talk) 23:11, 10 June 2017 (UTC)
I agree with and appreciate Mduvekot's post. Like DGG, I would really like a keyword search; I could be much more effective. A third option between oldest and newest would be great too. I'd rather prioritise those not brand new and still being worked on and not yet indexed. Boleyn ( talk) 05:25, 12 June 2017 (UTC)
If (article age=30d & number of extended confirmed editors >=3 & number of references [including non-social media external links] >=2 & (article not an obvious copyvio | G4 candidate) & article has at least one tag) then mark as reviewed
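The rule above could be written as a predicate along these lines. It is a sketch under stated assumptions: the `Article` fields are hypothetical names, "article age=30d" is read as "has reached 30 days", and "(article not an obvious copyvio | G4 candidate)" is read as "neither an obvious copyvio nor a G4 candidate".

```python
from dataclasses import dataclass


@dataclass
class Article:
    # Hypothetical field names; none of these come from an existing tool.
    age_days: int
    extended_confirmed_editors: int
    references: int          # counts non-social-media external links too
    obvious_copyvio: bool
    g4_candidate: bool
    tags: int


def auto_mark_reviewed(article):
    """Return True when every condition of the proposed rule holds."""
    return (
        article.age_days >= 30                       # assumed reading of "age=30d"
        and article.extended_confirmed_editors >= 3
        and article.references >= 2
        and not (article.obvious_copyvio or article.g4_candidate)
        and article.tags >= 1
    )


ok = Article(30, 3, 2, False, False, 1)
recreated = Article(45, 5, 4, False, True, 2)

print(auto_mark_reviewed(ok))         # True
print(auto_mark_reviewed(recreated))  # False: it is a G4 candidate
```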
What we are doing (or should be doing), Mduvekot, ever since we got the Draft namespace created (and that's why we wanted it) is:
and we've been having the conversation about the 'move to draft' tool since first suggested here 13 December 2015 by czar and listed at Phab by Czar on Jan 22 2016, but despite being requested several times, and listed here 9 months ago, the devs are not doing anything about it. Kudpung กุดผึ้ง ( talk) 15:45, 13 June 2017 (UTC)
I saw this in the Economist and thought it might be interesting to Wikipedia leadership. It's a discussion of how to encourage peer review, tangentially related to clearing the NPP backlog [2]. Timtempleton ( talk) 14:41, 12 June 2017 (UTC)
I did some digging around after coming across this question on Quora. I came to a conclusion that only about 16.7% of articles on Wikipedia have been marked as reviewed. Please see the calculations here. For those too lazy, there are about 5,422,965 [1] articles on Wikipedia and only about 905,695 [2] have been marked as reviewed. Don't know if it helps but thought it wise to share it here as well. Yashovardhan ( talk) 19:14, 12 June 2017 (UTC)
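The 16.7% figure follows directly from the two counts quoted above; a quick check, using only those numbers:

```python
total_articles = 5_422_965  # article count quoted above
reviewed = 905_695          # articles marked as reviewed, as quoted above

share = reviewed / total_articles
not_reviewed = total_articles - reviewed

print(f"{share:.1%}")   # 16.7%
print(not_reviewed)     # 4517270
```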
1. In the section 'Non-autoconfirmed contributors', the author writes "According to our calculations, there's an average of 1,180 new mainspace articles created on English Wikipedia every day,...". The graph shows page review backlog, and for only six months. The growth of the backlog is merely the symptom of potentially many other interrelated factors. Would the WMF please provide graphs/query data that include the past 18 months of data, showing the following?
This would give us useful insight to new article creation trends, reviewing trends, and article quality trends. It will be far more useful than showing the overall backlog. Thank you for your help.- Mr X 12:59, 15 June 2017 (UTC)
2. In the same section, a conclusion is made:
"While removing pages created by non-autoconfirmed users would reduce the burden on that first wave of reviewers, it would result in the loss of many potential good articles. It would also send a clear message to new Wikipedia editors that their contributions aren't wanted, potentially stunting the growth of the editing community. Most importantly, it wouldn't actually solve the problem of the growing New Page Review backlog."
This seems to be built upon several false or dubious assumptions, specifically
"Pages created by non-autoconfirmed user would be removed"- No, users would be greeted with a message telling them there are several options for creating an article. For example, that they can create a freeform draft, which they can ask an editor to review. If it is ready, the editor will publish it for them.
"It would result in the loss of many potential good articles"- No, 80% of new articles by new contributors are junk. The other 20% would not be lost at all. Evidence is required that there would be a loss of many potential good articles, as that is a rather extraordinary claim.
"It would also send a clear message to new Wikipedia editors that their contributions aren't wanted"- No, it would send a clear message to new contributors creating the 80% junk articles that we have standards, and to the new contributors creating the 20% usable articles that their contributions are welcome, but that they must follow one of the processes before their article can be published. Isn't this exactly what we do for image uploading?
"It wouldn't actually solve the problem of the growing New Page Review backlog"- Of course it would. It would free page patrollers to review the rest of the backlog and it would dissuade spammers and vandals from co-opting the encyclopedia for their non-encyclopedic purposes.
- Mr X 22:49, 15 June 2017 (UTC)
For those who have not yet seen it, we have some numbers based on deletions from the WMF. I've produced the charts based on them. This is based on taking the articles created the first week of November 2016 and checking their status as of 14 June 2017. You can see the data at User:MusikAnimal (WMF)/NPP analysis. WMF is working on getting us more numbers on this, but these also give us a snapshot of what has happened to articles that have all been reviewed. TonyBallioni ( talk) 13:14, 15 June 2017 (UTC)
Hi all,
as planned earlier and also suggested above by Innisfree987 and others, we have produced a different version of the backlog snapshot graph. To recap the explanation posted earlier, the version of the graph used in the report was showing the number of still unreviewed articles per day as it appeared to new page patrollers at Special:NewPagesFeed at one point in time (May 25), using the same definition of "new editors" as patrollers see there (i.e. if you change settings there to only show unreviewed articles by "new editors", you would get a list that exactly corresponds to the blue part of the graph). The new graph instead uses the autoconfirmed status at the time of the article's creation, because that's more pertinent to the question of how much disabling article creation for non-autoconfirmed users might reduce the backlog:
It confirms the earlier observation that pages by non-autoconfirmed users (by either definition) make up only a small portion of the backlog - to be exact, just 15% of the still unreviewed pages came from users who had not been autoconfirmed when they created the article.
Regards, Tbayer (WMF) ( talk) 22:04, 15 June 2017 (UTC)
Hello! After a few days of coding I think I have a month's worth of rich data that I hope will answer some questions. There are a LOT of charts I want to share, so instead of further clogging up this page I'm going to link you to User:MusikAnimal (WMF)/NPP analysis. The data you see there right now covers February 15 to March 15 2017, which should (seemingly) line up with the abrupt upward trend on Kudpung's NPP backlog chart.
No matter what you make of the data, I've got some exciting things to tell you. Thanks to the incredible work of Yurik with mw:Extension:Graph, the charts you see will automatically update as new data comes in. They can all be shown using the {{ User:MusikAnimal (WMF)/NPPChart}} template (which I will document fully). You can use this template anywhere and the charts will automatically update. Eventually I hope to have many months, even years, of data. When we get to that point the charts will magically become interactive and you can zoom in/out, etc. For now they are static because I wanted to get the numbers to you quickly, so I didn't put too much effort into it :) I also realize the pie charts don't actually show any numbers, something I hope to fix tomorrow, and I'll also throw in percentages.
One of the things people have been complaining about is how hard it is to get this data. Well, since these charts automatically update when new data is supplied, we just need to automate generating that data. I have the script to do it, I just need bot approval and some rough community consensus. Let me know if you think this is a good idea. My thoughts are to first backfill as much data as we can, then have the bot automatically add data in every day. That way you always have rich and up-to-date numbers at your disposal. I'd also like to move {{ User:MusikAnimal (WMF)/NPPChart}} to Template:NPPChart but I first want to see what you all think.
Now, the chart you all probably want to see the most:
[Graph unavailable: line chart of the survival metric for articles by non-autoconfirmed users]
...which is produced by {{ User:MusikAnimal (WMF)/NPPChart|type=line|usergroup=non-autoconfirmed|metric=survival}}
Maybe this is suggestive that ACTRIAL has some merit. For the record, I'm indifferent and am more committed to just bringing you the data. The only thing I'll be sad about is half of my charts will no longer be relevant because there will be no new pages created by non-autoconfirmed users – but that's not a real argument but rather your common engineer frustration ;) We can still make use of the other charts.
One thing I did want to point out, however, is that while we have data on the types of deletions (speedy vs PROD vs AfD), we don't have data on the types of speedy deletions. I'd like to do that next, and will share a chart for it once I have the data. What I'm getting at is that many of the speedy deletions from new users are probably vandalism-related. I'm not outright dismissing ACTRIAL, but I think it's worth noting vandals will be vandals, and if they can't do it through new pages they'll likely target existing ones. This is opposed to new users who don't understand inclusion criteria. The ratio of those versus vandals is something I'd personally like to see.
Let me know what you think of the charts, and while discussion ensues I'll keep my script running to backfill more data, and they will all magically update. Again let me formally ask for feedback about the idea of a bot. I'd love to make that happen so we don't have to go through this long process again. With the bot, you won't need to request the data, it will just be there, right here on the wiki :)
Warm regards, MusikAnimal (WMF) ( talk) 01:54, 16 June 2017 (UTC)
@ DannyH (WMF) and Joe Roe: I am not particularly convinced that the process has changed all that much for those of us who do it: I needed the reminder of the backlog to come back to participate, not a change in the documentation. It's more of a social engagement and organization problem (there are tons of admins who also, by default, have the right but don't use it). I think there are other things that have been identified by the conversations here:
I hope that analysis of what I am hearing helps. Also, I want to note like I did higher on the page: I work for WMF, but have not engaged with or supported this team working on this project during this project as part of my work -- my scope and focus is on WP:GLAM. Sadads ( talk) 21:01, 16 June 2017 (UTC)
Month | Backlog on day 1 | Articles created | Articles reviewed | Backlog on last day | Articles deleted | Speedy | ||
---|---|---|---|---|---|---|---|---|
March | Number | |||||||
April | Number | |||||||
May | Number |
Myself, I have auto-patrolled status and so am able to create a steady stream of new articles without much interference but new editors have much more difficulty. Here are three cases which I have noticed recently:
1. Ashley Hannah was one of a large group of students who attended an outreach event at Imperial College recently. She was bright and enthusiastic and wanted advice about how to add an image to the subject that she was working upon – Sue Gibson (chemist). I showed her a way of doing this but it wasn't working. The feedback from the interface wasn't clear but my impression is that this was because she wasn't autoconfirmed and so file uploading was blocked. I could have asked her to make 10 edits but, to be autoconfirmed, four days have to elapse and she had just created her account at this one-day event. So, I wasn't able to resolve her difficulty and this was quite frustrating. She got some acclaim for her first efforts but will have been left with the impression that adding an image to Wikipedia is significantly more difficult than in Instagram, Snapchat or Twitter (the most popular apps for teens). Why do we have a four-day cooling-off period when this is so clearly an obstacle for outreach events?
2. Henrietta999 is an experienced author who spoke at another recent editathon. Her first article on Wikipedia was Margaret, Lady Moir and that went reasonably smoothly but that was perhaps too smooth as I have the impression that she'd have liked more feedback, especially some appreciation for the effort. Her second article, Margaret Dorothea Rowbotham, has not been going so well because it was found by a new page patroller who has been nit-picking and tag-bombing. This has not been well-received and the editor has withdrawn from the topic. Rather than collaboration, we have conflict – a familiar difficulty throughout Wikipedia.
3. Carolineneil has been creating a stream of impressively erudite chemistry articles from Asymmetric addition of dialkylzinc compounds to aldehydes to Use of pi,pi, CH-pi and pi-cation interactions in supramolecular assembly. These have been going through AfC and the results seem to have been reasonably productive. But the editor is quite introverted here and is getting some stick at ANI for not communicating and there's currently a proposal to block them to attract their attention. Are they unfamiliar with our ways of communicating, indifferent to them or repelled by them? Should we be looking a gift horse in the mouth?
These examples seem to confirm what was said in Encyclopedia Frown:
“The encyclopedia that anyone can edit” is at risk of becoming, in computer scientist Aaron Halfaker’s words, “the encyclopedia that anyone who understands the norms, socializes him or herself, dodges the impersonal wall of semiautomated rejection and still wants to voluntarily contribute his or her time and energy can edit.”
I support the WMF's recommendation that we should try to keep the project open and accessible. Andrew D. ( talk) 10:19, 18 June 2017 (UTC)
Here are some more recent examples. They start with a case that I just came across while patrolling. The article was children in emergencies and conflicts and was at risk of imminent deletion because it had been proposed for deletion more than seven days before. I did not hesitate to remove the prod because this is quite a notable topic – it's easy to find entire books about it – and its treatment in this case was quite scholarly and well-supported by citations. After removing the prod, I looked to see who had placed it. This was not a vandal or inexperienced editor, but DGG – a veteran patroller and illustrious member of Arbcom. Now, my view is that DGG's action was quite contrary to the WP:PROD process which states clearly that it is "for uncontroversial deletion. ... PROD must only be used if no opposition to the deletion is expected". It seemed quite inappropriate in this case because DGG said himself, "I don't know if it is fixable." It seems remarkable that such an experienced editor could act like this and so we seem to have the problem of quis custodiet ipsos custodes? I caught this one but how many other respectable, substantial articles are being casually deleted in this way?
The creator of this article, A.mart82, is working with a Wikimedian in Residence on such topics in partnership with UNESCO. They are a new editor and notice that they created other substantial articles such as child development in Africa just one day after starting editing and so before they were autoconfirmed. Notice also that their talk page just contains templated warnings, including other deletion proposals. Isn't this a remarkably bitey way to treat someone who is working with one of the foremost cultural organisations on the planet? My view is that the WMF should encourage and support such work. If the NPP is unable to treat such work with consideration and respect, then the WMF should perhaps limit their powers rather than augmenting them?
Of course, I might be missing some aspect of this and so I encourage the parties involved to explain themselves and their actions. This will then fully inform our discussion. Andrew D. ( talk) 22:55, 24 June 2017 (UTC)
There are many controversial claims in this study that are not backed up by any evidence:
Secondly, no serious attempt is made to judge the quality of articles created by non-autoconfirmed users. Per MusikAnimal's subsequent analysis, we see that a full 3/4 of articles created by non-autoconfirmed users are eventually deleted. Why are we fighting so hard to keep allowing new users to create this crap and publish it in mainspace? Restricting non-autoconfirmed users from creating articles would not only significantly reduce the number of articles that need to be reviewed, but it would also vastly increase the average quality of articles that reviewers are reviewing. This would change the whole process of reviewing. Instead of it being the current war against the relentless diarrhea emanating from the depths of humanity, it could become a more constructive process with an emphasis on improving articles and educating serious editors.
Thirdly, remember that ACTRIAL was designed to be a temporary trial. There is only one way to definitely determine the effect of ACTRIAL on the backlog, article quality, and editor retention: implement the trial for a temporary period, and see what happens. We can do more studies and argue about it until the cows come home, but we won't come any closer to understanding the true effect unless we actually do the trial. Even if ACTRIAL turns out to be a miserable failure, it would only be a failure for a short time, and the damage that it could do to the project would be minimal. ‑Scottywong | babble _ 06:56, 6 July 2017 (UTC)
Thanks for this analysis. Your data runs counter to what I assumed the problem has been. I still support WP:ACTRIAL but I also think most people shouldn't be editing Wikipedia, either. I don't think the best solution to "Time-Consuming Judgment Calls" is to let them age-off the list. I understand the logic presented but NPP is our sorting mechanism for problematic content. To my mind it would make more sense to have the "Time-Consuming Judgment Call" articles nominated by a bot (like WP:G13) for deletion after 60 days so that WP:DELSORT can assemble the subject-specific experts to make a determination. Orphaned, dead-end content (if not addressed by NPP) could remain out of sight for years until finally addressed. To that end, I think it advisable to re-write the lede to not include your conclusion, as reading that up-front almost blew all of my buy-in and I was about to write another dismissive denouncement of all of you in San Francisco. I'm glad I read the piece to the end and your presentation makes some difference as to how we proceed, but I could barely stomach that conclusion. Please bury it if you're going to provide it. Chris Troutman ( talk) 00:41, 1 June 2017 (UTC)
I suspect that number is not insignificant, and it doesn't even have to be a majority to be harmful to the encyclopedia if they fall off to not be touched until someone realizes that it was a cleverly designed attack page or a copyright violation years later. The deletion numbers are critical to this conversation because they shed light on what the community has judged not to be acceptable from the recently created pages. TonyBallioni ( talk) 02:33, 1 June 2017 (UTC)
I guess my question, statistically, is whether there is some "critical mass" of views by reviewers that means the page should probably be "semi-auto-reviewed" because enough people have passed on it, that it is likely to not be nominated for deletion, likely to survive that discussion, and/or likely to survive long enough for someone to find it and improve it? If that is identifiable, it seems preferable to an arbitrary limit like 30 or 60 days. TimothyJosephWood 02:17, 1 June 2017 (UTC)
this user passed on this article, we are likely to have that read in the minds of some,
this user was too damned lazy to do anything about it, rather than
this user did some level of evaluation, and if the article was obviously toxic probably would have taken some action, and at some level, enough "meh" probably amounts to the article not being an "important" part of the backlog, and instead just something that can just as well sit around for a few months before someone starts improving it.
The important thing to note is that adding more reviewers to this system will not make it work better.
While removing pages created by non-autoconfirmed users would reduce the burden on that first wave of reviewers, it would result in the loss of many potential good articles. It would also send a clear message to new Wikipedia editors that their contributions aren't wanted, potentially stunting the growth of the editing community.
The top of the New Pages Feed says, in bold letters, Rather than speed, quality and depth of patrolling and the use of correct CSD criteria are essential to good reviewing.
A reviewer who doesn't spend enough quality time on a given review risks being blocked from reviewing...
The urgent priorities are to:
While I am pleased that the Foundation has finally decided to at least do some preliminary review of what is wrong with our page patrolling system, it's taken a very long time, and I am sure that this progress has been achieved partly, though not entirely, through my constant whinging, lobbying, Skype conferences, and personal meetings with the WMF for a year with staff such as Danny Horn, Ryan Kaldari, MusikAnimal, Jonathan Morgan, Aaron Halfaker, Nick Wilson, and recently with Wes Moran (whose name has recently disappeared from the staff list). I would particularly like to thank Kaldari and MusikAnimal who have demonstrated the greatest understanding of the critical situation and have helped where they could within the limitations of their employment while dividing their time with their volunteer activities. Kudpung กุดผึ้ง ( talk) 04:31, 1 June 2017 (UTC)
I'm very appreciative of the WMF for paying attention to this and engaging with us, but the main suggestion in this proposal (the fall-off) is a very bad one in my opinion. The data also needs a second look over because it seems to not be easily explainable and is missing some key numbers. I hope these observations add some value to the conversation. TonyBallioni ( talk) 16:36, 1 June 2017 (UTC)
In terms of the 7% figure, it makes sense, and I appreciated both your and Kaldari's explanations. I think my confusion is that by being presented alongside the backlog numbers it seemed as if the report was saying that was the proportion of the backlog that would be reduced daily, which it is not. I think the daily backlog percentage is the more important number and would like to see that featured in the report if we can get it. Thank you for responding here. It is very much appreciated. TonyBallioni ( talk) 05:25, 2 June 2017 (UTC)
@ Kudpung: As mentioned, I am getting the impression that your criticism of the backlog graph is based on a misunderstanding about its time axis. The horizontal direction maps the creation date of the unreviewed pages, counting them at a fixed backlog snapshot time (May 25). In contrast, the graph you posted above (and others that track how the size of the backlog has been developing) varies the backlog snapshot time and does not distinguish unreviewed pages by creation date.
Thus, it doesn't make sense to demand that the chart "needs to go back to mid 2016 when the the backlog suddenly began to increase at an alarming rate". In fact, as can be seen from the raw data in the PAWS notebook (see the "queryresult" or "df_pivot" cells), or simply by selecting "unreviewed" and "sort by: oldest" in Special:NewPagesFeed, there are almost no pages in the current backlog dating from before December 2016, which was the pragmatic reason for starting the chart there.
Another way to look at this time axis distinction is that my backlog graph takes a specific value in your graph (for May 25, near its right end) and splits up that total backlog size by article age to get a more detailed understanding of what kind of pages made up the backlog at that point in time. One could even include both dimensions and make a three-dimensional plot: number of unreviewed pages (z-axis) per creation date (x-axis) and backlog snapshot date (y-axis).
Regards, Tbayer (WMF) ( talk) 07:14, 2 June 2017 (UTC)
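To make the time-axis distinction above concrete, here is a minimal Python sketch. The page records are invented for illustration, and the function names are hypothetical rather than taken from the PAWS notebook. It fixes one snapshot date and splits the backlog at that moment by creation month, whereas a backlog-over-time graph would plot only the total for each snapshot date.

```python
from collections import Counter
from datetime import date

# Hypothetical records: (creation date, review date or None if never reviewed).
pages = [
    (date(2016, 12, 29), None),
    (date(2017, 2, 10), date(2017, 6, 1)),
    (date(2017, 4, 3), date(2017, 4, 4)),
    (date(2017, 5, 20), None),
    (date(2017, 5, 24), None),
]


def backlog_by_creation_month(pages, snapshot):
    """Count pages still unreviewed at `snapshot`, grouped by creation month."""
    counts = Counter()
    for created, reviewed in pages:
        if created > snapshot:
            continue  # the page did not exist yet at snapshot time
        if reviewed is not None and reviewed <= snapshot:
            continue  # already reviewed by the snapshot, so not in the backlog
        counts[created.strftime("%Y-%m")] += 1
    return dict(counts)


def backlog_size(pages, snapshot):
    """Total backlog at `snapshot`: a single point on a backlog-over-time graph."""
    return sum(backlog_by_creation_month(pages, snapshot).values())


snapshot = date(2017, 5, 25)
per_month = backlog_by_creation_month(pages, snapshot)
print(per_month, backlog_size(pages, snapshot))
```

Calling `backlog_by_creation_month` for a range of snapshot dates would give the two-dimensional data behind the three-dimensional plot mentioned above.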
Wikipedia:New_pages_patrol/Analysis_and_proposal#Non-autoconfirmed_contributors appears to say that very few pages are created by non autoconfirmed users, but then it appears that autoconfirmed status is as measured now, not at the time of page creation. A quick look at the most recent creation tells me that non autoconfirmed new page creations are a lot higher than 7%. Autoconfirmation is a very low bar, and will be very easily met with a little fiddling of the first page. Just wondering about the facts. -- SmokeyJoe ( talk) 05:44, 1 June 2017 (UTC)
Some parts of the report are vague. In some places, it is not clear whether the report supports the reviewers/patrollers, or whether it wants the right to be eliminated. Anyway, I mostly disagree with the section "This system is not sustainable". It says reviewers are generalists, but this doesn't mean they don't have an area of expertise at all. If the number of reviewers is tripled, there is a very high chance that at least one reviewer would be familiar with the particular field/category/subject of the backlogged article. Or there would be at least one editor among these 1000-1200 reviewers who would say "it doesn't matter how long it takes, I will work on it". And in certain cases there is the {{ expert needed}} tag. I am pretty sure more than 95% of reviewers know about these templates, as the reviewer user-right is not granted easily.
"The only sustainable way to manage the backlog is to reinstitute the expiration date, which the system had from 2007 to 2012. An article that survives the gauntlet of reviewers for a reasonable amount of time – say, 30 days or 60 days – is unlikely to be picked up and fixed by a generalist new page reviewer. Pages that survive past that deadline should be improved by subject matter experts, which is the way that Wikipedia works. With a 30 day expiration, the backlog on May 30th, 2017 would have 5,650 pages instead of 21,800. With 60 day expiration, it would have 10,200 pages."
I think it is prohibited for subject matter experts to touch the page if it is unreviewed. Maybe they get blocked for 24 hours for just clicking the "edit" on an unreviewed page. Why not set the expiration date at 10 days? It will be really good for everybody. This will be exactly like changing the definition/upper limit of high blood pressure to be able to say "no! The patient doesn't have high blood pressure." —usernamekiran (talk) 13:50, June 1, 2017 (UTC)
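The arithmetic behind the quoted expiration figures can be checked with a short Python sketch. The three backlog numbers come from the quoted report text; the function names and the sample page ages are hypothetical. The sketch only shows how many pages a cutoff would remove from view, not whether those pages ever get reviewed.

```python
REPORTED_BACKLOG = 21_800  # backlog on May 30, 2017, as quoted from the report
REPORTED_AFTER_EXPIRY = {30: 5_650, 60: 10_200}  # expiry days -> remaining pages


def dropped_by_expiry(total, remaining):
    """Return (pages leaving the queue unreviewed, their share of the backlog)."""
    dropped = total - remaining
    return dropped, dropped / total


def visible_backlog(ages_in_days, expiry_days):
    """Count pages that would still be listed under an expiry cutoff."""
    return sum(1 for age in ages_in_days if age <= expiry_days)


for days, remaining in REPORTED_AFTER_EXPIRY.items():
    dropped, share = dropped_by_expiry(REPORTED_BACKLOG, remaining)
    print(f"{days}-day expiry: {dropped} pages dropped ({share:.0%} of backlog)")

# Hypothetical page ages: shortening the cutoff shrinks the visible number
# without any additional page being reviewed.
sample_ages = [1, 10, 31, 90]
print(visible_backlog(sample_ages, 30), visible_backlog(sample_ages, 10))
```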
There seems to be a popular misconception that ACTRIAL was only about preventing non confirmed users from creating pages. That perception is completely wrong and discredits those who worked hard to create the ACTRIAL project and the hundreds of users who voted the overwhelming consensus for it. In 2006 by withdrawing the 'right' of IPs to create pages, the Foundation already acknowledged that Wikipedia is organic and that the rules occasionally need to be modified accordingly.
It is erroneous to allow the impression that the current backlog began concomitant with the creation of the New Page Reviewer user right in November. It didn't. This current backlog actually stretches back to mid 2016 (where it was 'only' 5,000). This was already a grave concern and is what gave rise to the talks in Italy and the run up to the creation of the New Page Reviewer group in November. In fact, after the roll out, the backlog actually began slowly but surely to diminish for a while, until it suddenly started rising dramatically again in February.
The Foundation has now given us a page full of comment, which is genuinely very welcome and highly encouraging and I'm sure that those commenting here will read it entirely and carefully, but in order to get properly up to speed, the WMF team should probably also be encouraged to do the community team the courtesy of reading the whole of this page: Wikipedia talk:The future of NPP and AfC. Kudpung กุดผึ้ง ( talk) 15:10, 1 June 2017 (UTC)
There is a lot of information here and I am not following this proposal. Suppose that someone makes a submission and it gets no review after 30 days. After that point, will it move into Wikipedia mainspace in the same way as an article which passed review? Blue Rasberry (talk) 20:08, 1 June 2017 (UTC)
@ Kaldari: nope, it would be a disastrous move. Plenty of good articles would get deleted just because a limited group of users couldn't process them. And bots are dumb after all. Isn't that right, DumbBOT? (Just nod your head if you want more electricity.)
It is easy to foresee this solution will never get the consensus. —usernamekiran (talk) 22:08, 1 June 2017 (UTC)
"It is easy to foresee this solution will never get the consensus." That's probably the one thing we can all agree on. TimothyJosephWood 23:27, 1 June 2017 (UTC)
The WMF's solution is really a head-scratcher to me. They think the problem is "The queue is too big" so the solution is "Just throw stuff out of the queue after 30 days". If the only metric we're going by is length of queue, I have an even better solution. Get rid of the queue entirely and the queue length will be zero! The secondary metric (quality of content) is lacking in this report. Further, they fundamentally misunderstand the guidance given to patrollers. They wrote "Following the Article namespace checklist – the minimum effort that a reviewer is supposed to do – this article would probably take days to fix. You'd have to track down references, most of them not in English, and completely rewrite the page from scratch." That's incorrect. The guidelines for patrollers say to fix easy issues and tag more complicated things, which doesn't take long at all. Same goes for notability. If notability is seriously questionable, tag with questionable notability (or bring it to AfD for more opinions, which is also fine). Detecting issues and fixing issues are very different things with very different time commitments, and so the analysis is flawed. If we have a messaging problem where patrollers think we're asking them to make every article GA-quality, let's talk about that. But if every reviewer followed the guidance given to them, there's no reason to believe articles in need of improvement are driving the backlog. ~ Rob13 Talk 00:06, 2 June 2017 (UTC)
"if every reviewer followed the guidance given to them" But they're not. So we can talk about a perfect world all day, but how do we square the de jure with the de facto in a way that in a reasonable logistical sense accomplishes the mission? TimothyJosephWood 00:15, 2 June 2017 (UTC)
"if only every reviewer would review X amount of articles the system would work" so many times it's starting to make me nauseous. They're not...we're not and we have to ignore the idea that it's a problem with the reviewers because that's not a problem we can solve. So what problem can we solve? TimothyJosephWood 00:20, 2 June 2017 (UTC)
It is completely wrong that "The only sustainable way to manage the backlog is to reinstitute the expiration date". The effect of a cut off date will be to keep articles from ever being reviewed. There is no reason why anything at all should totally drop off the end--if there is any purpose to reviewing, everything needs review. All that having an artificial cut off does is prevent us from realizing the extent of the backlog, by deceiving us into not seeing it. Having such a date is the basic recommendation of the report, and it is absolutely counterproductive, an admission of defeat. It is the reaction of bureaucrats who want to pretend they have control of a problem, not editors who know the limitations of Wikipedia. It will indeed make the system look better. Bureaucrats and system professionals care about this; they do not really care whether it fulfills the function. They just want it to look professional. (This is not meant as personal--when I was a professional in a complex system, it was most important to me that my library appeared excellent--I was very aware of the things that were grossly imperfect but that I could not affect, and I did very well at making them invisible.)
It is completely wrong that "adding more reviewers to this system will not make it work better." The most important thing to note is the direct opposite: without adding more reviewers to the system, it will never work much better. The only real way of improving anything at WP is more participation in the process. The principal goal of us all should be to increase participation at every step--starting with people reading articles being willing to make obvious improvements, through every step in increased involvement, all the way to regularly writing new articles. If retention at each of the many steps were increased even slightly, the overall effect would be significant. When we look at the details, our goal should mostly be to remove impediments. We do not need to come from outside to design a system. The entire principle underlying the Wikipedia projects is that the system is self-designing and self-correcting. Not everything can be done by such methods; Wikipedia is not the all-encompassing intellectual product of mankind, but has a special role: a general purpose encyclopedia of first resort universally available. What can be done by amateurs working with informal coordination is what we should do. What requires specialists or professionals or centralization is what should not be part of the projects, and was never from the first intended to be.
It does not take expertise to do most reviewing--neither great expertise at WP and certainly not expertise in the subject. That doesn't mean one person can do everything, but as a person does get experience, they can move to the more difficult articles. Reviewing is meant to be a first pass, and part of the problem is that its function and role has become overloaded to the point where it becomes an impediment. It should not be reviewing in detail or definitively; if it were, it would indeed be impossible to keep up. Rather, the point of reviewing is a first pass, to mark the things that must be removed, and to indicate some of the key problems. Articles are further assessed continually as people see and work on them. Each of us does have subject limitations, but again , the basic principle of a project with widespread participation, is that among us all, we will cover all the fields. What cannot be done this way, is unsuitable for us to attempt. We've seen in the development of WP, a wider and wider expansion of what volunteers are able to do and want to do. It has not been centrally developed.
Given the existing or attainable levels and types of participation, we do not primarily need improved technologies of review--I and I think most good reviewers almost never use the reviewing toolbar except for its convenient functionality in scanning. (I do use twinkle--this is an example of a combination of locally developed stopgap methods whose usefulness has been greatly expanded by widespread adoption, rather than something actually planned from the first.) What we do need is for most experienced WPedians regardless of primary interest to look at a small number of new articles each day, as part of their normal participation here. Accepting the figure of 1200 articles a day, it will be better if 200 people each review 6, than if 20 people each review 60.
There is a genuine but limited role for professionals at WP: to devise tools that volunteers seem not to want to deal with. There are two tools we need (undoubtedly others can suggest additional ones): a prescreening for likely copyvio at the time of submission, and a rough system of subject classification at input. The available technology can do these. So can AI, but we need not wait for that. Where AI might be useful is in distinguishing promotional edits, where we cannot yet explicitly specify how we are judging.
Nothing about this is actually broken, in the sense of not working. Many things are not working very well, and if WP is to do what WP can do, that will always be the case. Our role is to work at the frontiers of what volunteers can accomplish. Things will always be rather rough out there. It's supposed to be that way. It's for doing this sort of unpredictable work that we need our sort of project. (I want to emphasise that I know individually about half the people contributing to this report--based on what I know of them, any one of them could have done more realistically by themselves, and all of them do more realistically in their volunteer capacities.) DGG ( talk ) 03:37, 2 June 2017 (UTC)
The backlog is not a self-created problem, Sadads. Page Curation has been doing (more or less) alright during the 5 years of its creation until its inadequacies, (or more accurately, those of the entire NPP system) were exposed by the new trend in 2015 that has shaped the profile of the envelope of what we call the fire-hose of totally inappropriate new 'articles'.
"So the rise of the backlog from around 6,000 to 22,000 in a year means that the current system is not working well. The only way to get the backlog under control is to change the current system." - well, not quite: The only way to get the backlog under control is to improve the current system. And, very importantly, introduce some stricter control over the type of new articles before they get created: prevent the rubbish, while encouraging good faith users to better read the instructions and follow the guidelines when creating content that might just be acceptable.
I didn't misunderstand your meaning of 'reward' for an instant. I used the term 'reward' metaphorically - the volunteers are not looking for handouts from the WMF in return for good work. The volunteer community expects the WMF to provide the tools they need in order to uphold the values the WMF insists the volunteers maintain.
One of the problems is that it is not always easy to understand the anatomy of an organisation that is largely self-managed by a large group of unpaid volunteers. Let’s not be confused with the role of the volunteer community here. It’s comprised, very broadly of three elements:
Stick and carrot tactics are not going to get the community to do better work of building and keeping clean the content of this encyclopedia. What we need right now is for the expert statisticians and data analysts employed by the WMF to explain to us why that simple but very worrying graph of mine has such a strange shape - to prove or disprove once and for all any correlation between its shape and any events on the ground. When we have that, we need the expert code writers employed by the WMF to write the tools the volunteers have identified as being needed now and keep asking for, rather than speculate on the positive effect AI or ORES will certainly have, but only in the more distant future.
To expect the volunteer community to write their own code as well would be asking a bit too much - en.Wiki is as big as all the other WMF encyclopedias rolled together and thus demands some extra attention. What the volunteers might well do, however, is find their own solutions which the Foundation may disapprove of, but which the volunteers are fully entitled to roll out by creating their own governance structures by which they maintain quality.
Rescuing or engaging with new content can be a reward in and of itself, yes, but only if the page reviewers can see potential in the articles they police rather than a disheartening flood of utter rubbish that needs constant mopping up. In the older days of Wikipedia, NPP was interesting, new articles arrived and it was a pleasure to read them, cross a few Ts, and dot a few Is and approve them for inclusion. This is absolutely not happening today. The good stuff is now largely (fortunately) autopatrolled. What we are basically doing today at NPP is shovelling s*** and standing up to our waists in it. And rather than being rewarded, we’re sometimes being criticised for doing it as best as some of us can. Kudpung กุดผึ้ง ( talk) 17:45, 5 June 2017 (UTC)
I think this conversation is great, but I also find that a lot of the comments here on the talk page are not approaching one of the biggest conclusions from the WMF team's analysis (note, I work for the WMF, but in a completely different focus/part of the organization): that, at least in their theory, a lot of the problem probably lies in "Time-Consuming Judgment Calls".
I am one of those patrollers who skips pages because of the "Time-Consuming Judgment Calls". I frequently will start at the back of the NPP patrol log, and about 30-50 pages in, and about 5 patrolled items later, my experienced editor brain is a bit exhausted with the backlog because:
If I could do any of the following actions with the queue within the NPP tool, my time looking at the article would be more rewarding:
More rewarded, I would be more motivated to patrol more pages. Also, I think these filtering strategies, would create very simple ways to engage more experienced editors in the backlog: either via "we have a lot of popular pages, which we aren't sure if they are any good: lets make sure that they don't put Wikipedia in a bad light" or "There is a huge backlog of Rugby biographies that need to be reviewed! You edit Rugby Biographies: come help!" or "There are a lot of Good faith people out there that need help! Help us grow the community by interacting with them!" Also, the content that would get neglected with these filters, would be low public-interest, low topical relevance, probably neutral contributions (i.e. no risk to our long term public impact).
I think what we have here is a big number, that only looks intimidating and/or bad, because we can't prioritize within that number materials/content that deserves more time to make judgement calls. Even if the backlog grows indefinitely (like many of our other backlogs do), with more filtering folks could hack at the bits that make the most sense to them (I disagree with the proposed auto-expire of the backlog-- backlog ≠ bad thing) . Right now the only reasonably good filters (time, no categories, and deletion) are only good for someone motivated by those particular Wikipedia-wide concerns with the content -- which most of our other tools, programs and activities suggest editors aren't. Sadads ( talk) 20:49, 2 June 2017 (UTC)
What if we had a bot that recorded the top reviewer for each week, and gave them a barnstar for it? This is stupid as hell, but if we want people to actually push buttons, everything tells us that these stupid pseudo "rewards" help with that kind of thing, and maybe could boost morale overall, since the only feedback most reviewers probably get is negative feedback when someone notices they've been doing it wrong. TimothyJosephWood 20:20, 3 June 2017 (UTC)
After reading the comment(s) above, I completely support Timothy's suggestion. I hope you are feeling better today. :-) —usernamekiran (talk) 11:10, 4 June 2017 (UTC)
This report is great. It provides perspective on a complex subject.
One of the pieces of information that I found very informative is the number of page views for some example articles mentioned in the report. Page view numbers help to understand how many people may be affected by potential issues in an article, and at the same time, how much demand there is for such content. I was wondering if those numbers can be used to help reviewers in some way. For example, using them to prioritise the backlogs or just surfacing them in reviewing tools to help reviewers in their assessments. This may be an area worth exploring as work is done in the area of review tools.
-- Pginer-WMF ( talk) 08:06, 5 June 2017 (UTC)
I added a section to the report with stats on reviewer participation -- from Jan 2015 to May 2017, we've got the number of unique reviewers working each month, and the number of reviewed pages. There are some noticeable changes at June 2016 and November 2016. In June, User:SwisterTwister slowed down their participation, which brought the number of reviews down. In November, with the creation of the patroller user right, the number of active reviewers dropped from around 950 per month to 350 per month. I'd be interested to know what you all think about the newly-added stats. DannyH (WMF) ( talk) 20:53, 5 June 2017 (UTC)
Thanks for the report. The key points that I think the report tries to demonstrate are:
Other interesting points of discussion here:
Here are some of my own opinions and reaction to the report:
I don't believe Wikipedia editors are going to change their convictions about the project based on decree from WMF. In the current environment, we're not going to get consensus on inclusionist-leaning policy changes and imposing such changes will further polarize the community. New NPP reviewers are not going to come charging over the hill. ACTRIAL has demonstrated that we can get consensus on quality-focused policy. WMF is unlikely to abandon the "Anyone can edit" mission and the deletionist sentiment is not strong enough to challenge this.
It seems to me that the way forward is for everyone to get comfortable with a large and growing backlog. The WMF and inclusionists can console ourselves with the knowledge that backlogged pages lose their NOINDEX status after 90-days. ~ Kvng ( talk) 21:16, 5 June 2017 (UTC)
@ DannyH (WMF) and Kvng: Just found out through Boleyn. It's not a lot, but it's a good sign. Progress is being made. Soon, we can decrease the backlog markedly. —usernamekiran (talk) 18:32, 6 June 2017 (UTC)
So... how does "the graph" look when you take into account articles created by autopatrolled users prior to their gaining the autopatrolled right? There's been a fairly big push lately to culling the herd in this respect, and fairly mass examining, sorting, and funneling those potentially eligible to PERM, at least it seems that way. I don't actually have statistics on whether there's been an overall marked increase in those having the right as measured by the number of new articles created.
What kind of difference would it make in the backlog if we automatically and retroactively reviewed unreviewed articles by editors when granting autopatrolled? I mean, some of these are by design some of the most prolific article creators on the project. Admittedly, this may effectively raise the bar at PERM, since the responsibility for checking off on potentially scores of articles falls to the button pusher. TimothyJosephWood 21:30, 5 June 2017 (UTC)
I've started a thread at Wikipedia_talk:The_future_of_NPP_and_AfC#Moving_forward to discuss how we can practically move forward with any potential reforms to the NPP process because I feel that might be a better location to discuss the broader questions than on this report's talk page. Comments from anyone who is interested in the discussion would be appreciated. TonyBallioni ( talk) 00:46, 7 June 2017 (UTC)
After reading the report and a lot of the responses, I have a modest proposal that seems to have been hinted at but not fleshed out. Why not allow quicker triage - after you read the article, and don't mark it patrolled, move it into one of the sub-buckets described in the report - in a mobile and desktop friendly way.
These three categories can be called "Patrol-pending articles", versus just "unpatrolled". This has the immediate benefit of letting others know that an article has been viewed at least once, and is not an easy article to patrol.
Category 1 goes to a "copy edit required" bucket, for those who like to copy edit but aren't subject matter experts. That gets marked as something like "viewed but not edited" and is no longer part of the new article backlog. Category 2 goes to a "subject matter expert needed" bucket, with corresponding topic categorization, and the talk page is automatically created with a note that experts in that category are being sought. People could sign up for patrolling rights and specify a category they'd like automatic notification on. I am part of the feedback request service, and while it's interesting getting random articles to help with, I'd be better put to use in technology articles. Patrol approved editors can still review the categorized backlog as needed. This would improve engagement. Category 3 is the hardest bucket - those ones require the most time and effort, so the gamification efforts (barnstars, etc.) may work here - the elite of the elite.
Again, with this process, you're crafting the patrolling challenge to match people's unique skill sets and interests, with a goal of fostering engagement.
Another related issue that I haven't seen discussed is articles for deletion. The afd process is related to this, in that it is another barrier to good information appearing on the encyclopedia. I don't need to name names because many of you will already know people like this, but with the higher standards being applied, and the speed at which some people work, many good articles are being deleted. I've personally seen a bias against articles on Indian subjects, likely because of Western unfamiliarity with Indian culture coupled with the limited English of the writers. I have participated in several afd discussions started by the same people and successfully voted to keep the articles, but can only wonder how many good articles have been lost. OK - so here's another not-so-modest modest proposal - I lied. I recommend that we track nominations for deletion, and the success rates the nominators have. If an article is voted keep or no consensus (against the nominator's wishes), that's scored a -1, while a successful nomination (deletion consensus) is a 1. Once someone has a negative total score (more than 50% of their nominations are successfully blocked) they would be capped as to how many articles they could nominate, per some period. I don't have system statistics but I'm sure the sysadmins can figure out what would work. This would minimize disruption, and preserve the goal of keeping good info on the site. Having to be right 1/2 of the time seems fair - otherwise count all the time volunteers spend protecting info from unnecessary challenges as time which could be better spent patrolling new articles, and solving the first problem. Timtempleton ( talk) 17:26, 7 June 2017 (UTC)
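The scoring rule proposed above is simple enough to sketch in Python. The nominator names and outcomes below are invented, the function names are hypothetical, and scoring outcomes the proposal does not mention (merge, redirect and so on) as 0 is an assumption of this sketch, not part of the proposal.

```python
from collections import defaultdict

# Scores as described in the proposal: deletion consensus is +1,
# keep or no consensus is -1. Any other outcome scores 0 (an assumption).
OUTCOME_SCORE = {"delete": 1, "keep": -1, "no consensus": -1}


def nominator_scores(nominations):
    """Sum the per-nomination scores for each nominator."""
    totals = defaultdict(int)
    for nominator, outcome in nominations:
        totals[nominator] += OUTCOME_SCORE.get(outcome, 0)
    return dict(totals)


def capped_nominators(nominations):
    """Nominators whose total is negative and who would therefore be capped."""
    scores = nominator_scores(nominations)
    return sorted(name for name, score in scores.items() if score < 0)


# Hypothetical nomination history: (nominator, outcome).
sample = [
    ("UserA", "delete"), ("UserA", "keep"), ("UserA", "keep"),
    ("UserB", "delete"), ("UserB", "no consensus"),
    ("UserC", "delete"),
]
print(nominator_scores(sample))
print(capped_nominators(sample))
```

In this sample, UserB has exactly half of their nominations rejected and ends on a score of 0, so only a nominator with more failures than successes falls below the threshold.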
We need to rid ourselves of the delusion that writing a new article from scratch is easy, and we need to stop telling that lie to newbies. The truth is that it's hard, even veteran Wikipedians don't simply whack out a perfectly formed complete article in one sitting. From my not inconsiderable experience at AFC I've come to the conclusion that the vast majority of wannabe beginner editors are in fact barely functionally literate, even those whose mother tongue is allegedly English.
I suspect that Visual Editor is to blame for the origin of the massive growth in various backlogs. Before VE the requirement to learn at least the elementary basics of wiki-markup, and the consequent necessity to read and understand various guidelines before one could actually write anything, acted as a barrier to entry that fairly effectively kept out the incompetent. Then VE came along and made it dead easy for anyone to post any crap they felt like without first having to learn anything about Wikipedia. VE is malware, plain and simple. Roger (Dodger67) ( talk) 19:03, 7 June 2017 (UTC)
Having read the above discussion, I have a proposal.
Power~enwiki ( talk) 19:24, 7 June 2017 (UTC)
Something I'd like to bring to everyone's attention is how difficult it is to reduce the backlog as a result of a number of design decisions by the developers of the page curation tool.
The analysis appears to show that reviewers leave the hard cases for last, where they require time-consuming judgment calls. That time would be better spent working on those areas where there is a higher ratio of pages that should be speedily deleted, PROD'ed or sent to AfD. I did not really see anything in the analysis that showed me what reviewers are actually spending their time on. So, as a reviewer, I'd like to provide some insight into how I use my time and how my time can be used more productively by making some changes to the tools and processes that I use.
I can only work at the beginning or the end of a log, and I try to do both; the beginning to catch the really blatant cases that need to be deleted, and at the back of the log because it's interesting and challenging and is a last look to see if we are going to let something sneak by, often those are contributions by paid editors or editors with a COI. The problem is that the back of backlog is so massive that it is impossible to reach. If I look at page curation now, the oldest pages are from 2007, old redirects that haven't spent all that time in the queue. The cases that have been sitting there all the time are from 29 December, 163 days ago. To access a page, using the page curation tool, that is actually at risk of becoming indexed, rewarding the spammers, I would need to scroll to May 12 or thereabout and it would take me about half an hour to do so, if I didn't know to go to Special:NewPages and set the offset manually to 20170512000000 where I can find those pages that are about to get indexed. Pages are only listed at NewPages for 30 days though, so to get at any of the unreviewed pages between 30 and 160 days old, you would have to resort to other tricks.
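For anyone who wants to jump straight to a date instead of scrolling, the offset value mentioned above is just a 14-digit YYYYMMDDHHMMSS timestamp. The Python sketch below builds such a URL and checks the day count. The helper names are hypothetical, the URL layout (index.php with `title` and `offset` query parameters) is an assumption about how the special page is addressed, and the year 2016 for the 29 December creation date is inferred from the date of this comment.

```python
from datetime import date, datetime
from urllib.parse import urlencode


def mediawiki_timestamp(moment):
    """Format a datetime as a 14-digit YYYYMMDDHHMMSS string."""
    return moment.strftime("%Y%m%d%H%M%S")


def new_pages_url(offset_date, base="https://en.wikipedia.org/w/index.php"):
    """Build a Special:NewPages URL whose listing starts at `offset_date`."""
    midnight = datetime(offset_date.year, offset_date.month, offset_date.day)
    query = urlencode({"title": "Special:NewPages",
                       "offset": mediawiki_timestamp(midnight)})
    return f"{base}?{query}"


def days_in_queue(created, today):
    """Whole days between a page's creation and `today`."""
    return (today - created).days


print(new_pages_url(date(2017, 5, 12)))
print(days_in_queue(date(2016, 12, 29), date(2017, 6, 10)))
```

A configurable offset date in the page curation tool would amount to exposing the `offset_date` argument in the interface.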
I can't say what other reviewers have figured out, but I am, for all intents and purposes completely unable to work on the "middle" of the backlog. In my experience, the pages that are 150 days old are no more difficult to review than pages that are 30 days old. They have not been skipped endlessly. They have only been skipped for the first 30 days. After that, they cannot be accessed, and so they are just sitting there until they reappear at the end of the backlog.
Another problem that I face are the varying notability standards that the different projects have developed. I'm pretty familiar with art-related stuff, but know nothing about sports. So I much prefer to review articles that cover a subject that I care about, and where I have access to sources as well as a good sens of which sources are reliable. So I do skip a lot of wrestling articles because I can't tell all the different wresting organizations apart. A lot of my time goes to skipping stuff I don't care about. To work around that problem, I look at suggestions from InceptionBot, which gives me project-related topics that I do care about. I work faster, I'm happier, and I contribute more reviews when I can work in areas where I have expertise.
Another time-sink for me is performing manual tasks that could be automated. Checking what links here to see if something is an orphan when the page curation tool already shows me that an article is an orphan seems unnecessary. I should not have to tag an article as missing references when the page curation tool already has that information. It would be nice to preload the uncategorised, orphan, unreferenced tags if an article is already listed in the New pages feed with No categories, Orphan and No citations.
Checking recreations of deleted articles when dealing with spammers is another manual task that is taking a lot of time. Maybe there is a better way to do this, but bringing up the deletion log is time-consuming. Automated flagging of recreations would be helpful. For checking copyvios, I have added stuff to my toolbar that loads Earwig's tool with the page I'm reviewing to make that a bit faster, but it would be nice if that too had been done already.
In summary: The page curation tool would already serve my needs (and possibly others) much better if only two improvements could be made: a filter for pages that I have already skipped, and a configurable offset date.
Thanks for listening, and thanks for all your hard work Mduvekot ( talk) 23:11, 10 June 2017 (UTC)
I agree with and appreciate Mduvekot's post. Like DGG, I would really like a keyword search; I could be much more effective with one. A third option between oldest and newest would be great too. I'd rather prioritise those not brand new and still being worked on and not yet indexed. Boleyn ( talk) 05:25, 12 June 2017 (UTC)
If (article age=30d & number of extended confirmed editors >=3 & number of references [including non-social media external links] >=2 & (article not an obvious copyvio | G4 candidate) & article has at least one tag) then mark as reviewed
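The rule above could be sketched roughly as follows. This is only an illustration: the `ArticleStats` fields and the function name are hypothetical, "article age=30d" is read here as "at least 30 days old", and the copyvio/G4 clause is read as "neither an obvious copyvio nor a G4 candidate". Both readings are assumptions.

```python
from dataclasses import dataclass


@dataclass
class ArticleStats:
    """Hypothetical per-article facts that the proposed rule would need."""
    age_days: int
    extended_confirmed_editors: int
    references: int          # includes non-social-media external links
    obvious_copyvio: bool
    g4_candidate: bool
    tags: int


def can_mark_reviewed(article):
    """Apply the proposed auto-review rule (assumed interpretation, see lead-in)."""
    return (
        article.age_days >= 30                           # assumed: "at least 30 days"
        and article.extended_confirmed_editors >= 3
        and article.references >= 2
        and not (article.obvious_copyvio or article.g4_candidate)
        and article.tags >= 1
    )


example = ArticleStats(age_days=45, extended_confirmed_editors=3, references=2,
                       obvious_copyvio=False, g4_candidate=False, tags=1)
print(can_mark_reviewed(example))
```

Every condition has to hold, so a single failing check (for example, a G4 candidate) leaves the page in the queue for a human reviewer.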
What we are doing (or should be doing), Mduvekot, ever since we got the Draft namespace created (and that's why we wanted it) is:
and we've been having the conversation about the 'move to draft' tool since it was first suggested here on 13 December 2015 by Czar and listed at Phab by Czar on Jan 22 2016, but despite being requested several times, and listed here 9 months ago, the devs are not doing anything about it. Kudpung กุดผึ้ง ( talk) 15:45, 13 June 2017 (UTC)
I saw this in the Economist and thought it might be interesting to Wikipedia leadership. It's a discussion of how to encourage peer review, tangentially related to clearing the NPP backlog [2]. Timtempleton ( talk) 14:41, 12 June 2017 (UTC)
I did some digging around after coming across this question on Quora. I came to the conclusion that only about 16.7% of articles on Wikipedia have been marked as reviewed. Please see the calculations here. For those too lazy, there are about 5,422,965 [1] articles on Wikipedia and only about 905,695 [2] have been marked as reviewed. Don't know if it helps but thought it wise to share it here as well. Yashovardhan ( talk) 19:14, 12 June 2017 (UTC)
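The percentage follows directly from the two figures quoted above; a quick check, with variable names chosen only for illustration:

```python
# Figures as quoted in the comment above.
total_articles = 5_422_965
marked_reviewed = 905_695

share = marked_reviewed / total_articles
print(f"{share:.1%} of articles marked as reviewed")
```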
1. In the section 'Non-autoconfirmed contributors', the author writes "According to our calculations, there's an average of 1,180 new mainspace articles created on English Wikipedia every day,...". The graph shows the page review backlog, and for only six months. The growth of the backlog is merely the symptom of potentially many other interrelated factors. Would the WMF please provide graphs/query data that include the past 18 months of data, showing the following?
This would give us useful insight to new article creation trends, reviewing trends, and article quality trends. It will be far more useful than showing the overall backlog. Thank you for your help.- Mr X 12:59, 15 June 2017 (UTC)
2. In the same section, a conclusion is made:
"While removing pages created by non-autoconfirmed users would reduce the burden on that first wave of reviewers, it would result in the loss of many potential good articles. It would also send a clear message to new Wikipedia editors that their contributions aren't wanted, potentially stunting the growth of the editing community. Most importantly, it wouldn't actually solve the problem of the growing New Page Review backlog."
This seems to be built upon several false or dubious assumptions, specifically
"Pages created by non-autoconfirmed users would be removed"- No, users would be greeted with a message telling them there are several options for creating an article. For example, that they can create a freeform draft, which they can ask an editor to review. If it is ready, the editor will publish it for them.
"It would result in the loss of many potential good articles"- No, 80% of new articles by new contributors are junk. The other 20% would not be lost at all. Evidence is required that there would be a loss of many potential good articles, as that is a rather extraordinary claim.
"It would also send a clear message to new Wikipedia editors that their contributions aren't wanted"- No, it would send a clear message to new contributors creating the 80% junk articles that we have standards, and to the new contributors creating the 20% usable articles that their contributions are welcome but that they must follow one of the processes before their article can be published. Isn't this exactly what we do for image uploading?
"It wouldn't actually solve the problem of the growing New Page Review backlog"- Of course it would. It would free page patrollers to review the rest of the backlog and it would dissuade spammers and vandals from co-opting the encyclopedia for their non-encyclopedic purposes.
- Mr X 22:49, 15 June 2017 (UTC)
For those who have not yet seen it, we have some numbers based on deletions from the WMF. I've produced the charts based on them. This is based on taking the articles created the first week of November 2016 and checking their status as of 14 June 2017. You can see the data at User:MusikAnimal (WMF)/NPP analysis. WMF is working on getting us more numbers on this, but these also give us a snapshot of what has happened to articles that have all been reviewed. TonyBallioni ( talk) 13:14, 15 June 2017 (UTC)
Hi all,
as planned earlier and also suggested above by Innisfree987 and others, we have produced a different version of the backlog snapshot graph. To recap the explanation posted earlier, the version of the graph used in the report was showing the number of still unreviewed articles per day as it appeared to new page patrollers at Special:NewPagesFeed at one point in time (May 25), using the same definition of "new editors" as patrollers see there (i.e. if you change settings there to only show unreviewed articles by "new editors", you would get a list that exactly corresponds to the blue part of the graph). The new graph instead uses the autoconfirmed status at the time of the article's creation, because that's more pertinent to the question of how much disabling article creation for non-autoconfirmed users might reduce the backlog:
It confirms the earlier observation that pages by non-autoconfirmed users (by either definition) make up only a small portion of the backlog - to be exact, just 15% of the still unreviewed pages came from users who had not been autoconfirmed when they created the article.
Regards, Tbayer (WMF) ( talk) 22:04, 15 June 2017 (UTC)
Hello! After a few days of coding I think I have a month's worth of rich data that I hope will answer some questions. There are a LOT of charts I want to share, so instead of further clogging up this page I'm going to link you to User:MusikAnimal (WMF)/NPP analysis. The data you see there right now covers February 15 to March 15 2017, which should (seemingly) line up with the abrupt upward trend on Kudpung's NPP backlog chart.
No matter what you make of the data, I've got some exciting things to tell you. Thanks to the incredible work of Yurik with mw:Extension:Graph, the charts you see will automatically update as new data comes in. They can all be shown using the {{ User:MusikAnimal (WMF)/NPPChart}} template (which I will document fully). You can use this template anywhere and the charts will automatically update. Eventually I hope to have many months, even years, of data. When we get to that point the charts will magically become interactive and you can zoom in/out, etc. For now they are static because I wanted to get the numbers to you quickly, so I didn't put too much effort into it :) I also realize the pie charts don't actually show any numbers, something I hope to fix tomorrow, and I'll also throw in percentages.
One of the things people have been complaining about is how hard it is to get this data. Well, since these charts automatically update when new data is supplied, we just need to automate generating that data. I have the script to do it; I just need bot approval and some rough community consensus. Let me know if you think this is a good idea. My thoughts are to first backfill as much data as we can, then have the bot automatically add data every day. That way you always have rich and up-to-date numbers at your disposal. I'd also like to move {{ User:MusikAnimal (WMF)/NPPChart}} to Template:NPPChart but I first want to see what you all think.
Now, the chart you all probably want to see the most:
[Line chart not available: survival metric for non-autoconfirmed users]
...which is produced by {{User:MusikAnimal (WMF)/NPPChart|type=line|usergroup=non-autoconfirmed|metric=survival}}
Maybe this is suggestive that ACTRIAL has some merit. For the record, I'm indifferent and am more committed to just bringing you the data. The only thing I'll be sad about is half of my charts will no longer be relevant because there will be no new pages created by non-autoconfirmed users – but that's not a real argument but rather your common engineer frustration ;) We can still make use of the other charts.
One thing I did want to point out, however, is that while we have data on the types of deletions (speedy vs PROD vs AfD), we don't have data on the types of speedy deletions. I'd like to do that next, and will share a chart for it once I have the data. What I'm getting at is that many of the speedy deletions from new users are probably vandalism-related. I'm not outright dismissing ACTRIAL, but I think it's worth noting vandals will be vandals, and if they can't do it through new pages they'll likely target existing ones. This is opposed to new users who don't understand inclusion criteria. The ratio of those versus vandals is something I'd personally like to see.
Let me know what you think of the charts, and while discussion ensues I'll keep my script running to backfill more data, and they will all magically update. Again let me formally ask for feedback about the idea of a bot. I'd love to make that happen so we don't have to go through this long process again. With the bot, you won't need to request the data, it will just be there, right here on the wiki :)
Warm regards, MusikAnimal (WMF) ( talk) 01:54, 16 June 2017 (UTC)
@ DannyH (WMF) and Joe Roe: I am not particularly convinced that the process has changed all that much for those of us who do it: I needed the reminder of the backlog to come back to participate, not a change in the documentation. It's more of a social engagement and organization problem (there are tons of admins who also, by default, have the right but don't use it). I think there are other things that have been identified by the conversations here:
I hope that analysis of what I am hearing helps. Also, I want to note like I did higher on the page: I work for WMF, but have not engaged with or supported this team working on this project during this project as part of my work -- my scope and focus is on WP:GLAM. Sadads ( talk) 21:01, 16 June 2017 (UTC)
Month | Backlog on day 1 | Articles created | Articles reviewed | Backlog on last day | Articles deleted | Speedy | ||
---|---|---|---|---|---|---|---|---|
March | Number | |||||||
April | Number | |||||||
May | Number |
Myself, I have auto-patrolled status and so am able to create a steady stream of new articles without much interference but new editors have much more difficulty. Here are three cases which I have noticed recently:
1. Ashley Hannah was one of a large group of students who attended an outreach event at Imperial College recently. She was bright and enthusiastic and wanted advice about how to add an image to the subject that she was working upon – Sue Gibson (chemist). I showed her a way of doing this but it wasn't working. The feedback from the interface wasn't clear but my impression is that this was because she wasn't autoconfirmed and so file uploading was blocked. I could have asked her to make 10 edits but, to be autoconfirmed, four days have to elapse and she had just created her account at this one-day event. So, I wasn't able to resolve her difficulty and this was quite frustrating. She got some acclaim for her first efforts but will have been left with the impression that adding an image to Wikipedia is significantly more difficult than in Instagram, Snapchat or Twitter (the most popular apps for teens). Why do we have a four-day cooling-off period when this is so clearly an obstacle for outreach events?
2. Henrietta999 is an experienced author who spoke at another recent editathon. Her first article on Wikipedia was Margaret, Lady Moir and that went reasonably smoothly but that was perhaps too smooth as I have the impression that she'd have liked more feedback, especially some appreciation for the effort. Her second article, Margaret Dorothea Rowbotham, has not been going so well because it was found by a new page patroller who has been nit-picking and tag-bombing. This has not been well-received and the editor has withdrawn from the topic. Rather than collaboration, we have conflict – a familiar difficulty throughout Wikipedia.
3. Carolineneil has been creating a stream of impressively erudite chemistry articles from Asymmetric addition of dialkylzinc compounds to aldehydes to Use of pi,pi, CH-pi and pi-cation interactions in supramolecular assembly. These have been going through AfC and the results seem to have been reasonably productive. But the editor is quite introverted here and is getting some stick at ANI for not communicating and there's currently a proposal to block them to attract their attention. Are they unfamiliar with our ways of communicating, indifferent to them or repelled by them? Should we be looking a gift horse in the mouth?
These examples seem to confirm what was said in Encyclopedia Frown:
“The encyclopedia that anyone can edit” is at risk of becoming, in computer scientist Aaron Halfaker’s words, “the encyclopedia that anyone who understands the norms, socializes him or herself, dodges the impersonal wall of semiautomated rejection and still wants to voluntarily contribute his or her time and energy can edit.”
I support the WMF's recommendation that we should try to keep the project open and accessible. Andrew D. ( talk) 10:19, 18 June 2017 (UTC)
Here are some more recent examples. They start with a case that I just came across while patrolling. The article was children in emergencies and conflicts and was at risk of imminent deletion because it had been proposed for deletion more than seven days before. I did not hesitate to remove the prod because this is quite a notable topic – it's easy to find entire books about it – and its treatment in this case was quite scholarly and well-supported by citations. After removing the prod, I looked to see who had placed it. This was not a vandal or inexperienced editor, but DGG – a veteran patroller and illustrious member of Arbcom. Now, my view is that DGG's action was quite contrary to the WP:PROD process, which states clearly that it is "for uncontroversial deletion. ... PROD must only be used if no opposition to the deletion is expected". It seemed quite inappropriate in this case because DGG said himself, "I don't know if it is fixable." It seems remarkable that such an experienced editor could act like this, and so we seem to have the problem of quis custodiet ipsos custodes? I caught this one, but how many other respectable, substantial articles are being casually deleted in this way?
The creator of this article, A.mart82, is working with a Wikimedian in Residence on such topics in partnership with UNESCO. They are a new editor and notice that they created other substantial articles such as child development in Africa just one day after starting editing and so before they were autoconfirmed. Notice also that their talk page just contains templated warnings, including other deletion proposals. Isn't this a remarkably bitey way to treat someone who is working with one of the foremost cultural organisations on the planet? My view is that the WMF should encourage and support such work. If the NPP is unable to treat such work with consideration and respect, then the WMF should perhaps limit their powers rather than augmenting them?
Of course, I might be missing some aspect of this and so I encourage the parties involved to explain themselves and their actions. This will then fully inform our discussion. Andrew D. ( talk) 22:55, 24 June 2017 (UTC)
There are many controversial claims in this study that are not backed up by any evidence:
Secondly, no serious attempt is made to judge the quality of articles created by non-autoconfirmed users. Per MusikAnimal's subsequent analysis, we see that a full 3/4 of articles created by non-autoconfirmed users are eventually deleted. Why are we fighting so hard to keep allowing new users to create this crap and publish it in mainspace? Restricting non-autoconfirmed users from creating articles would not only significantly reduce the number of articles that need to be reviewed, but it would also vastly increase the average quality of articles that reviewers are reviewing. This would change the whole process of reviewing. Instead of it being the current war against the relentless diarrhea emanating from the depths of humanity, it could become a more constructive process with an emphasis on improving articles and educating serious editors.
Thirdly, remember that ACTRIAL was designed to be a temporary trial. There is only one way to definitely determine the effect of ACTRIAL on the backlog, article quality, and editor retention: implement the trial for a temporary period, and see what happens. We can do more studies and argue about it until the cows come home, but we won't come any closer to understanding the true effect unless we actually do the trial. Even if ACTRIAL turns out to be a miserable failure, it would only be a failure for a short time, and the damage that it could do to the project would be minimal. ‑Scottywong | babble _ 06:56, 6 July 2017 (UTC)