This page is currently inactive and is retained for historical reference. Either the page is no longer relevant or consensus on its purpose has become unclear. To revive discussion, seek broader input via a forum such as the village pump.
Wpfsck is an application written by Triddle and Andrew Rodland which scans the English Wikipedia for errors and inconsistencies. The program is written in Perl and takes its name from the Unix fsck utility. Currently the program can generate reports for WikiProject stubsensor, Most wanted stubs, and Multiple redirects in about 40 minutes on an 800 MHz PowerPC G4.
At its core, wpfsck is an extensible architecture built around the concept of cleanup projects and designed specifically with Wikipedia in mind. Because of this, additional cleanup projects can be added easily, and adding them is encouraged. If you have an idea for a systematic cleanup project, please leave a note in the #Comments section. If you currently run a cleanup project, you may wish to consider consolidating it with this project; see #Consolidation.
The stubsensor project attempts to programmatically identify articles that have grown beyond a stub but still carry their stub tag. The version of Stubsensor in wpfsck uses new statistical analysis and Bayesian filtering techniques to identify the offending stubs. Notably, the new Stubsensor identified articles that the original Stubsensor missed, even when run against the same database dump, which suggests the new technique is promising. The top 10 stubs from this report are:
Double redirects occur frequently but are easy to detect and fix.
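To illustrate why detection is straightforward, here is a minimal Python sketch. It is not wpfsck's own code (which is written in Perl), and it assumes the redirects have already been extracted from a dump into a plain mapping of redirect title to target title. The function names `find_double_redirects` and `resolve` are hypothetical.

```python
def find_double_redirects(redirects):
    """Return sorted (source, middle, final) triples where source -> middle -> final.

    `redirects` is an assumed input: a dict mapping each redirect page
    title to the title it points at.
    """
    doubles = []
    for source, middle in redirects.items():
        final = redirects.get(middle)
        # A double redirect exists when the target is itself a redirect.
        # A page redirecting to itself is a different error, so skip it.
        if final is not None and middle != source:
            doubles.append((source, middle, final))
    return sorted(doubles)


def resolve(redirects, title):
    """Follow redirects from `title` to its final target.

    Returns None if the chain loops back on itself.
    """
    seen = {title}
    while title in redirects:
        title = redirects[title]
        if title in seen:
            return None  # redirect loop, needs manual attention
        seen.add(title)
    return title
```

A fix would then repoint each reported `source` directly at `resolve(redirects, source)`, leaving loops for a human to untangle.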
The Most wanted stubs report lists the stubs with the highest number of incoming links, ordered with the most-linked stub at the top. Here are the top 10 as generated by wpfsck:
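As a rough illustration of how such a ranking could be computed, the following Python sketch counts incoming links to stubs and sorts them in descending order. It is not taken from wpfsck: the function name `most_wanted_stubs` and the input shapes are hypothetical, and counting each linking page only once is an assumption, since the page does not say how wpfsck counts repeated links.

```python
from collections import Counter


def most_wanted_stubs(links, stubs, top=10):
    """Rank stub titles by how many distinct pages link to them.

    `links` (assumed input) maps a page title to the titles it links to.
    `stubs` (assumed input) is the set of titles tagged as stubs.
    """
    counts = Counter()
    for source, targets in links.items():
        # Assumption: several links from one page to a stub count once.
        for target in set(targets):
            if target in stubs and target != source:
                counts[target] += 1
    # Largest link count first; ties broken by title for stable output.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top]
```

Calling `most_wanted_stubs(links, stubs)` returns up to ten `(title, link_count)` pairs, most-linked first.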
Consolidation of cleanup projects may make sense in some circumstances:
Even if you don't want to consolidate, you may find the Perl module at the heart of wpfsck, Parse::MediaWikiDump, useful. You may also wish to run your own copy of wpfsck if you perform many cleanup projects.