Operator: Vacation9 ( talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)
Time filed: 02:56, Thursday January 31, 2013 ( UTC)
Automatic, Supervised, or Manual: Supervised
Programming language(s): AutoWikiBrowser, Python
Source code available: AWB, Standard pywikipedia
Function overview: Replace the cedilla letters that were used as substitutes for Romanian letters before Unicode 3 was released (Ş, ş, Ţ, and ţ) with the proper comma-below letters of the Romanian alphabet (Ș, ș, Ț, and ț). The bot will replace them everywhere except in image links and interwiki/external links.
Links to relevant discussions (where appropriate): Wikipedia:Bot requests/Archive 52#Romanian_orthography
Edit period(s): One time run
Estimated number of pages affected: Hundreds of thousands, working from the Geography of Romania category
Exclusion compliant (Yes/No): Yes
Already has a bot flag (Yes/No): Yes
Function details: From a list of pages in categories related to Romanian geography (rather than a database scan of pages with the characters in their titles), move pages to their correct names if their titles contain incorrect characters. Then replace the characters defined above with their correct letters in the Romanian alphabet and fix the double redirects created. The input pages will be strictly Romanian, since the current letters are correct in non-Romanian languages. The input pages won't be taken only from the base category (Geography of Romania); they will also come from sub-categories recursed by AWB, and the resulting category list will be manually checked.
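The character mapping described above can be sketched in Python roughly as follows. This is only an illustration of the substitution, not the script VoxelBot runs; the names `CEDILLA_TO_COMMA`, `fix_romanian`, and `needs_move` are hypothetical.

```python
# Illustrative sketch only; names here are hypothetical, not VoxelBot's code.
# Map the legacy cedilla letters to the comma-below letters.
CEDILLA_TO_COMMA = str.maketrans({
    "\u015E": "\u0218",  # Ş -> Ș
    "\u015F": "\u0219",  # ş -> ș
    "\u0162": "\u021A",  # Ţ -> Ț
    "\u0163": "\u021B",  # ţ -> ț
})


def fix_romanian(text: str) -> str:
    """Return text with the four cedilla letters replaced."""
    return text.translate(CEDILLA_TO_COMMA)


def needs_move(title: str) -> bool:
    """True if a page title contains any of the incorrect characters."""
    return fix_romanian(title) != title
```

A title for which `needs_move` returns true would be a candidate for a move to `fix_romanian(title)`, assuming it has already been confirmed as a Romanian topic.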
I believe this must be supervised, not automatic. -- MZMcBride (talk) 03:52, 31 January 2013 (UTC)
Trial complete. I completed a total of 50 edits across page moves and AWB replacements. It took so long because I was dealing with AWB not skipping links, along with Unicode errors. Everything is running smoothly now. Here is my planned workflow:
1. Using a Python script I created, get a list of pages from manually reviewed categories relating to Romanian geography.
2. Run these pages through another Python script I created, which takes the page names and moves each page to the title with the correct characters. Sample moves using this script: [2] [3] [4].
3. In the same script, VoxelBot automatically corrects links to the moved page, using Backlinks (like WhatLinksHere, but through the API). Examples: [5] [6] [7] [8].
4. Using AWB (thanks Addshore), from a list of manually reviewed categories as before, replace all other instances of the incorrect characters with the correct ones (ignoring external/internal links and templates, but not notes like before, using a custom regex). Examples: [9] [10] [11] [12]. General fixes are run as well.
All of the steps run by script or AWB will be supervised, but not manual. Vacation9 22:15, 13 February 2013 (UTC)
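For illustration, the "replace everything except links and templates" idea in step 4 could look something like the Python sketch below. The patterns in `PROTECTED` are assumptions made for this example, not the custom AWB regex used in the trial, and they do not handle nested templates or the label text of bracketed external links.

```python
import re

# Cedilla letters (Ş ş Ţ ţ) mapped to comma-below letters (Ș ș Ț ț).
CEDILLA_TO_COMMA = str.maketrans(
    "\u015E\u015F\u0162\u0163", "\u0218\u0219\u021A\u021B"
)

# Spans to leave untouched. These patterns are assumed for illustration;
# they are not the regex used in AWB.
PROTECTED = re.compile(
    r"\[\[[^\[\]]*\]\]"        # [[...]]: internal, image and interwiki links
    r"|\{\{[^{}]*\}\}"         # {{...}}: templates without nesting
    r"|\[?https?://[^\s\]]+"   # external link targets and bare URLs
)


def replace_outside_links(wikitext: str) -> str:
    """Fix the characters in plain text, copying protected spans verbatim."""
    out = []
    pos = 0
    for match in PROTECTED.finditer(wikitext):
        # Translate the plain text before this protected span.
        out.append(wikitext[pos:match.start()].translate(CEDILLA_TO_COMMA))
        # Keep the link or template exactly as written.
        out.append(match.group(0))
        pos = match.end()
    out.append(wikitext[pos:].translate(CEDILLA_TO_COMMA))
    return "".join(out)
```

In this sketch, link targets are left alone on the assumption that they are corrected separately after the page moves in steps 2 and 3.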