Operator: Zackmann08 (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)
Time filed: 20:49, Thursday, December 15, 2016 (UTC)
Automatic, Supervised, or Manual: Automatic
Programming language(s): Ruby
Source code available: User:ZackBot/single_cleanup
Function overview: Remove deprecated {{{Certification}}} param from {{Infobox single}}
Links to relevant discussions (where appropriate): RfC: Should the "Certification" field be removed from Infobox single?, Template-protected edit request on 15 December 2016 & Use of ZackBot for Infobox single
Edit period(s): One time run
Estimated number of pages affected: All pages in Category:Pages using Infobox single with deprecated certification parameter (0)
Exclusion compliant (Yes/No): Yes
Already has a bot flag (Yes/No): yes
Function details: Goes through each transclusion that has been added to the category and removes the offending line of code.
There is an interest in standardizing and streamlining music templates. In recent discussions here and here, the {{{Certification}}} parameter was brought up by several editors. An RfC was opened and the consensus (unanimous) was to remove Certifications. Since the parameter appears on 3,500+ pages, it seems like a good task for a bot. ZackBot has performed similar tasks and a bot would greatly reduce the time to implement the change. -- Ojorojo (talk) 23:32, 15 December 2016 (UTC)
/\|\s*Certification\s*=.*\n/ to capture the certification parameter and value, replacing it with a blank string. What if the next parameter was on the same line [1], which is valid syntax? Then that parameter and value would also get removed. Or what if the certification parameter was lowercased [2]? I can help if you need :) I will also try to look over the infobox regex. -- MusikAnimal talk 07:02, 16 December 2016 (UTC)
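The two concerns raised above can be reproduced with a minimal sketch. The sample wikitext and the names `SAMPLE` and `NAIVE` are made up for illustration and are not taken from ZackBot's source; only the pattern itself is the one quoted in the comment.

```ruby
# Hypothetical infobox fragment: Certification and Length share one line.
SAMPLE = "{{Infobox single\n" \
         "| Name = Example\n" \
         "| Certification = Gold | Length = 3:45\n" \
         "| Label = Example Records\n" \
         "}}\n"

# The pattern quoted above: from the pipe to the end of the line.
NAIVE = /\|\s*Certification\s*=.*\n/

result = SAMPLE.sub(NAIVE, '')
puts result

# The Length parameter is removed together with Certification.
puts result.include?('Length')   # false

# A lowercased parameter name is not matched at all.
lowercased = "| certification = Gold\n"
puts lowercased.match?(NAIVE)    # false
```

Because `.` does not match a newline by default in Ruby, the greedy `.*` stops at the end of the line, which is exactly why a second parameter on the same line is lost.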
/\|\s*[Cc]ertification\s*=.*\n/ but it is worth noting that {{{certification}}} is not a valid parameter and is not tracked by the category anyway. -- Zackmann08 (Talk to me/What I been doing) 16:17, 16 December 2016 (UTC)
gsub in conjunction. I think this is a well-rounded example. Note the /i to make the regex case insensitive. So the code would be something like updated_text.gsub!(/(\n?\s*\|\s*Certification\s*=.*?)(?:\n|\|)/i). The .*? makes the selection a non-greedy match, otherwise it may capture everything up until the end of a similar pattern [3] (the likelihood of a duplicate certification parameter is slim, but still). You'll almost always want to do non-greedy matching. Finally, (?:\n|\|) means go until a new line or another pipe character is found, but don't include it as a group.
Rubular is an excellent tool where we can build test cases to ensure all scenarios are covered. After this task (and task 4) I bet you'll have some re-usable regex for future tasks. Allow me to go over the infobox regex too, as a safeguard, before we begin a trial. In the meantime feel free to do a limited "dry run" as I explained in Task 4. This will help identify issues before any edits are made. Thanks for making your code open source! -- MusikAnimal talk 18:23, 16 December 2016 (UTC)
Amendment: Notice the first capture in [4] stops at |Platinum]], since there is a pipe in the wikilink! I've created a much more complicated solution here. This also preserves the template structure, removing any extraneous new lines. I can explain it over IRC, if you'd like. Parsing wikitext is obviously not fun :/ But the important thing is to try to handle all possible scenarios, and again Rubular is your best friend for this :) -- MusikAnimal talk 18:54, 16 December 2016 (UTC)
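The linked solution is not reproduced in this thread, so the following is only a sketch of one way to handle the cases discussed: pipes inside wikilinks, a following parameter on the same line, and a lowercased name. The names `WIKILINK`, `TEMPLATE`, `PLAIN`, `CERT_PARAM` and `strip_certification` are hypothetical, and nested templates inside the value are deliberately not handled.

```ruby
WIKILINK = /\[\[.*?\]\]/   # [[target|label]]: inner pipes stay inside the link
TEMPLATE = /\{\{.*?\}\}/   # {{name|arg}}: non-nested templates only
PLAIN    = /[^|\n}]/       # any other character of the value

# bol is set only when the parameter begins its own line;
# eol is set only when the value runs to the end of the line.
CERT_PARAM = /(?<bol>^[ \t]*)?\|[ \t]*Certification\s*=(?:#{WIKILINK}|#{TEMPLATE}|#{PLAIN})*(?<eol>\n)?/i

def strip_certification(wikitext)
  wikitext.gsub(CERT_PARAM) do
    m = Regexp.last_match
    # Drop the whole line if the parameter started it; otherwise keep the
    # newline so the preceding parameter still ends its line.
    m[:bol] ? '' : m[:eol].to_s
  end
end

puts strip_certification(
  "| Name = X\n| Certification = [[RIAA certification|Platinum]]\n| Label = Y\n"
)
```

Treating a wikilink or template as a single unit is what stops the match from ending at the pipe in `|Platinum]]`. A value containing a lone `}` or a nested template would still need a real parser or manual review.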
@MusikAnimal: you freaking ROCK!!!! I'm swamped at my day job right now so I won't have a chance to really look at this until this afternoon... Let me ping you on IRC then and we can get a good solution. I love your tips on the regex... I'm still learning some of the complex features of regular expressions and really appreciate your help. @Ojorojo: no need to worry about the word "Certification" appearing elsewhere in the body. My code specifically ONLY looks inside of the infobox for the word. -- Zackmann08 (Talk to me/What I been doing) 19:10, 16 December 2016 (UTC)
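ZackBot's infobox handling is not shown in this thread, so what follows is only a sketch of how an edit could be confined to the infobox: find the opening `{{Infobox single`, count braces to its matching `}}`, and apply the change to that slice alone. The method names `infobox_span` and `edit_inside_infobox` are hypothetical.

```ruby
# Returns [first, last] character offsets of the first {{Infobox single ...}}
# call (last is exclusive), or nil when it is absent or unbalanced.
def infobox_span(wikitext)
  first = wikitext.index(/\{\{\s*Infobox single\b/i)
  return nil unless first

  depth = 0
  pos = first
  while pos < wikitext.length
    case wikitext[pos, 2]
    when '{{'
      depth += 1
      pos += 2
    when '}}'
      depth -= 1
      pos += 2
      return [first, pos] if depth.zero?
    else
      pos += 1
    end
  end
  nil
end

# Yields only the infobox text to the block and splices the result back in.
def edit_inside_infobox(wikitext)
  span = infobox_span(wikitext)
  return wikitext unless span

  first, last = span
  wikitext[0...first] + yield(wikitext[first...last]) + wikitext[last..-1]
end

page = "{{Infobox single\n| Name = X\n| Certification = Gold\n}}\n" \
       "Body text.\n{{Other box\n| Certification = Silver\n}}\n"

puts edit_inside_infobox(page) { |box| box.gsub(/^\|\s*Certification\s*=.*\n/, '') }
```

The second template keeps its `Certification` line because the block never sees text outside the infobox.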
sleep 3), so we aren't unnecessarily going at full speed with potential issues. With this the trial will only take around two and a half minutes. -- MusikAnimal talk 21:15, 17 December 2016 (UTC)
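A throttled trial along these lines might look like the sketch below. Everything here is an assumption for illustration: `trial_run`, its keyword arguments, and the `save` callable are hypothetical names, and the default limit of 50 is chosen only because 50 edits at 3 seconds each come to the two and a half minutes mentioned. Without a `save` callable the method behaves as a dry run and only reports what it would change.

```ruby
# pages: an enumerable of [title, wikitext] pairs.
# The block receives the wikitext and returns the cleaned-up text.
def trial_run(pages, limit: 50, delay: 3, save: nil)
  edited = []
  pages.each do |title, text|
    break if edited.size >= limit

    updated = yield(text)
    next if updated == text   # nothing to do on this page

    if save
      save.call(title, updated)
      sleep delay             # pause between real edits
    else
      puts "[dry run] would edit #{title}"
    end
    edited << title
  end
  edited
end

sample_pages = [
  ['Song A', "| Certification = Gold\n| Label = X\n"],
  ['Song B', "| Label = Y\n"]
]

trial_run(sample_pages) { |text| text.sub(/^\|\s*Certification\s*=.*\n/, '') }
```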
@Zackmann08: Is the trial complete? Did you notice any errors? Before we go to the next step I want to propose something. Pinging Ojorojo as well. We're down to just pages that have some sort of template in the Certification field of the infobox. Reviewing them, I found some like Barrette (song) that have the certifications only listed in the infobox, and the information is sourced. As I mentioned before, this means the bot would be removing sourced content that does not exist anywhere else in the article. So my suggestion to Zackmann08 for the next run is to maybe look for the words "certification" or "certified" (case insensitive) anywhere outside the infobox, and if nothing is found, skip that page. Hopefully the remaining pages won't be as abundant, and we can fix them by hand. I personally feel it's worth the bit of effort to move the content to the body, which will involve a little copy editing. Someone took the time to add that sourced information, doesn't seem fair to deprecate the template parameter and remove the content if it can be salvaged. Thoughts? -- MusikAnimal talk 23:18, 19 December 2016 (UTC)
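The suggested check could be sketched as follows. The method names are hypothetical, and the sketch assumes the infobox text has already been extracted as a string by some other means.

```ruby
# True when "certification" or "certified" (any case) appears outside the infobox.
def certification_mentioned_elsewhere?(wikitext, infobox)
  body = wikitext.sub(infobox, '')   # drop the first literal copy of the infobox
  body.match?(/certifi(?:cation|ed)/i)
end

# A page is only safe for the bot if the body already covers the certification.
def safe_to_strip?(wikitext, infobox)
  certification_mentioned_elsewhere?(wikitext, infobox)
end

infobox = "{{Infobox single\n| Certification = Gold\n}}"
covered = infobox + "\nThe single was certified Gold in 1999."
orphan  = infobox + "\nThe single reached number one."

puts safe_to_strip?(covered, infobox)   # true
puts safe_to_strip?(orphan, infobox)    # false
```

Pages where the check returns false would be skipped and left for manual review, so sourced content that exists only in the infobox is not lost. A keyword match is a coarse signal: it cannot tell whether the body states the same certification as the infobox.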
@Ojorojo: I think you underestimate how complicated it is for the bot to do what you are asking... If you are planning to just look over every single one of the remaining edits the bot would make I would suggest that you simply do the edits by hand. -- Zackmann08 (Talk to me/What I been doing) 18:23, 21 December 2016 (UTC)
Approved. Per above. The remaining pages will be done by hand, so I think we're all done here. Happy holidays! -- MusikAnimal talk 22:37, 21 December 2016 (UTC)