Operator: Usernamekiran (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)
Time filed: 18:02, Thursday, October 19, 2023 (UTC)
Automatic, Supervised, or Manual: automatic
Programming language(s): pywikibot
Source code available: github
Function overview: archive entries/blurbs from Template:In the news
Links to relevant discussions (where appropriate): requested at BOTREQ, and further discussion at Wikipedia talk:In the news.
Edit period(s): once per day
Estimated number of pages affected: archive bot, will create one page per month
Exclusion compliant (Yes/No): No
Already has a bot flag (Yes/No): Yes
Function details: The bot goes through the edit revisions/diffs of Template:In the news, and archives the entries that have been added.
If an entry begins with *[[ or * [[ then the bot considers it as a recent death. If it begins with <!--, it considers the entry as news/event.
| (if there is a white-space after the pipe), and [[Image:
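The classification rule described above can be sketched roughly as follows. This is only an illustration, not the bot's actual code: the function name `classify_entry` is hypothetical, and skipping lines that start with "| " or "[[Image:" is an assumption based on one reading of the description.

```python
from typing import Optional


def classify_entry(line: str) -> Optional[str]:
    """Return 'recent death', 'news/event', or None for a line that is not archived."""
    text = line.lstrip()
    # Assumption: image lines and "| " parameter lines are skipped.
    if text.startswith("[[Image:") or text.startswith("| "):
        return None
    # "*[[" or "* [[" at the start marks a recent death entry.
    if text.startswith("*[[") or text.startswith("* [["):
        return "recent death"
    # An entry that opens with an HTML comment is treated as a news/event blurb.
    if text.startswith("<!--"):
        return "news/event"
    return None
```

Any line that matches none of the prefixes returns `None` here; how the real program treats such lines is not stated in the request.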
In the first run, the bot will archive all the entries starting from March 2004. That would be around 40,000 edits. After that I will set up a daily cronjob to go through the latest 100 edits. The current average is around 50 edits per day for the ITN template since creation, and 7 edits per day over the last 8 years, but we should keep it at 100 revisions "just in case". I have already implemented a check in the program based on revision ID/diff, so there would be no repeated archivals.
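A revision-ID check of this kind could look something like the sketch below. The marker format `Special:Diff/<id>` and the names `archived_revision_ids` and `pending_entries` are assumptions made for illustration; the request does not say how the archive page records the diff ID.

```python
import re
from typing import Iterable, List, Set, Tuple

# Hypothetical marker: the archive is assumed to cite each source
# revision as "Special:Diff/<id>".
REV_MARKER = re.compile(r"Special:Diff/(\d+)")


def archived_revision_ids(archive_text: str) -> Set[int]:
    """Collect every revision ID already mentioned in the archive text."""
    return {int(rev) for rev in REV_MARKER.findall(archive_text)}


def pending_entries(
    entries: Iterable[Tuple[int, str]], archive_text: str
) -> List[Tuple[int, str]]:
    """Keep only (revision ID, blurb) pairs that are not archived yet."""
    archived = archived_revision_ids(archive_text)
    seen_in_run: Set[Tuple[int, str]] = set()
    fresh = []
    for rev_id, blurb in entries:
        # Skip revisions already archived and exact repeats within this run.
        if rev_id in archived or (rev_id, blurb) in seen_in_run:
            continue
        seen_in_run.add((rev_id, blurb))
        fresh.append((rev_id, blurb))
    return fresh
```

With a 100-revision window and only a handful of new edits per day, most revisions in each daily run would already be archived and get skipped by this kind of check.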
Kindly feel free to ask any questions/doubts you have. Regards, —usernamekiran (talk) 18:02, 19 October 2023 (UTC)
PS: I will post links to sandbox/trial edits shortly. —usernamekiran (talk) 18:09, 19 October 2023 (UTC)
Approved for trial (50 edits or 30 days, whichever happens first). Please provide a link to the relevant contributions and/or diffs when the trial is complete. Primefac (talk) 09:20, 24 October 2023 (UTC)
Trial complete. The bot created the archive page Wikipedia:In the news/Featured/February 2004 (for consistency with ITN candidates archive at Wikipedia:In the news/Candidates/April 2005). Everything worked as expected. Requesting an extended trial with 2,000 edits. —usernamekiran (talk) 13:49, 24 October 2023 (UTC)
BAG assistance needed. The trial went as expected, and the new functionality of updating the index worked as expected in sandbox. Is it possible to get this task approved? —usernamekiran (talk) 13:09, 31 October 2023 (UTC)
{{BAG assistance needed}} Based on the discussion at WT:ITN#Archive of ITN postings, no changes were made to the original program except for changing the target location for archive pages. Requesting an extended trial for 500 edits. —usernamekiran (talk) 02:40, 17 November 2023 (UTC)
{{Wikipedia:In the news/Featured/Archives/header}}. The header for all the candidates archive pages is the same (Wikipedia:In the news/Candidates/November 2023), but in our case, there is one extra line: "The relevant discussions for additions of entries to the Wikipedia talk:In the news, kindly see Wikipedia:In the news/Candidates/November 2023, or the previous month's page thereof." I am not sure how we can come up with something so that the header could be updated from a single place/edit, as even if we use substitution somehow, it wouldn't be editable from a single location in the future. Kindly let me know if you have any suggestions/ideas regarding that.

(2) Regarding the edits in rapid succession: I was having a lot of difficulties because of the way MediaWiki presents diffs in the HTML format. E.g., even if a single word (e.g. death toll) is updated from 10 to 15, then the diff is presented as the original line completely removed, and an entire new line with the updated word added as a totally new line. To circumvent this issue, and to avoid repeated entries, the bot relies on diff IDs already present in the archive page. Also, these days, the ITN template gets edited/updated around 5 to 7 times a day. So, in the first run, the bot will archive all the entries (around 40,000), and then it will run every day to archive the new 5 to 7 entries. In short: appending all the changes in a single edit for a single run is not feasible (also difficult, as a lot of entries do not go in the same month, e.g. a blurb being added on 29 Nov, and being removed on 2 Dec). This can be resolved by adding a delay of 3 (or 5) seconds between each save operation. —usernamekiran (talk) 17:03, 17 November 2023 (UTC)
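The proposed delay between save operations, with each entry routed to the archive page for its own month, might be sketched along these lines. The `save` callback stands in for whatever actually writes the page (for example a pywikibot page save); the callback, the function names and the "Month Year" page title suffix are all assumptions for illustration.

```python
import time
from datetime import datetime
from typing import Callable, Iterable, Tuple

MONTHS = (
    "January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December",
)


def month_title(timestamp: datetime) -> str:
    """Build the per-month page suffix, e.g. 'November 2023'."""
    return f"{MONTHS[timestamp.month - 1]} {timestamp.year}"


def save_entries(
    entries: Iterable[Tuple[datetime, str]],
    save: Callable[[str, str], None],
    delay_seconds: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Save each entry to its month page, pausing between consecutive saves."""
    first = True
    for timestamp, blurb in entries:
        if not first:
            sleep(delay_seconds)  # throttle edits made in rapid succession
        save(month_title(timestamp), blurb)
        first = False
```

Passing `sleep` as a parameter is only a convenience for dry runs and testing; a real run would leave it at `time.sleep`.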
Trial complete. @The Earwig: Hello. I ran the original program, and User:KiranBOT/sandbox/Posted/February 2004 was created. With another method, I saved all the changes to a txt file, and then copy-pasted all the text to User:KiranBOT/sandbox5. Except for saving individual entries to a page vs saving individual edits to a txt file (and some edit counting mechanism), there were no changes in the program at all. But the file program captured some images (undesired, the original program skipped them), and the file program couldn't properly check the diff IDs already present in the file, which is resulting in multiple identical entries being added (some of these repeated entries do not have timestamps). Both the programs are in this repository on github. —usernamekiran (talk) 14:43, 26 December 2023 (UTC)
[[, and last occurrence of the closing bracket of the same line. If there are more than a certain number of characters, then the regex would remove the "RD". Implementing this regex in the current program would bulk up/clutter the code, and it would also be redundant once the MOS/standardisation comes in. Implementing it later would cause only one edit per page. —usernamekiran (talk) 19:07, 26 December 2023 (UTC)
Approved. Please run the version that prepares the edits in a file for the initial historical run. I don't entirely understand why your code produces duplicate entries, but perhaps you could write a script to clean up the duplicate lines and any other undesired things you think are easy to fix (like images) before saving. Maybe that approach would be easier than making changes to the main code? In any case, I'm satisfied this task is low-risk enough and any issues can be corrected after the fact. — The Earwig (talk) 16:48, 28 December 2023 (UTC)
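A cleanup pass of the kind suggested here could be as small as the sketch below. It is an assumption-laden illustration rather than the script that was actually used: the function name `clean_lines` is hypothetical, and treating both `[[Image:` and `[[File:` prefixes as images is a guess.

```python
from typing import Iterable, List

# Assumption: image lines start with one of these prefixes (case-insensitive).
IMAGE_PREFIXES = ("[[image:", "[[file:")


def clean_lines(lines: Iterable[str]) -> List[str]:
    """Drop image lines and exact duplicate lines, keeping the first occurrence."""
    seen = set()
    cleaned = []
    for raw in lines:
        line = raw.rstrip()
        if not line:
            cleaned.append(line)  # blank lines are kept as layout
            continue
        if line.lstrip().lower().startswith(IMAGE_PREFIXES):
            continue
        if line in seen:
            continue
        seen.add(line)
        cleaned.append(line)
    return cleaned
```

This only removes exact repeats. The repeated entries that lack timestamps would not match their timestamped originals, so they would need a looser comparison key.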