Operator: Lkolbly
Time filed: 18:57, Monday, December 24, 2018 (UTC)
Function overview: This bot automatically updates Alexa rankings in website infoboxes by querying the Alexa Web Information Service.
Automatic, Supervised, or Manual: Automatic
Programming language(s): Python
Source code available: https://github.com/lkolbly/alexawikibot (presently, the actual saving is commented out, for testing)
Links to relevant discussions (where appropriate): Previous bot that performed this task: OKBot_5
Edit period(s): Monthly or so
Estimated number of pages affected: 4,560 articles are in the current candidate list. A subset of these pages will be updated each month. Other pages could be pulled in over time if someone adds Alexa information to a page. Also, there will be a whitelist, copied from User:OsamaK/AlexaBot.js, of pages that will be edited (presently containing 1,412 pages).
Namespace(s): Articles
Exclusion compliant (Yes/No): Yes (via whatever functionality is already in pywikipedia)
Function details: This bot will scan all pages (using a database dump as a first pass) to find pages which have the "Infobox website" template with both "url" and "alexa" fields.
It will parse the domain from the url field using a few heuristics. Domains that have subdomains return incorrect results from AWIS (e.g. mathmatica.wolfram.com returns the result for just wolfram.com), so these domains are discarded (and the page is not touched). For the remaining domains, it will perform an AWIS query to determine the current website rank and its trend over 3 months.
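The domain step described above could look roughly like the sketch below. It is not the bot's actual code: the function name `extract_domain`, the `TWO_LABEL_SUFFIXES` set, and the specific heuristics (strip a leading "www.", skip any host with extra labels) are assumptions made for illustration. Wrappers such as a URL template around the value are not handled.

```python
from urllib.parse import urlparse

# Hypothetical and deliberately incomplete list of two-label public suffixes,
# so that e.g. bbc.co.uk is not mistaken for a subdomain of co.uk.
TWO_LABEL_SUFFIXES = {"co.uk", "org.uk", "com.au", "co.jp"}


def extract_domain(url_field):
    """Return the bare domain from an infobox url value, or None to skip the page."""
    text = url_field.strip()
    if not text:
        return None
    # urlparse only fills in the host when a scheme or a leading "//" is present.
    if "//" not in text:
        text = "//" + text
    host = urlparse(text).hostname
    if not host:
        return None
    if host.startswith("www."):
        host = host[4:]
    labels = host.split(".")
    suffix_len = 2 if ".".join(labels[-2:]) in TWO_LABEL_SUFFIXES else 1
    # Exactly one label in front of the suffix means there is no subdomain.
    if len(labels) != suffix_len + 1:
        return None
    return host
```

With these assumptions, `extract_domain("mathmatica.wolfram.com")` returns `None`, so the page would be left alone, while `extract_domain("http://www.darwinawards.com/")` returns the bare domain. A production version would more likely rely on the full Public Suffix List than on a hand-written set.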
Websites will be classified into {{Increase}}, {{Decrease}}, and {{steady}}. A site increasing in popularity will get the {{Increase}} tag, even though its rank is numerically decreasing. (Previously, many sites were also classified into IncreaseNegative and DecreasePositive, which I didn't understand.)
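A minimal sketch of this classification, assuming the trend is available as a single signed number (new rank minus old rank over the 3-month window). The function name `classify_trend` and that input format are hypothetical; the actual shape of the AWIS response is not described here.

```python
def classify_trend(rank_change):
    """Map a change in rank number to a trend template.

    rank_change is assumed to be (new rank - old rank). A negative value means
    the rank number got smaller, i.e. the site became more popular.
    """
    if rank_change < 0:
        return "{{Increase}}"
    if rank_change > 0:
        return "{{Decrease}}"
    return "{{steady}}"
```

For example, a site moving from rank 200,000 to rank 169,386 has a negative `rank_change` and is tagged `{{Increase}}`.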
Then, in the text of the article, whatever the current alexa data is will be replaced by something like:
{{Increase}} 169,386 ({{as of|2018|12|24}})<ref name="alexa">{{cite web|url= http://www.alexa.com/siteinfo/darwinawards.com | publisher= [[Alexa Internet]] |title=Darwinawards.com Traffic, Demographics and Competitors - Alexa |accessdate= 2018-12-24 }}</ref> <!-- Updated monthly by LkolblyBot -->
(e.g. 169,386 (As of 24 December 2018) [1])
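The replacement value shown above can be assembled from four inputs. The sketch below reproduces that example string; the function name `build_alexa_value` and its parameters are hypothetical, and whether single-digit months and days should be zero-padded in the as-of template is left open.

```python
import datetime


def build_alexa_value(domain, rank, trend, when):
    """Assemble the wikitext for the |alexa= parameter.

    domain: bare domain such as "darwinawards.com"
    rank:   integer Alexa rank
    trend:  one of the trend templates, e.g. "{{Increase}}"
    when:   datetime.date of the query
    """
    as_of = "{{as of|%d|%d|%d}}" % (when.year, when.month, when.day)
    cite = (
        "{{cite web|url= http://www.alexa.com/siteinfo/" + domain
        + " | publisher= [[Alexa Internet]] |title=" + domain.capitalize()
        + " Traffic, Demographics and Competitors - Alexa |accessdate= "
        + when.isoformat() + " }}"
    )
    return (
        trend + " " + format(rank, ",") + " (" + as_of + ")"
        + '<ref name="alexa">' + cite + "</ref>"
        + " <!-- Updated monthly by LkolblyBot -->"
    )


example = build_alexa_value(
    "darwinawards.com", 169386, "{{Increase}}", datetime.date(2018, 12, 24)
)
```

Keeping the formatting in one function makes it easier to change the citation style later, for instance to follow an article's preferred date format.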
There are two as-yet untested cases that I'll test (and fix if necessary) before any full-scale deployment:
|alexa= parameters? I have to go find one and see what the bot does with it. (Probably the right thing to do is to not touch the page at all in that situation.)
|alexa= parameter, which should be fine, but worth testing anyway.
Please make the bot's talk page.
"whatever the current alexa data is will be replaced" - how do you know there isn't more than just the previous value? Or that there isn't a reference that is used elsewhere?
I imagine many pages that copy-paste the template code will have an empty |alexa= parameter. This would not be any different to not having it at all.
Do you preserve template's formatting?
The particular citation style the bot uses may not match the article's, especially the date format. (I wonder why we don't have an Alexa citation template still.) — HELLKNOWZ ▎ TALK 21:26, 24 December 2018 (UTC)
r"\|\s*alexa\s*=\s*{}".format(re.escape(current_alexa)), so the rest of the template is unaffected. (The number of spaces before the equals sign goes from "any number" to "exactly one", though.)
|df= parameters. — HELLKNOWZ ▎ TALK 16:36, 25 December 2018 (UTC)
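For illustration, the pattern quoted in this thread could be applied as in the sketch below. Only the pattern itself comes from the discussion; the function name `replace_alexa_value`, the `"| alexa = "` replacement prefix, and the choice to return the text unchanged when nothing matches are assumptions.

```python
import re


def replace_alexa_value(page_text, current_alexa, new_alexa):
    """Swap the current |alexa= value for a new one, leaving other fields alone."""
    pattern = r"\|\s*alexa\s*=\s*{}".format(re.escape(current_alexa))
    # A callable replacement keeps backslashes in new_alexa from being
    # interpreted as regex escapes.
    new_text, count = re.subn(
        pattern, lambda match: "| alexa = " + new_alexa, page_text, count=1
    )
    # If the expected value is not found, return the page untouched.
    return new_text if count else page_text
```

Because only the matched parameter is rewritten, spacing elsewhere in the template is preserved; the spacing around the equals sign of the alexa parameter itself is normalized.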
name="alexa" or something in the page text. I think it's a fairly rare occurrence though.