To editor Mr. Stradivarius: I'm trying to use this but when I bring up the tags interface, all the fields are greyed out and I can't add title or tags to new Signpost articles. Past articles already tagged are fine. Chris Troutman ( talk) 23:08, 6 February 2022 (UTC)
Hi @ Mr. Stradivarius! Thanks so much for creating this. Would it be possible to add authors (bylines) to the metadata? This could be used to generate profile pages of Signpost writers and their articles. Cheers! 🐶 EpicPupper (he/him | talk) 05:08, 6 June 2022 (UTC)
Apologies for the extremely slowpoke.jpg followup on this, but I think author tags would be a very good idea, and implementing them here would save me a large amount of work versus implementing them separately in an independent module. As one example, they would allow us to link authors' bylines to lists of their articles, as basically all modern news outlets do. I am also willing to assist in modifying the script / module (or assist with harmonization of input data on the Signpost pages themselves, automation to update old indices etc) if additional work is required. jp× g 15:53, 4 November 2022 (UTC)
I see article subheadings on Wikipedia:Wikipedia Signpost, but I don't see them on article pages or anywhere in the archives. Would it be acceptable to leave these out of the index modules? Adding these would also mean adding them to WP:SPT, and I would prefer to keep things simple if the subheadings are not used all that much. Also, I couldn't find any mention of departments in the Signpost articles I checked - do you have any examples of the department metadata that you mentioned?
Also, yes, this script, or a variant of it, should be run after each article is published (or we could probably just run it daily). It is not that much of a stretch from running a script on a user's computer to running a script every day automatically on Toolforge. Best — Mr. Stradivarius ♪ talk ♪ 00:12, 17 November 2022 (UTC)
Hi. I first heard of this module a while ago, but I didn't have the time to go into great detail with it -- now I am trying to do a comprehensive review of Signpost technical infrastructure, so I am here. First of all, I think it whips ass. This is great! I have a few ideas for how I could use it to accomplish a few new features (and probably some ideas for new features the module could have).
Second of all, I notice that something strange seems to be happening in Module:Signpost/index/2022 (and possibly elsewhere): a bunch of article titles have "subscribe subscribe" at their beginning for no apparent reason. If I have time I will try to go figure out what is causing this (probably some templates not playing well together) but I am not very familiar with Lua so it is unlikely I can fix it very well myself if it ends up being something in the module. jp× g 15:45, 4 November 2022 (UTC)
<h2>...</h2> tags to text, but this included the subscribe link added by mw:Extension:DiscussionTools. This link was presumably added around August. I chose to fix this by getting the title from a new "data-signpost-article-title" attribute added to Wikipedia:Wikipedia Signpost/Templates/Signpost-article-header-v2, and as a backup, getting it from the span inside the h2 tag with the class "mw-headline". I also went through and fixed all the instances where the "subscribe subscribe" links were added to the index modules. All signpost articles newly tagged since August had the "subscribe subscribe" text added, so it was not just limited to 2022 articles, although that's where the problem was most common. — Mr. Stradivarius ♪ talk ♪ 08:52, 11 November 2022 (UTC)
@ Mr. Stradivarius: Today I succeeded in writing something that I have wanted for quite some time, viz. a way to look at Signpost viewership statistics that isn't bad and useless. Source code is at https://github.com/jp-x-g/wegweiser -- what it does is very simple. It finds and records view counts for Signpost articles after publication (for a standardized interval afterwards, for purposes of comparison). Anyway, the reason this involves this module is as such:
Storing this data necessitates the creation of some large index of all Signpost articles, and rather than reinvent the wheel, I reckon it would be useful to do so in this module's indices, and I've found a way to make my script parse and update the Lua tables properly. I tested it briefly on Module:Signpost/index/2022 (diff here of what it looks like with the extra fields). I'm not very hot with Lua, so I don't know what this does on the backend utilities that use this module, but SPT works fine with these extra fields, as does Wikipedia talk:Wikipedia Signpost/Single/2022-01-30 (which uses Wikipedia:Wikipedia Signpost/Templates/Single talk, which uses Wikipedia:Wikipedia Signpost/Templates/Article list maker, which uses Module:Signpost).
Anyway, I have everything working, and I am ready to add the fields to all the indices (only back to 2015 since per-page view counts aren't available before then), but I wanted to hold off and make sure that this isn't going to break everything first. What do you say? jp× g 08:19, 5 January 2023 (UTC)
views30, views60 etc., I would prefer that the page view statistics are put into their own subtable, like views = {[7] = 642, [30] = 1966, [60] = 2279, [90] = 2419}. The data would be more structured this way. Best — Mr. Stradivarius ♪ talk ♪ 06:51, 6 January 2023 (UTC)
I will have a look at WP:SPT to see how difficult it would be to pass through arbitrary Lua data tables. — Mr. Stradivarius ♪ talk ♪ 12:19, 8 January 2023 (UTC)
mw.text.jsonEncode({[7] = 642, [30] = 1966, [60] = 2279, [90] = 2419})
-- '{"7":642,"30":1966,"60":2279,"90":2419}'
views = {d7 = 642, d30 = 1966, d60 = 2279, d90 = 2419},
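The jsonEncode output above shows integer keys turning into strings. A minimal Python sketch of the same effect, using Python's json module as a stand-in (this is not the Scribunto code, and the variable names are only illustrative): JSON object keys are always strings, so integer keys do not survive a round trip, while prefixed keys such as d7 do.

```python
import json

# Integer-keyed view counts, as in the first subtable proposal.
views_int = {7: 642, 30: 1966, 60: 2279, 90: 2419}

# JSON object keys are always strings, so the integers are converted on
# encode and come back as strings on decode.
round_trip_int = json.loads(json.dumps(views_int))

# Keys with a "d" prefix are already strings and survive unchanged.
views_named = {"d7": 642, "d30": 1966, "d60": 2279, "d90": 2419}
round_trip_named = json.loads(json.dumps(views_named))

print(round_trip_int == views_int)      # the key types differ after the round trip
print(round_trip_named == views_named)  # the prefixed form is stable
```

Any tool that passes the index through JSON would therefore see the same keys before and after if the d-prefixed form is used.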
@ JPxG: I made a pull request to Wegweiser to use the same Lua table format as SignpostTagger. Does that look like an acceptable way of making the diffs cleaner in the index modules?
As for the question of why page view data is necessary, I understand that you want to display the page view data in wikitext, and that using the page view API for this directly would be impractical. The thing I'm not understanding is why you want to add page view data to wikitext in the first place. Are you planning to add a page view counter on article pages? Or are you planning to use the data in some other way? Best — Mr. Stradivarius ♪ talk ♪ 05:02, 22 January 2023 (UTC)
The current version of Wegweiser, while not perfect, now has the ability to pull article lists from the PrefixIndex API and generate skeleton entries (no tags, but date and subpage) in the indices. I filled them out from 2005 to present, which added several hundred previously unindexed articles (e.g. 2017 only had a couple of articles in the index for some reason). jp× g 01:33, 7 January 2023 (UTC)
Boom goes the dynamite. It works for everything after 2017-02, which was when Wikipedia:Wikipedia Signpost/Templates/Signpost-article-header-v2 came into use. Before that, things get a little hazy, but here is my chronology of headline and byline styles:
==AMA begins plans for new election==
:<small>By [[User:Michael Snow|Michael Snow]], [[10 January]] [[2005]]</small>
<h2 style="margin-right:60px;">Reporter who plagiarized Wikipedia gets dismissed</h2>
<small>:By [[User:Michael Snow|Michael Snow]], [[16 January]] [[2006]]</small>
{{Wikipedia:Signpost/Template:Signpost-article-start|Let's get serious about plagiarism|By [[User:Awadewit|Awadewit]], [[User:Elcobbola|Elcobbola]], [[User:Jbmurray|Jbmurray]], [[User:Kablammo|Kablammo]], [[User:Moonriddengirl|Moonriddengirl]] and [[User:Tony1|Tony1]] |13 April 2009}}
{{Wikipedia:Wikipedia Signpost/Templates/Signpost-article-header-v2|{{{1|Wikipedia has cancer}}}|By [[User:Guy Macon|Guy Macon]]| 9 February 2017}}
Code written to handle titles and author fields from 2009 to 2017, running scripts now. jp× g 01:26, 9 January 2023 (UTC)
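For illustration, a minimal Python sketch of pulling the title, authors and date out of a Signpost-article-header-v2 call like the 2017 example above. It is not Wegweiser's actual parser; the function names and the bracket-counting approach are assumptions made for this sketch.

```python
import re

# Matches [[User:Name|Label]] or [[User:Name]] links in a byline.
USER_LINK = re.compile(r"\[\[User:([^|\]]+)(?:\|[^\]]*)?\]\]")
# Matches a title wrapped as a default parameter value, e.g. {{{1|Some title}}}.
DEFAULT_PARAM = re.compile(r"^\{\{\{1\|(.*)\}\}\}$", re.DOTALL)


def split_top_level(args):
    """Split template arguments on pipes that are not inside [[...]] or {{...}}."""
    parts, depth, current = [], 0, []
    for ch in args:
        if ch in "{[":
            depth += 1
        elif ch in "}]":
            depth -= 1
            if depth < 0:  # reached the braces that close the template itself
                break
        if ch == "|" and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def parse_header_v2(wikitext):
    """Return (title, authors, date) from a header-v2 call, or None if absent."""
    marker = "Signpost-article-header-v2|"
    start = wikitext.find(marker)
    if start == -1:
        return None
    parts = split_top_level(wikitext[start + len(marker):])
    if len(parts) < 3:
        return None
    title = parts[0].strip()
    wrapped = DEFAULT_PARAM.match(title)
    if wrapped:
        title = wrapped.group(1)
    authors = USER_LINK.findall(parts[1])
    return title, authors, parts[2].strip()
```

The same user-link pattern would also pick up the multiple authors in the 2009-style byline; the older 2005 and 2006 styles would need separate handling.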
The whole batch from January to May 2005 was small enough that it didn't warrant writing extra code, so I was going to do an AWB run to normalize it to the 2005-2009 style, but I figured as long as I was firing up JWB I might as well bring it all the way into the future, so those hundred-and-change now use the modern style (with Signpost-article-header-v2). jp× g 01:59, 9 January 2023 (UTC)
I am currently working through a massive JWB run to amend all back issues with the modern header template that formats titles and author metadata in a sensible fashion, among other things (and while I'm at it, I am updating User:JPxG/The Illuminated Signpost). Anyway, everything through mid-2006 is done now.
To find weird edge-cases and messed up pages, I have written a new script, validator.py, which outputs to the following page:
This displays all of the entries in year indices that are missing fields -- a summary table is at the top, and then it lists all the individual articles with errors. You can ignore the gigantic numbers for 2006, 2007, 2008 and 2009 (I am still reformatting them to have parseable titles and authors): from 2010 to present the missing fields are actual errors. Most of them are just articles without tags, for which some help would be appreciated. I have noticed some strange behavior, though. @ Mr. Stradivarius: Is there a reason that the SignpostTagger isn't showing a "manage tags" box for old articles, like Wikipedia:Wikipedia Signpost/2005-12-12/Welcome RSS readers? I looked through the .js and I didn't see anything that looked like it was excluding articles by year. jp× g 22:36, 11 January 2023 (UTC)
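A rough Python sketch of the kind of check described above: count index entries with missing fields per year and list the offenders. This is not the real validator.py; the list of required fields and the function names are assumptions for illustration.

```python
# Fields every index entry is assumed to need (hypothetical list).
REQUIRED_FIELDS = ("date", "subpage", "title", "authors", "tags")


def missing_fields(entry):
    """Return the required fields that are absent or empty in one entry."""
    return [field for field in REQUIRED_FIELDS if not entry.get(field)]


def summarize(entries):
    """Return (count of bad entries per year, list of (date, subpage, missing))."""
    per_year = {}
    problems = []
    for entry in entries:
        missing = missing_fields(entry)
        if missing:
            year = (entry.get("date") or "")[:4] or "unknown"
            per_year[year] = per_year.get(year, 0) + 1
            problems.append((entry.get("date"), entry.get("subpage"), missing))
    return per_year, problems
```

The per-year counts correspond to the summary table, and the problem list to the individual articles shown below it.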
@ Mr. Stradivarius: I have a working modification for the module, currently located at Module:Sandbox/JPxG, which is capable of using the author/pageview metadata in list/table generation. I don't want to just slap it into the main module without any notice, so I am letting you know here. This version (which you can see the test cases for at the doc page) allows for returning the author, as well as viewsSeven, viewsFifteen [...] up to viewsOneEighty. Below I'll embed a use case, which is a table of the view counts for yesterday's issue: User:JPxG/sandboxbollocks
{{User talk:JPxG/sandbox99 | rowformat = {{Wikipedia:Wikipedia Signpost/Templates/Article list maker/Pageviews|date=${date}|subpage=${subpage}|title=${title}|viewsseven=${viewsSeven}}} | sortdir = descending | startdate = 2023-01-15 | enddate = 2023-12-01 }} |}
Anyway, I don't see this conflicting with any other use of the module, and it hasn't broken in the period of me testing it, so I will add these changes to the main module, unless you have an objection or want to do it better (I don't think this is the best-written code, as I am not a "Lua guy"). jp× g 05:05, 17 January 2023 (UTC)
views7 and getViews7 instead of viewsSeven and getViewsSeven. I find it easier to tell at a glance what the number is when using digits. Also, once it gets over 100, things start to get less obvious. Should it be getViewsOneTwenty, getViewsOneHundredTwenty or getViewsOneHundredAndTwenty? Users of the module will probably have to look at the documentation to get it right. Otherwise, the code looks good to me. — Mr. Stradivarius ♪ talk ♪ 14:32, 18 January 2023 (UTC)
viewsSeven to views7, then edit the table maker to call views7... it will flip out. I don't know if this is an insurmountable issue with Lua, or a skill issue on my part (likely the latter). If there's any way to make it just be "views7", that would be infinitely preferable lol. jp× g 23:29, 18 January 2023 (UTC)
${title} to accept both letters and numbers - previously it only accepted letters. Best — Mr. Stradivarius ♪ talk ♪ 15:06, 19 January 2023 (UTC)
@ Mr. Stradivarius: I accepted your pull request, but running the program, the whitespace seems to still be quite different -- is this a bug, or am I running it incorrectly? jp× g 04:31, 25 January 2023 (UTC)
@ JPxG: I undid WegweiserBot's edits to the index modules with my new serialisation code, due to this issue. The fix was also pretty simple, so I've made a new pull request. — Mr. Stradivarius ♪ talk ♪ 13:49, 31 January 2023 (UTC)
\uxxxx escapes as well. It turns out that Lua doesn't actually have that kind of escape, as it treats strings as a series of bytes, rather than a series of Unicode characters. So it's not really a bug in WP:SPT, but rather a fundamental issue of how Lua was trying to parse those strings. In other words, my serialisation code in Wegweiser was outputting incorrect Lua strings, so that's the thing we need to fix. — Mr. Stradivarius ♪ talk ♪ 14:14, 31 January 2023 (UTC)
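A minimal Python sketch of the point above, assuming the index page is saved as UTF-8: non-ASCII characters are written out as-is and only the characters that Lua treats specially inside a double-quoted string are escaped. The function name lua_string is hypothetical; this is not the code from the pull request, and it does not handle other control characters.

```python
import json


def lua_string(value):
    """Serialize a Python str as a double-quoted Lua string literal.

    Non-ASCII characters are left untouched rather than turned into
    JSON-style \\uXXXX escapes, which the discussion above says Lua lacks.
    """
    escaped = (
        value.replace("\\", "\\\\")  # backslash first, so later escapes stay intact
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return '"' + escaped + '"'


title = 'Café "review"'
print(lua_string(title))  # keeps the accented character as-is
print(json.dumps(title))  # default JSON encoding uses a \u escape instead
```

The contrast with json.dumps shows how reusing a JSON encoder to write Lua strings would produce the escapes in question.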
@ JPxG: Regarding this edit - how about adding |authortemplate=, |authorformat= and |authorseparator= parameters to specify how to format each author in the output? That way you can do things like link each user's userpage, etc., instead of just listing the usernames separated by commas. Also, maybe we could do the same for tags with |tagtemplate=, |tagformat= and |tagseparator= parameters. — Mr. Stradivarius ♪ talk ♪ 04:03, 29 January 2023 (UTC)
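To make the proposal concrete, here is a small Python sketch of how a per-author format string and a separator could combine. It only mirrors the idea behind |authorformat= and |authorseparator=; the function name, the ${author} placeholder and the defaults are assumptions, not the module's behaviour.

```python
from string import Template


def format_authors(authors, authorformat="${author}", authorseparator=", "):
    """Apply a format string to each author, then join with a separator."""
    template = Template(authorformat)
    return authorseparator.join(
        template.substitute(author=name) for name in authors
    )


# Plain comma-separated usernames (the assumed default).
print(format_authors(["Awadewit", "Tony1"]))
# Each username linked to its user page, joined with " and ".
print(format_authors(["Awadewit", "Tony1"], "[[User:${author}|${author}]]", " and "))
```

A tag variant would work the same way with a ${tag} placeholder.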
@ JPxG: About Special:Diff/1139235436: it looks like WegweiserBot is using display names for author names, but WP:SPT is using usernames. This is going to cause the two scripts to keep overriding each other, so we should choose one approach. I would prefer to include only username, as that makes it easier to do things like make user links. I can see that there would be a case for listing display names instead, though. Or if we really want, we could include both, in tables like {user = "Bluerasberry", display = "Lane Rasberry"}. What do you think would be best? — Mr. Stradivarius ♪ talk ♪ 04:28, 14 February 2023 (UTC)
[User:GerardM
, {{{2}}}
, §hep
, \/
, +sj +
, 2 May
, 3 July 2006
, 03 July 2006
, 3family6
, 3family6 1 April 2016 19:58 (UTC)
, 05 November 2007
, 10 other editors
, 11 other editors
, 12 August 2015
, 14 April
, 18 August
, 19 November 2012
, 22mikpau
, 24 April
, 24 Apri
, 26 April 2010
, 27 other editors,
, 28 other editors
, 32 other editors
, 38 editors
, 51 other editors
, 53 other editors
, 79 other editors
, 91 editors
, 106 editors on the French Wikipedia; translated for The Signpost by JohnNewton8
, 273 other editors
, 1233
, 2008
, 16912 Rhiannon 15 July 2015
(literally all of which are parsing errors except for three); your articles, specifically, were under HaeB, Tilman Bayer, Tilman Bayer, Tilman Bayer 1 April 2016 19:58 (UTC), Tilman Bayer 03:08 (UTC) (two of those appearing to be identical strings, but one of them with a nonprinting character). I guess what I am trying to say here is that if you want me to de-alias your name changes, I can do that, but I think that given the volume of work being done it was not something I could realistically post about at WT:Signpost (I recall that around December/January you had been saying we should be stricter about posting stuff on the right talk pages, which is why it was at /Technical). jp× g 23:14, 6 August 2023 (UTC)
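A rough Python sketch of two clean-up steps suggested by the list above: stripping invisible characters so that visually identical names compare equal, and flagging strings that look like byline debris rather than usernames. The heuristics and function names are assumptions for illustration, and which nonprinting character was actually involved is not stated above.

```python
import re
import unicodedata


def strip_nonprinting(name):
    """Drop format/control characters (Unicode categories Cf and Cc) and trim."""
    return "".join(
        ch for ch in name if unicodedata.category(ch) not in ("Cf", "Cc")
    ).strip()


# Assumed heuristics for fragments that are not usernames: timestamps,
# "N other editors", bare day-month(-year) dates, leftover wiki markup.
SUSPECT = re.compile(
    r"\(UTC\)"
    r"|\bother editors\b"
    r"|^\d+ editors\b"
    r"|^\d{1,2} [A-Z][a-z]+( \d{4})?$"
    r"|[\[\]{}]"
)


def looks_suspect(name):
    """Return True if the author string matches one of the debris patterns."""
    return bool(SUSPECT.search(name))
```

Names that begin with digits, such as 3family6, are not flagged by these patterns; anything the heuristics do flag would still need a manual look before being changed.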
@ Mr. Stradivarius: Per this, it's now possible to have Lua modules load JSON rather than Lua tables. I think this would be a lot better to work with (i.e. all utilities wouldn't have to constantly serialize and deserialize Lua tables using the idiosyncratic whitespace/indentation/etc format). Additionally, a page with the JSON content model would be constrained and sanitized by MediaWiki, rather than a Lua table which can just have wrong stuff in it etc. I would like to write some more utilities to work with these indices but the Lua tables are kind of an awkward sticking point. What would the procedure be for converting them? I would be able to write patches for the Signpost tagger and Wegweiser (and may be able to help with the Lua module itself). jp× g 🗯️ 23:52, 8 December 2023 (UTC)
Lately it has occurred to me that the way old Signpost archives get generated is very ass-backwards. We have a giant database of every article, its title, its author, et cetera... we're just not using it. It does get used sometimes, like in Wikipedia:Wikipedia_Signpost/Templates/Single_talk. But for the archive issues, we have hundreds of individual pages, like Wikipedia:Wikipedia Signpost/Single/2022-11-28, that redundantly store article titles, subheadings, et cetera. The modifications I'm working on now (which I've incorporated into the snippet template and the publishing script) allow articles to be associated with custom images, so the archives will have that too, but then this creates a problem: modifying an image for an article requires me to edit the article, then update Wikipedia:Wikipedia Signpost, then update Wikipedia:Wikipedia Signpost/2023-12-04, and such.
Anyway: I'd like to, as much as possible, use the module for stuff instead of static wikitext pages. However, this will again require some more fields to be added. Right now, my thinking is:
subhed (subheading, or blurb, or whatever) -- string, nowadays this is a sentence or so but in some old issues it's GIGANTIC (paragraph or more). Has various random crap in it (templates, quotation marks, etc).
piccy -- string, image file for the article. These, ideally, would be a 1:1 aspect ratio, but might sometimes not be, which brings me to:
piccy-meta -- coordinates for CSS crop of the image. I am not 100% on this. It might just be four CSS crop coordinates. But I am also contemplating other attributes, like filters (what if we want to desaturate an image, etc).
Like above, I am fine to incorporate these into the publishing script, Wegweiser, and the tagging script, but may need some assistance with the Lua part (and I don't know if this should be done before or after the lua table-json thing). @ Mr. Stradivarius: what do you think? jp× g 🗯️ 00:04, 9 December 2023 (UTC)
subheading and image should be fine. I would also make "image" a table, so you could do something like image = {filename = "Example.png", width = 100, height = 100}, where "width" and "height" (or whatever metadata you need) are optional. I'm not aware of any way to crop an image in MediaWiki using inline styles - I think we are limited to the options provided at mw:Help:Images/en, but let me know if I'm missing something. Best — Mr. Stradivarius ♪ talk ♪ 09:03, 9 December 2023 (UTC)
{{Signpost/snippet|2023-12-04|Essay|I am going to die|And so are you.|0.3 MB|sub=0.3 MB|by=[[User:WhatamIdoing|WhatamIdoing]]|pic=File:Memento Mori 'To This Favour' by William Michael Harnett, c. 1879.JPG|credit-name=William Michael Harnett|credit-license=PD|pic-p=800|pic-x=350|pic-y=100}}
scale, x and y. All are integers, although I think it might also be useful to permit values like top, bottom, left, right and center (these aren't handled by the template yet but they could be in the future). There are also two more params I didn't think of above, author and license, which are necessary for image attribution. jp× g 🗯️ 03:59, 11 December 2023 (UTC)
Need to be supported by various scripts in order to work properly.
I have rewritten Wegweiser to fetch metadata from parsing article wikitext instead of HTML pages; this posed some slight difficulties with respect to user tags in author fields but is now good. It can now work a lot faster, and it also provides subheadline metadata. There's a line I have commented out right now in the script, but can enable to make it store subheading data when it parses metadata. When the subheading data is in the module indices, the module still works fine to retrieve articles etc (of course it can't do anything to parse or use the subheading yet, but it doesn't break anything). However, the Signpost tagger chokes trying to save tags for articles with subheading data and won't work on them, so I won't do all of the indices with it for now. jp× g 🗯️ 22:06, 15 December 2023 (UTC)
So right now I've integrated into Wegweiser and SignpostTagger the fields for piccy information, like this:
{ date = "2023-12-04", subpage = "Essay", title = "I am going to die", authors = {"WhatamIdoing"}, tags = {"essay"}, views = {d007 = 1526, d015 = 1919, d030 = 2029, d060 = 2029, d090 = 2029, d120 = 2029, d180 = 2029}, piccycredits = "William Michael Harnett", piccyfilename = "File:Memento Mori 'To This Favour' by William Michael Harnett, c. 1879.JPG", piccylicense = "PD", piccyscaling = "400", piccyxoffset = "70", piccyyoffset = "", subhead = "And so are you.", },
496 chars. This seems, uh, stupid. It works but I have temporarily reverted. Since these six fields are all about the same thing, there's no good reason to have them occupy six whole fields -- they should probably just be a dict like the viewcounts are. Like such:
{ date = "2023-12-04", subpage = "Essay", title = "I am going to die", authors = {"WhatamIdoing"}, tags = {"essay"}, views = {d007 = 1526, d015 = 1919, d030 = 2029, d060 = 2029, d090 = 2029, d120 = 2029, d180 = 2029}, piccy = {filename = "File:Memento Mori 'To This Favour' by William Michael Harnett, c. 1879.JPG", credits = "William Michael Harnett", license = "PD", scaling = "400", xoffset = "70", yoffset = ""}, subhead = "And so are you.", },
467. But, upon thinking this thought, something rather devious popped into my mind: aren't these labels kind of long? It seems pretty insubstantial, but... there are a lot of articles. Six letters for each key, across 5561 (currently 5462) articles (plus 3 extra chars for the = necessitated by using a label at all) is 50049 bytes (currently 49158). For reference, the size of all extant module indices is 2377559 (currently 1406253). So that's, uh, 3.496% of the total index size being taken up just by key names. These have to be parsed by, basically, everything, and it's not quite clear that repeating the field names all these thousands of times is more efficient than just having them as an array whose ordering is documented. Perhaps it would be better to do this? @ Mr. Stradivarius: jp× g 🗯️ 13:51, 23 December 2023 (UTC)
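The arithmetic in the comment above can be reproduced with a few lines of Python. The constants come straight from the comment (six letters per key plus three characters for the " = "); the function names are only for illustration.

```python
KEY_LETTERS = 6      # letters in each key name, per the comment above
SEPARATOR_CHARS = 3  # the " = " that a labelled field needs


def label_overhead(articles):
    """Bytes spent on one six-letter label plus its separator per article."""
    return (KEY_LETTERS + SEPARATOR_CHARS) * articles


def share_of_index(articles, index_bytes):
    """Label overhead as a percentage of the total index size."""
    return 100 * label_overhead(articles) / index_bytes


print(label_overhead(5561), label_overhead(5462))
print(round(share_of_index(5462, 1406253), 3))
print(round(share_of_index(5561, 2377559), 3))
```

The quoted 3.496% corresponds to the "currently" figures (49158 of 1406253 bytes); the other pair (50049 of 2377559 bytes) comes out at about 2.105%.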
To editor Mr. Stradivarius: I'm trying to use this but when I bring up the tags interface, all the fields are greyed out and I can't add title or tags to new Signpost articles. Past articles already tagged are fine. Chris Troutman ( talk) 23:08, 6 February 2022 (UTC)
Hi @ Mr. Stradivarius! Thanks so much for creating this. Would it be possible to add authors (bylines) to the metadata? This could be used to generate profile pages of Signpost writers and their articles. Cheers! 🐶 EpicPupper (he/him | talk) 05:08, 6 June 2022 (UTC)
Apologies for the extremely slowpoke.jpg followup on this, but I think author tags would be a very good idea, and implementing them here would save me a large amount of work versus implementing them separately in an independent module. As one example, they would allow us to link authors' bylines to lists of their articles, as basically all modern news outlets do. and I am willing to assist in modifying the script / module (or assist with harmonization of input data on the Signpost pages themselves, automation to update old indices etc) if additional work is required. jp× g 15:53, 4 November 2022 (UTC)
I see article subheadings on Wikipedia:Wikipedia Signpost, but I don't see them on article pages or anywhere in the archives. Would it be acceptable to leave these out of the index modules? Adding these would also mean adding them to WP:SPT, and I would prefer to keep things simple if the subheadings are not used all that much. Also, I couldn't find any mention of departments in the Signpost articles I checked - do you have any examples of the department metadata that you mentioned?
Also, yes, this script, or a variant of it, should be run after each article is published (or we could probably just run it daily). It is not that much of a stretch from running a script on a user's computer to running a script every day automatically on Toolforge. Best — Mr. Stradivarius ♪ talk ♪ 00:12, 17 November 2022 (UTC)
Hi. I first heard of this module a while ago, but I didn't have the time to go into great detail with it -- now I am trying to do a comprehensive review of Signpost technical infrastructure, so I am here. First of all, I think it whips ass. This is great! I have a few ideas for how I could use it to accomplish a few new features (and probably some ideas for new features the module could have).
Second of all, I notice that something strange seems to be happening in Module:Signpost/index/2022 (and possibly elsewhere): a bunch of article titles have "subscribe subscribe" at their beginning for no apparent reason. If I have time I will try to go figure out what is causing this (probably some templates not playing well together) but I am not very familiar with Lua so it is unlikely I can fix it very well myself if it ends up being something in the module. jp× g 15:45, 4 November 2022 (UTC)
<h2>...</h2>
tags to text, but this included the subscribe link added by
mw:Extension:DiscussionTools. This link was presumably added around August. I chose to fix this by getting the title from a new "data-signpost-article-title" attribute
added to
Wikipedia:Wikipedia Signpost/Templates/Signpost-article-header-v2, and as a backup, getting it from the span inside the h2 tag with the class "mw-headline". I also went through and fixed all the instances where the "subscribe subscribe" links were added to the index modules. All signpost articles newly tagged since August had the "subscribe subscribe" text added, so it was not just limited to 2022 articles, although that's where the problem was most common. —
Mr. Stradivarius
♪ talk ♪
08:52, 11 November 2022 (UTC)@ Mr. Stradivarius: Today I succeeded in writing something that I have wanted for quite some time, viz. a way to look at Signpost viewership statistics that isn't bad and useless. Source code is at https://github.com/jp-x-g/wegweiser --what it does is very simple. It finds and records view counts for Signpost articles after publication (for a standardized interval afterwards, for purposes of comparison). Anyway, the reason this involves this module is as such:
Storing this data necessitates the creation of some large index of all Signpost articles, and rather than reinvent the wheel, I reckon it would be useful to do so in this module's indices, and I've found a way to make my script parse and update the Lua tables properly. I tested it briefly on Module:Signpost/index/2022 (diff here of what it looks like with the extra fields). I'm not very hot with Lua, so I don't know what this does on the backend utilities that use this module, but SPT works fine with these extra fields, as does Wikipedia talk:Wikipedia Signpost/Single/2022-01-30 (which uses Wikipedia:Wikipedia Signpost/Templates/Single talk, which uses Wikipedia:Wikipedia Signpost/Templates/Article list maker, which uses Module:Signpost).
Anyway, I have everything working, and I am ready to add the fields to all the indices (only back to 2015 since per-page view counts aren't available before then), but I wanted to hold off and make sure that this isn't going to break everything first. What do you say? jp× g 08:19, 5 January 2023 (UTC)
views30
, views60
etc., I would prefer that the page view statistics are put into their own subtable, like views = {[7 = 642, 30 = 1966, 60 = 2279, 90 = 2419}
. The data would be more structured this way. Best —
Mr. Stradivarius
♪ talk ♪
06:51, 6 January 2023 (UTC)
I will have a look at WP:SPT to see how difficult it would be to pass through arbitrary Lua data tables. — Mr. Stradivarius ♪ talk ♪ 12:19, 8 January 2023 (UTC)
mw.text.jsonEncode({[7 = 642, 30 = 1966, 60 = 2279, 90 = 2419})
-- '{"7":642,"30":1966,"60":2279,"90":2419}'
views = {d7 = 642, d30 = 1966, d60 = 2279, d90 = 2419},
@ JPxG: I made a pull request to Wegweiser to use the same Lua table format as SignpostTagger. Does that look like an acceptable way of making the diffs cleaner in the index modules?
As for the question of why page view data is necessary, I understand that you want to display the page view data in wikitext, and that using the page view API for this directly would be impractical. The thing I'm not understanding is why you want to add page view data to wikitext in the first place. Are you planning to add a page view counter on article pages? Or are you planning to use the data in some other way? Best — Mr. Stradivarius ♪ talk ♪ 05:02, 22 January 2023 (UTC)
The current version of Wegweiser, while not perfect, now has the ability to pull article lists from the PrefixIndex API and generate skeleton entries (no tags, but date and subpage) in the indices. I filled them out from 2005 to present, which added some several hundred articles previously unindexed (i.e. 2017 only had a couple articles in the index for some reason). jp× g 01:33, 7 January 2023 (UTC)
Boom goes the dynamite. It works for everything after 2017-02, which was when Wikipedia:Wikipedia Signpost/Templates/Signpost-article-header-v2 came into use. Before that, things get a little hazy, but here is my chronology of headline and byline styles:
==AMA begins plans for new election==
:<small>By [[User:Michael Snow|Michael Snow]], [[10 January]] [[2005]]</small>
<h2 style="margin-right:60px;">Reporter who plagiarized Wikipedia gets dismissed</h2>
<small>:By [[User:Michael Snow|Michael Snow]], [[16 January]] [[2006]]</small>
{{Wikipedia:Signpost/Template:Signpost-article-start|Let's get serious about plagiarism|By [[User:Awadewit|Awadewit]], [[User:Elcobbola|Elcobbola]], [[User:Jbmurray|Jbmurray]], [[User:Kablammo|Kablammo]], [[User:Moonriddengirl|Moonriddengirl]] and [[User:Tony1|Tony1]] |13 April 2009}}
{{Wikipedia:Wikipedia Signpost/Templates/Signpost-article-header-v2|{{{1|Wikipedia has cancer}}}|By [[User:Guy Macon|Guy Macon]]| 9 February 2017}}
Code written to handle titles and author fields from 2009 to 2017, running scripts now. jp× g 01:26, 9 January 2023 (UTC)
The whole batch from January to May 2005 was small enough that it didn't warrant writing extra code, so I was going to do an AWB run to normalize it to the 2005-2009 style, but I figured as long as I was firing up JWB I might as well bring it all the way into the future, so those hundred-and-change now use the modern style (with Signpost-article-header-v2). jp× g 01:59, 9 January 2023 (UTC)
I am currently working through a massive JWB run to amend all back issues with the modern header template that formats titles and author metadata in a sensible fashion, among other things (and while I'm at it, I am updating User:JPxG/The Illuminated Signpost). Anyway, everything through mid-2006 is done now.
To find weird edge-cases and messed up pages, I have written a new script, validator.py, which outputs to the following page:
This displays all of the entries in year indices that are missing fields -- a summary table is at the top, and then it lists all the individual articles with errors. You can ignore the gigantic numbers for 2006, 2007, 2008 and 2009 (I am still reformatting them to have parseable titles and authors): from 2010 to present the missing fields are actual errors. Most of them are just articles without tags, for which some help would be appreciated. I have noticed some strange behavior, though. @ Mr. Stradivarius: Is there a reason that the SignpostTagger isn't showing a "manage tags" box for old articles, like Wikipedia:Wikipedia Signpost/2005-12-12/Welcome RSS readers? I looked through the .js and I didn't see anything that looked like it was excluding articles by year. jp× g 22:36, 11 January 2023 (UTC)
@ Mr. Stradivarius: I have a working modification for the module, currently located at Module:Sandbox/JPxG, which is capable of using the author/pageview metadata in list/table generation. I don't want to just slap it into the main module without any notice, so I am letting you know here. This version (which you can see the test cases for at the doc page) allows for returning the author, as well as viewsSeven, viewsFifteen [...] up to viewsOneEighty. Below I'll embed a use case, which is a table of the view counts for yesterday's issue: User:JPxG/sandboxbollocks {{User talk:JPxG/sandbox99 | rowformat = {{Wikipedia:Wikipedia Signpost/Templates/Article list maker/Pageviews|date=${date}|subpage=${subpage}|title=${title}|viewsseven=${viewsSeven}}} | sortdir = descending | startdate = 2023-01-15 | enddate = 2023-12-01 }} |} Anyway, I don't see this conflicting with any other use of the module, and it hasn't broken in the period of me testing it, so I will add these changes to the main module, unless you have an objection or want to do it better (I don't think this is the best-written code, as I am not a "Lua guy"). jp× g 05:05, 17 January 2023 (UTC)
[...] views7 and getViews7 instead of viewsSeven and getViewsSeven. I find it easier to tell at a glance what the number is when using digits. Also, once it gets over 100, things start to get less obvious. Should it be getViewsOneTwenty, getViewsOneHundredTwenty or getViewsOneHundredAndTwenty? Users of the module will probably have to look at the documentation to get it right. Otherwise, the code looks good to me. — Mr. Stradivarius ♪ talk ♪ 14:32, 18 January 2023 (UTC)
[...] viewsSeven to views7, then edit the table maker to call views7... it will flip out. I don't know if this is an insurmountable issue with Lua, or a skill issue on my part (likely the latter). If there's any way to make it just be "views7", that would be infinitely preferable lol. jp× g 23:29, 18 January 2023 (UTC)
[...] ${title} to accept both letters and numbers - previously it only accepted letters. Best — Mr. Stradivarius ♪ talk ♪ 15:06, 19 January 2023 (UTC)
@ Mr. Stradivarius: I accepted your pull request, but running the program, the whitespace seems to still be quite different -- is this a bug, or am I running it incorrectly? jp× g 04:31, 25 January 2023 (UTC)
@ JPxG: I undid WegweiserBot's edits to the index modules with my new serialisation code, due to this issue. The fix was also pretty simple, so I've made a new pull request. — Mr. Stradivarius ♪ talk ♪ 13:49, 31 January 2023 (UTC)
[...] \uxxxx escapes as well. It turns out that Lua doesn't actually have that kind of escape, as it treats strings as a series of bytes, rather than a series of Unicode characters. So it's not really a bug in WP:SPT, but rather a fundamental issue of how Lua was trying to parse those strings. In other words, my serialisation code in Wegweiser was outputting incorrect Lua strings, so that's the thing we need to fix. — Mr. Stradivarius ♪ talk ♪ 14:14, 31 January 2023 (UTC)
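To illustrate the serialisation point above: a writer that targets Lua 5.1 (the version Scribunto runs) can either leave non-ASCII text as raw UTF-8 or emit each UTF-8 byte as a three-digit decimal escape, but must not emit \uxxxx. The function below is a sketch of that idea, not Wegweiser's actual code; the name `lua_quote` and the `ascii_only` switch are made up for the example.

```python
# Hypothetical helper; Wegweiser's real serialiser may differ.
_SIMPLE = {"\\": "\\\\", '"': '\\"', "\n": "\\n", "\r": "\\r", "\t": "\\t"}


def lua_quote(text, ascii_only=False):
    """Serialise a Python str as a double-quoted Lua 5.1 string literal."""
    out = ['"']
    for ch in text:
        code = ord(ch)
        if ch in _SIMPLE:
            out.append(_SIMPLE[ch])
        elif code < 0x20 or code == 0x7F:
            # Control characters: three-digit decimal escape (\ddd).
            out.append("\\%03d" % code)
        elif code >= 0x80 and ascii_only:
            # Lua sees bytes, so escape each UTF-8 byte separately.
            out.extend("\\%03d" % byte for byte in ch.encode("utf-8"))
        else:
            # Printable ASCII, or raw UTF-8 left as-is.
            out.append(ch)
    out.append('"')
    return "".join(out)


print(lua_quote("jp×g"))                   # "jp×g"
print(lua_quote("jp×g", ascii_only=True))  # "jp\195\151g"
```

Padding every decimal escape to three digits avoids ambiguity when the next character in the string happens to be a digit.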
@ JPxG: Regarding this edit - how about adding |authortemplate=, |authorformat= and |authorseparator= parameters to specify how to format each author in the output? That way you can do things like link each user's userpage, etc., instead of just listing the usernames separated by commas. Also, maybe we could do the same for tags with |tagtemplate=, |tagformat= and |tagseparator= parameters. — Mr. Stradivarius ♪ talk ♪ 04:03, 29 January 2023 (UTC)
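As a rough picture of what |authorformat= and |authorseparator= would do, here is a small Python sketch (the real thing would live in the Lua module). The `${author}` placeholder name is an assumption, chosen to match the `${title}` style already used in rowformat; `format_authors` is a made-up name.

```python
from string import Template


def format_authors(authors, authorformat="${author}", authorseparator=", "):
    """Apply one format string to each author, then join with a separator."""
    template = Template(authorformat)
    return authorseparator.join(
        template.substitute(author=name) for name in authors
    )


# Default behaviour: plain usernames separated by commas.
print(format_authors(["Bluerasberry", "HaeB"]))
# With a format string, each byline becomes a userpage link.
print(format_authors(["Bluerasberry", "HaeB"],
                     authorformat="[[User:${author}|${author}]]"))
```

The same two parameters would carry over to tags unchanged, which is presumably why the proposal pairs them.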
@ JPxG: About Special:Diff/1139235436: it looks like WegweiserBot is using display names for author names, but WP:SPT is using usernames. This is going to cause the two scripts to keep overriding each other, so we should choose one approach. I would prefer to include only the username, as that makes it easier to do things like make user links. I can see that there would be a case for listing display names instead, though. Or if we really want, we could include both, in tables like {user = "Bluerasberry", display = "Lane Rasberry"}. What do you think would be best? — Mr. Stradivarius ♪ talk ♪ 04:28, 14 February 2023 (UTC)
[...] "[User:GerardM", "{{{2}}}", "§hep", "\/", "+sj +", "2 May", "3 July 2006", "03 July 2006", "3family6", "3family6 1 April 2016 19:58 (UTC)", "05 November 2007", "10 other editors", "11 other editors", "12 August 2015", "14 April", "18 August", "19 November 2012", "22mikpau", "24 April", "24 Apri", "26 April 2010", "27 other editors,", "28 other editors", "32 other editors", "38 editors", "51 other editors", "53 other editors", "79 other editors", "91 editors", "106 editors on the French Wikipedia; translated for The Signpost by JohnNewton8", "273 other editors", "1233", "2008", "16912 Rhiannon 15 July 2015" (literally all of which are parsing errors except for three); your articles, specifically, were under "HaeB", "Tilman Bayer", "Tilman Bayer", "Tilman Bayer 1 April 2016 19:58 (UTC)", "Tilman Bayer 03:08 (UTC)" (two of those appearing to be identical strings, but one of them with a nonprinting character). I guess what I am trying to say here is that if you want me to de-alias your name changes, I can do that, but I think that given the volume of work being done it was not something I could realistically post about at WT:Signpost (I recall that around December/January you had been saying we should be stricter about posting stuff on the right talk pages, which is why it was at /Technical). jp× g 23:14, 6 August 2023 (UTC)
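Most of the bad author strings listed above fall into a few recognisable shapes (dates, timestamps, "N other editors", bare numbers, leftover wikitext), so a heuristic filter could flag them for manual review. The patterns below are guesses derived only from that list. They will miss some cases (for example the truncated "24 Apri") and may flag the few entries that are real usernames, so treat this as a sketch rather than the cleanup that was actually run.

```python
import re

MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")

# Heuristic patterns only; derived from the examples in this thread.
SUSPICIOUS = [
    re.compile(r"^\d+ (?:other )?editors\b"),        # "27 other editors"
    re.compile(r"\b\d{1,2} (?:%s)\b" % MONTHS),      # "3 July 2006"
    re.compile(r"\(UTC\)"),                          # signature timestamps
    re.compile(r"^\d+$"),                            # "1233", "2008"
    re.compile(r"[\[\]{}]"),                         # leftover wikitext
]


def looks_suspicious(author):
    """Return True if an author string matches any known-bad shape."""
    return any(pattern.search(author) for pattern in SUSPICIOUS)


for name in ["3family6", "3family6 1 April 2016 19:58 (UTC)",
             "27 other editors,", "{{{2}}}", "HaeB"]:
    print(repr(name), looks_suspicious(name))
```

A string containing a nonprinting character, like one of the "Tilman Bayer" duplicates, would need a separate check on the character classes in the name.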
@ Mr. Stradivarius: Per this, it's now possible to have Lua modules load JSON rather than Lua tables. I think this would be a lot better to work with (i.e. all utilities wouldn't have to constantly serialize and deserialize Lua tables using the idiosyncratic whitespace/indentation/etc format). Additionally, a page with the JSON content model would be constrained and sanitized by MediaWiki, rather than a Lua table which can just have wrong stuff in it etc. I would like to write some more utilities to work with these indices but the Lua tables are kind of an awkward sticking point. What would the procedure be for converting them? I would be able to write patches for the Signpost tagger and Wegweiser (and may be able to help with the Lua module itself). jp× g 🗯️ 23:52, 8 December 2023 (UTC)
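For a sense of why JSON would be easier on the tooling side, the sketch below writes and re-reads an index entry using only Python's standard library, with no custom whitespace or indentation rules to match. The entry is copied from the example further down this page; storing a whole index as a JSON list of such objects is an assumption about the eventual layout.

```python
import json

# One index entry, using values quoted elsewhere on this page.
entry = {
    "date": "2023-12-04",
    "subpage": "Essay",
    "title": "I am going to die",
    "authors": ["WhatamIdoing"],
    "tags": ["essay"],
    "views": {"d007": 1526, "d015": 1919, "d030": 2029},
}

# ensure_ascii=False keeps non-ASCII names readable instead of \uXXXX escapes.
text = json.dumps([entry], ensure_ascii=False, indent=2)
print(text)

# Reading it back gives the same structure, so every utility agrees on format.
assert json.loads(text) == [entry]
```

Every script that touches the indices (the tagger, Wegweiser, the validator) could then share one parser instead of each maintaining its own Lua table reader and writer.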
Lately it has occurred to me that the way old Signpost archives get generated is very ass-backwards. We have a giant database of every article, its title, its author, et cetera... we're just not using it. It does get used sometimes, like in Wikipedia:Wikipedia_Signpost/Templates/Single_talk. But for the archive issues, we have hundreds of individual pages, like Wikipedia:Wikipedia Signpost/Single/2022-11-28, that redundantly store article titles, subheadings, et cetera. The modifications I'm working on now (which I've incorporated into the snippet template and the publishing script) allow articles to be associated with custom images, so the archives will have that too, but then this creates a problem: modifying an image for an article requires me to edit the article, then update Wikipedia:Wikipedia Signpost, then update Wikipedia:Wikipedia Signpost/2023-12-04, and such.
Anyway: I'd like to, as much as possible, use the module for stuff instead of static wikitext pages. However, this will again require some more fields to be added. Right now, my thinking is:
subhed (subheading, or blurb, or whatever) -- string, nowadays this is a sentence or so but in some old issues it's GIGANTIC (paragraph or more). Has various random crap in it (templates, quotation marks, etc).
piccy -- string, image file for the article. These, ideally, would be a 1:1 aspect ratio, but might sometimes not be, which brings me to:
piccy-meta -- coordinates for CSS crop of the image. I am not 100% on this. It might just be four CSS crop coordinates. But I am also contemplating other attributes, like filters (what if we want to desaturate an image, etc).
Like above, I am fine to incorporate these into the publishing script, Wegweiser, and the tagging script, but may need some assistance with the Lua part (and I don't know if this should be done before or after the lua table-json thing). @ Mr. Stradivarius: what do you think? jp× g 🗯️ 00:04, 9 December 2023 (UTC)
[...] subheading and image should be fine. I would also make "image" a table, so you could do something like image = {filename = "Example.png", width = 100, height = 100}, where "width" and "height" (or whatever metadata you need) are optional. I'm not aware of any way to crop an image in MediaWiki using inline styles - I think we are limited to the options provided at mw:Help:Images/en, but let me know if I'm missing something. Best — Mr. Stradivarius ♪ talk ♪ 09:03, 9 December 2023 (UTC)
{{Signpost/snippet|2023-12-04|Essay|I am going to die|And so are you.|0.3 MB|sub=0.3 MB|by=[[User:WhatamIdoing|WhatamIdoing]]|pic=File:Memento Mori 'To This Favour' by William Michael Harnett, c. 1879.JPG|credit-name=William Michael Harnett|credit-license=PD|pic-p=800|pic-x=350|pic-y=100}}
[...] scale, x and y. All are integers, although I think it might also be useful to permit values like top, bottom, left, right and center (these aren't handled by the template yet but they could be in the future). There are also two more params I didn't think of above, author and license, which are necessary for image attribution. jp× g 🗯️ 03:59, 11 December 2023 (UTC)
Need to be supported by various scripts in order to work properly.
I have rewritten Wegweiser to fetch metadata from parsing article wikitext instead of HTML pages; this posed some slight difficulties with respect to user tags in author fields but is now good. It can now work a lot faster, and it also provides subheadline metadata. There's a line I have commented out right now in the script, but can enable to make it store subheading data when it parses metadata. When the subheading data is in the module indices, the module still works fine to retrieve articles etc (of course it can't do anything to parse or use the subheading yet, but it doesn't break anything). However, the Signpost tagger chokes trying to save tags for articles with subheading data and won't work on them, so I won't do all of the indices with it for now. jp× g 🗯️ 22:06, 15 December 2023 (UTC)
So right now I've integrated into Wegweiser and SignpostTagger the fields for piccy information, like this:
{ date = "2023-12-04", subpage = "Essay", title = "I am going to die", authors = {"WhatamIdoing"}, tags = {"essay"}, views = {d007 = 1526, d015 = 1919, d030 = 2029, d060 = 2029, d090 = 2029, d120 = 2029, d180 = 2029}, piccycredits = "William Michael Harnett", piccyfilename = "File:Memento Mori 'To This Favour' by William Michael Harnett, c. 1879.JPG", piccylicense = "PD", piccyscaling = "400", piccyxoffset = "70", piccyyoffset = "", subhead = "And so are you.", },
496 chars. This seems, uh, stupid. It works but I have temporarily reverted. Since these six fields are all about the same thing, there's no good reason to have them occupy six whole fields -- they should probably just be a dict like the viewcounts are. Like such:
{ date = "2023-12-04", subpage = "Essay", title = "I am going to die", authors = {"WhatamIdoing"}, tags = {"essay"}, views = {d007 = 1526, d015 = 1919, d030 = 2029, d060 = 2029, d090 = 2029, d120 = 2029, d180 = 2029}, piccy = {filename = "File:Memento Mori 'To This Favour' by William Michael Harnett, c. 1879.JPG", credits = "William Michael Harnett", license = "PD", scaling = "400", xoffset = "70", yoffset = ""}, subhead = "And so are you.", },
467. But, upon thinking this thought, something rather devious popped into my mind: aren't these labels kind of long? It seems pretty insubstantial, but... there are a lot of articles. Six letters for each key, across 5561 (currently 5462) articles (plus 3 extra chars for the " = " necessitated by using a label at all) is 50049 bytes (currently 49158). For reference, the size of all extant module indices is 2377559 (currently 1406253). So that's, uh, 3.496% of the total index size being taken up just by key names. These have to be parsed by, basically, everything, and it's not quite clear that repeating the field names all these thousands of times is more efficient than just having them as an array whose ordering is documented. Perhaps it would be better to do this? @ Mr. Stradivarius: jp× g 🗯️ 13:51, 23 December 2023 (UTC)
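The overhead estimate above is easy to re-run as the article count and index size change. The sketch below just reproduces the arithmetic from the comment (nine bytes per article: six letters plus three characters for the " = "); the function name is made up.

```python
KEY_LETTERS = 6      # letters in one key name, as assumed in the comment above
ASSIGN_CHARS = 3     # the " = " that follows the key


def key_overhead(articles, total_index_bytes):
    """Return (bytes spent on one repeated key, percentage of index size)."""
    key_bytes = articles * (KEY_LETTERS + ASSIGN_CHARS)
    return key_bytes, 100 * key_bytes / total_index_bytes


for articles, total in [(5561, 2377559), (5462, 1406253)]:
    key_bytes, percent = key_overhead(articles, total)
    print(articles, key_bytes, round(percent, 3))
```

With the second pair of figures (5462 articles, 1406253 bytes) this gives 49158 bytes and roughly 3.496%, matching the percentage quoted; with the first pair (5561 articles, 2377559 bytes) it gives 50049 bytes and roughly 2.1%.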