This is an archive of past discussions. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page. |
Archive 25 | Archive 26 | Archive 27 | Archive 28 | Archive 29 | Archive 30 | → | Archive 35 |
Some observations of Citation bot's template-changing behaviour from today:
These are from only the past handful of edits that popped up on my watchlist. I find these kinds of changes to be arbitrary and, given the inconsistency and the fact that these template changes do not actually affect the rendered article, the bot should not perform these changes unless there is a clear point in doing so. IceWelder [ ✉] 08:31, 1 October 2021 (UTC)
My overnight batch job of 2,199 articles (" Vehicles, part 2 of 3") stalled at 08:10 UTC. This can be seen in the latest set of bot edits, where the last edit to this set is item #140, an edit [1] to Driver (series).
I tried to remedy this at about 08:40 by using https://citations.toolforge.org/kill_big_job.php, which promptly responded Existing large job flagged for stopping.
But over an hour later, attempts to start a new job (with a new list) using https://citations.toolforge.org/linked_pages.php still get a response of Run blocked by your existing big run.
Meanwhile, the bot was happy to process an individual page request from me: see [2] at 09:49. -- BrownHairedGirl (talk) • ( contribs) 10:05, 2 October 2021 (UTC)
{{ fixed}} the underlying bug. AManWithNoPlan ( talk) 16:01, 4 October 2021 (UTC)
My overnight batch of 2,198 articles (" Vehicles, part 3 of 3") was dropped at 08:00 UTC this morning, after this edit [4] to National Tyre Distributors Association, which was #661 in the list. See the bot contribs for that period, where that edit is #126 in the contribs list.
When I spotted it, I was able to start a new batch at 10:48 [5], so the overnight batch wasn't stuck like yesterday.
This is a bit tedious. -- BrownHairedGirl (talk) • ( contribs) 11:14, 3 October 2021 (UTC)
I got Run blocked by your existing big run in response to my actions to resume the run.
{{ fixed}} the underlying bug. AManWithNoPlan ( talk) 16:01, 4 October 2021 (UTC)
After my big overnight job of 2,198 pages was dropped (see above #Job dropped), I ran another small job of 100 pages. That was processed successfully.
I then resumed the overnight job " Vehicles, part 3 of 3 resumed" (1,537 pages) ... but it has stalled.
See the latest bot contribs: that batch started processing at 11:41 [6], but stalled after its fifth edit [7], at 11:42.
At 12:14 I used https://citations.toolforge.org/kill_big_job.php to try to kill this stalled job. Now, 15 minutes later, I still can't start a new job: the response is Run blocked by your existing big run. -- BrownHairedGirl (talk) • ( contribs) 12:31, 3 October 2021 (UTC)
Still getting Run blocked by your existing big run. -- BrownHairedGirl (talk) • ( contribs) 19:25, 3 October 2021 (UTC)
~Renamed "magazine" -> "journal"
~Renamed "journal" -> "magazine"
{{ fixed}} the infinite loop and added a test that will make sure that does not happen again. AManWithNoPlan ( talk) 16:01, 4 October 2021 (UTC)
My batch job " Food, part 1 of 6" (595 pages) has been stalled for over an hour, since this edit [12] to Bill Knapp's.
@ AManWithNoPlan, please can you take a peek? -- BrownHairedGirl (talk) • ( contribs) 20:31, 5 October 2021 (UTC)
So, for example, in Category:Human rights in Saudi Arabia there are 7 articles and 4 categories. The bot counts these as 11 and reports back for each category as if it were an article, for instance:
No changes needed. Category:Saudi Arabian human rights activists
Presumably if there was a citation with correctable errors on the subcategory page the bot would make an edit. But I have never seen a category page with a citation in it, and I have never seen the bot edit a category page. Even though the bot quickly runs the category pages it is presented with, it must take a little time to do nothing to each one, and in the aggregate this wastes bot time. If possible, could the bot be instructed to ignore subcategory pages? Abductive ( reasoning) 18:08, 7 October 2021 (UTC)
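The subcategory-skipping request above could be met by filtering on namespace. A minimal sketch, in Python rather than the bot's own PHP, with made-up helper names; the MediaWiki API can also do this server-side via the categorymembers parameter cmnamespace=0:

```python
def skip_subcategories(members):
    """Keep only mainspace pages (ns 0) from a list=categorymembers
    result; subcategory pages are in namespace 14."""
    return [m["title"] for m in members if m["ns"] == 0]

# Example shaped like the Saudi Arabia category above: 7 articles + 4 subcategories
members = (
    [{"ns": 0, "title": f"Article {i}"} for i in range(7)]
    + [{"ns": 14, "title": f"Category:Sub {i}"} for i in range(4)]
)
```

With this filter the bot would report 7 pages, not 11, and spend no time on category pages at all.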
Something went wrong at Fire of Moscow (1812). I cannot figure out what it is, please take a look. Taksen ( talk) 06:14, 8 October 2021 (UTC)
My big overnight batch " South Asia, part 5 of 6" (2,196 pages) stalled after this 09:23 edit [13] to page 2162/2196: see edit #135 on this set of bot contribs.
I left it until about 09:48 before trying to start a new batch (" Food, part 3 of 6", 593 pages). The bot made its first edit to that batch at 09:49 [14]. I had not needed to kill the first job.
I then set about working on the remaining 34 pages: run Citation bot via the toolbar, let the page finish, run it on the next page ... then do manual followup on each page.
The first of those missed pages on which I invoked the bot was #2163 of " South Asia, part 5 of 6": Sambalpur (Lok Sabha constituency). That stalled, so I went on and processed the next nine. After more than an hour, the bot request on Sambalpur (Lok Sabha constituency) timed out as 502 Bad Gateway.
It seems that in batch mode, the bot drops the stalled page more promptly. However, it should not also kill the batch, since the next 9 pages were fine.
I know that @ AManWithNoPlan has recently put a lot of work into these stalling issues, but it's not quite fixed yet. -- BrownHairedGirl (talk) • ( contribs) 11:35, 8 October 2021 (UTC)
See the latest bot contribs: my batch job " Food, part 5 of 6" (590 pages) stalled after this edit [15] (266/590) to Rodrick Rhodes, which is #16 in that contribs list.
I can't start a new batch. -- BrownHairedGirl (talk) • ( contribs) 21:25, 8 October 2021 (UTC)
Has the ability to run this bot offline (with sufficient API rate limiting, both for the mediawiki API and the data provider APIs) been considered? That's one way to solve the discussions over capacity. It seems strange to me that in this day and age a computing resource like this has limited capacity. Enterprisey ( talk!) 02:15, 14 September 2021 (UTC)
I read Neither of those are accessed when run in slow mode as saying that the APIs are not accessed during slow mode, but that doesn't make total sense to me. Enterprisey ( talk!) 23:32, 15 September 2021 (UTC)
Flagged as {{ fixed}} since it seems to already exist as a feature. AManWithNoPlan ( talk) 14:49, 15 October 2021 (UTC)
I understand from the earlier arguments that the bot wastes much time looking up the metadata for citations that it already had processed in earlier runs, especially when run on batches of pages. How much (if any) storage space is available to the bot? If it doesn't already, could it cache the metadata of citations it processes (or the resulting template code, or maybe just a hash by which to recognize a previously-seen citation), so that it can waste less time when encountering an already-processed citation? — 2d37 ( talk) 03:04, 16 September 2021 (UTC)
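The hash-based cache suggested here could look roughly like the following sketch (illustrative only; all names are made up, and the bot itself is written in PHP):

```python
import hashlib
import re

_cache = {}  # citation hash -> previously fetched metadata

def citation_key(template_text):
    # Normalise whitespace so trivially different spacing shares one key
    normalised = re.sub(r"\s+", " ", template_text.strip())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()

def expand(template_text, do_lookup):
    """Run the expensive metadata lookup only for unseen citations;
    repeat encounters in later batch runs hit the cache instead."""
    key = citation_key(template_text)
    if key not in _cache:
        _cache[key] = do_lookup(template_text)
    return _cache[key]
```

Storing only the hash and the resulting template text keeps the storage footprint small, at the cost of missing upstream metadata corrections until the cache entry expires.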
Citationbot made some edits to an article with the comment "Removed URL that duplicated unique identifier. Removed accessdate with no specified URL.". Some of the journal citations did indeed have PMC identification numbers, but the fulltext URL links were not to the PMC fulltext, so they weren't straight duplicates. I'm not sure it's safe to assume that the PMC fulltext is always the best available online copy; in some cases another site might have a better scan of the same article. I'm not sure we should implicitly ban access-date parameters from any source that is on PMC, either. In one case, the article was not on PMC; there was a PubmedID, and a link to the publisher's site for the fulltext. In this case, the automated edit had the effect of concealing the existence of a publicly-available fulltext. I suspect this may not be the intended behaviour; perhaps the tool was just expected to prevent their being two links to the same PMC page in a single citation?
Separately, I'm uneasy in giving precedence to PMC links over other links, as PMC and Pubmed contain third-party tracking content from data brokers, currently including Google and Qualtrics. I wrote to the NIH some years back about this, pointing out that it could give these actors sensitive medical information if people looked up things they or their friends had been diagnosed with. They did not want to engage on the topic. One of the links deleted was to the Europe PMC page, which admittedly looks no better, European data regulations aside. This is a complex question, and it might be a good idea to discuss it at Wikipedia Talk:MED. HLHJ ( talk) 23:55, 10 October 2021 (UTC)
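One possible tightening of the duplicate check described above, sketched as an assumption about what the rule could be rather than what the bot actually does: only treat a URL as duplicating |pmc= when it is literally the PMC full-text page for that id, so publisher or Europe PMC links are never silently dropped.

```python
import re

def is_pmc_duplicate(url, pmc_id):
    """True only when `url` is the NCBI PMC full-text page for `pmc_id`,
    not merely any URL in a citation that also carries a PMC id."""
    m = re.search(r"ncbi\.nlm\.nih\.gov/pmc/articles/PMC(\d+)", url)
    return bool(m) and m.group(1) == str(pmc_id)
```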
Template:Inconsistent citations has been nominated for deletion. You are invited to comment on the discussion at the entry on the Templates for discussion page. * Pppery * it has begun... 03:12, 17 October 2021 (UTC)
10.1073/pnas DOIs are only free after a set number of years, and the path through the code looks for free status before it adds the year. I will see if it is easily fixable. AManWithNoPlan ( talk) 17:44, 17 October 2021 (UTC)
Looking through the bot's contributions just now, it looks like I accidentally ran Citation bot through Category:Terrorism twice in a row (at least it found some more things to fix the second time round, so a bit yay, I guess?), rather than once; this was not intentional on my part. I think what happened is I started the bot running on the category, thinking it had already finished going through Category:Sexism, only to see a few minutes later that it wasn't quite done with the earlier category (note: when running Citation bot through big jobs like these, I rely on monitoring the bot's contributions page, as the Citation bot console invariably errors out with a 502 or 504 gateway error well before the bot finishes with the job; as the bot's actions only show up in the contributions log when it actually finds something to correct, it often takes some degree of guesswork when I'm trying to figure out if it's finished a run or not). Upon finding this out, I waited somewhat longer for Citation bot to completely finish with Category:Sexism, and then, assuming (somewhat stupidly in retrospect) that the first attempt had been blocked by the still-going earlier run (and, thus, hadn't taken), I went back to the Citation bot console a second time and started it on Category:Terrorism again - only the first attempt hadn't been blocked after all, and the bot proceeded to queue up the second terrorist run right behind the first (and did not, as I'd assumed would happen in this kind of situation, block the second attempt). Oops.🤦♀️ Anyone whose runs didn't go through because of this, feel free to give me a well-deserved trouting right now. Whoop whoop pull up Bitching Betty ⚧ Averted crashes 00:13, 18 October 2021 (UTC)
I accidentally rebooted the bot on a list it already ran through [26]. Could you kill my run, and save ~842 mostly pointless article attempts? Headbomb { t · c · p · b} 01:28, 19 October 2021 (UTC)
CB continues to make conversions like what I reported recently, for example here, here, here, and here. I am still under the impression that CB should not perform such changes; they are arbitrary, cosmetic, and often improper. IceWelder [ ✉] 13:16, 13 October 2021 (UTC)
Changing |work= to |journal= is definitely incorrect. In fact, RPS has only ever had a website; a "conversion to cite magazine" should not occur. IceWelder [ ✉] 15:49, 13 October 2021 (UTC)
Is there a list somewhere of the most common websites for linking out? I would like to add a bunch of websites to the is_magazine, is_journal, etc lists AManWithNoPlan ( talk) 12:34, 20 October 2021 (UTC)
Register | British Newspaper Archive, e.g. {{Cite web|url=https://www.britishnewspaperarchive.co.uk/viewer/bl/0000425/18440304/002/0001|title=Register | British Newspaper Archive}}, which renders as:
https://www.britishnewspaperarchive.co.uk/viewer/bl/0000425/18440304/002/0001. {{cite web}}: Missing or empty |title= (help)
Thanks for the prompt fix, @ AManWithNoPlan. -- BrownHairedGirl (talk) • ( contribs) 20:09, 24 October 2021 (UTC)
This seems obvious, and maybe you already considered it before, but in this example would it make sense to include a |website=www.comedy.co.uk when it is a cite web? -- GreenC 00:38, 27 October 2021 (UTC)
(ghostarchive[.]org|conifer[.]rhizome.org|newspaperarchive|webarchiv[.]cz|digar[.]ee|bib-bvb[.]de|webcache[.]googleusercontent[.]com|timetravel[.]mementoweb|webrecorder[.]io|nla.gov.au)
.. and new ones will become known. The mapping of domains to names would be nice. One method: extract existing domains from URLs, determine how each is most frequently written in work/website, and use that as the default, e.g. [[Time (magazine)|Time]] = www.time.com as the most common usage. Such a table might be useful for other purposes as well. Almost like a separate program: build the table, download the data on occasion, and incorporate it into the bot locally. -- GreenC 18:35, 27 October 2021 (UTC)
{{ fixed}}
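GreenC's table-building idea could be sketched like this (an illustration only; the real HOSTNAME_MAP lives in the bot's PHP source, and the function name here is made up):

```python
from collections import Counter, defaultdict
from urllib.parse import urlparse

def build_hostname_map(citations):
    """`citations` is an iterable of (url, work_value) pairs harvested
    from existing templates; return each domain's most common work."""
    counts = defaultdict(Counter)
    for url, work in citations:
        host = (urlparse(url).hostname or "").removeprefix("www.")
        if host and work:
            counts[host][work] += 1
    return {host: c.most_common(1)[0][0] for host, c in counts.items()}
```

Run over a database dump on occasion, the resulting table could be shipped with the bot as a static default, exactly as suggested.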
The bot removed the |publisher= parameter from two uses of {{cite news}} (verify). I do not see that parameter in the list of deprecated/removed parameters at Template:Cite news.
When the publisher is basically the same as the work parameter, then it should not be included. Also, for most publications the publisher is not very important, such as academic journals, where the publishers generally have no control since the editors run the show. AManWithNoPlan ( talk) 13:49, 29 October 2021 (UTC)
Newest case of incorrect cite type is here. Edge is a magazine and (formerly) a website; the website is cited. CB made it a newspaper. IceWelder [ ✉] 12:12, 29 October 2021 (UTC)
This Citation Style 1 template is used to create citations for news articles in print, video, audio or web. Izno ( talk) 18:47, 29 October 2021 (UTC)
|newspaper= is wrong in such cases either way. IceWelder [ ✉] 19:04, 29 October 2021 (UTC)
I have removed the journals.lww.com code. They shut that website down a while back, and obviously it is alive again. Weird. AManWithNoPlan ( talk) 14:53, 29 October 2021 (UTC)
I am not sure what you are talking about. Please explain. AManWithNoPlan ( talk) 14:28, 1 November 2021 (UTC)
|journal=
|series=CUNY Academic Works
The discussion at User talk:Citation bot/Archive_28#Adding_website_field was archived too quickly, but during its brief appearance I bookmarked https://github.com/ms609/citation-bot/pull/3790/files to scrutinise the HOSTNAME_MAP array.
The issue I was looking for is websites which host more than one newspaper. The three examples I checked are:
In each case, HOSTNAME_MAP appears to be unaware of the Sunday variation. BrownHairedGirl (talk) • ( contribs) 13:13, 31 October 2021 (UTC)
{{ fixed}}
Expanding {{cite journal |doi=10.1163/1570-6699_eall_EALL_COM_vol3_0247 }} with Citation Bot does not result in a normal book citation but somehow only gets the chapter title. I had to hand-edit it into {{cite book |first1=Kimary N. |last1=Shahin |chapter=Palestinian Arabic |title=Encyclopedia of Arabic Language and Linguistics |editor1-first=Lutz |editor1-last=Edzard |editor2-first=Rudolf |editor2-last=de Jong |doi=10.1163/1570-6699_eall_EALL_COM_vol3_0247 }}. There are a lot more examples of DOIs for the same book at Levantine Arabic.
DOIs consist of both metadata and URL redirection. Some DOI providers do not provide any metadata, and some only a few bits. That is all they provided: https://api.crossref.org/v1/works/10.1163/1570-6699_eall_eall_com_vol3_0247 AManWithNoPlan ( talk) 15:58, 5 November 2021 (UTC)
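The sparse registrar record can be inspected directly. A small sketch (the Crossref REST endpoint quoted above is real; the helper names here are made up) that reports which citation-relevant fields a registrar omitted, which is why the bot can only fill the chapter title:

```python
import json
from urllib.request import urlopen

WANTED = ("title", "container-title", "author", "editor", "ISBN")

def missing_fields(message, wanted=WANTED):
    """`message` is the `message` object of a Crossref works response;
    return the keys the registrar left empty or absent."""
    return [k for k in wanted if not message.get(k)]

def fetch_message(doi):
    # Network call, e.g. fetch_message("10.1163/1570-6699_eall_eall_com_vol3_0247")
    with urlopen(f"https://api.crossref.org/v1/works/{doi}") as r:
        return json.load(r)["message"]
```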
This is beyond our ability to fix. https://en.wikipedia.org/api/rest_v1/data/citation/mediawiki/http%3A%2F%2Fwww.jpbox-office.com%2Ffichfilm.php%3Fid%3D8806 AManWithNoPlan ( talk) 20:07, 8 November 2021 (UTC)
Is there a reason why I'm getting a '502 Bad Gateway' error? I've been trying to use the bot for the article Nicole Kidman, but it keeps giving me this error twice now. Is the error occurring from my end, my internet or something? Or is something wrong with the page or tool? Is it perhaps because there are too many mistakes to fix that it overwhelms the system? Any suggestions on what to do? — Film Enthusiast ✉ 17:15, 3 November 2021 (UTC)
No longer overloaded. {{ fixed}} AManWithNoPlan ( talk) 22:31, 10 November 2021 (UTC)
The bot added |chapter= to a {{cite journal}} template; |chapter= is not a supported parameter in that template. See this explanation for more information. TeemPlayer ( talk) 23:38, 6 November 2021 (UTC)
The <ref> tag is missing the closing </ref> (see the help page). → <ref>{{Cite web|url=http://www.conceptcarz.com/article/article.aspx?articleID=3548|title = An Error Has Occured!}}</ref>
Also, I went back and fixed a dozen pages with such bad titles - about half from refill and that other old bot. AManWithNoPlan ( talk) 14:29, 13 November 2021 (UTC)
Please wait an hour at least and try again. But first make sure the bot did not actually run already. AManWithNoPlan ( talk) 17:31, 17 November 2021 (UTC)
The bot changed {{cite journal}} to {{cite document}}, a redirect to {{cite journal}}, so a more-or-less pointless exercise. The citation is apparently to a book chapter (with |chapter= and |isbn= as clues), so the template should have been changed to {{cite book}}. |issue= and |number= in {{cite journal}} are exact aliases of each other. In this case the value assigned to |number= appears to be incorrect, while the value that the bot assigned to the new |issue= seems to be correct. Because both are present, and because only one is allowed, Module:Citation/CS1 emits the redundant parameter error message.
That was obscure and rare. Thank you for the report. AManWithNoPlan ( talk) 15:32, 23 November 2021 (UTC)
This is also part of the wider problem that the bot needs much more capacity, and also that a lot of its time is taken up by speculative trawls through wide sets of articles which have not been identified as needing bot attention and which often produce little change. Huge categories are being fed to the bot, which changes little over 10% of them, and most of those changes are trivia (type of quote mark in title) or have no effect at all on output (removing redundant parameters or changing template type). It would help a lot if those speculative trawls were given a lower priority. -- BrownHairedGirl (talk) • ( contribs) 22:54, 9 August 2021 (UTC)
It seems that the low-return speculative trawls have re-started. @ Abductive has just run a batch job of Category:Venerated Catholics by Pope John Paul II; 364 pages, of which only 29 pages were actually edited by the bot, so 92% of the bot's efforts on this set were wasted. The lower category limit has helped, because this job is 1/10th of the size of similar trawls by Abductive before the limit was lowered ... but it's still not a good use of the bot. How can this sort of thing be more effectively discouraged? -- BrownHairedGirl (talk) • ( contribs) 11:57, 27 August 2021 (UTC)
A search of the bot's contribs for Abductive | #UCB_webform matches 370 edits (a batch job of 2200 pages), and a search for Abductive | #UCB_toolbar matches a further 114 pages. AManWithNoPlan cut the limit on category batches twice in response to Abductive's abuse of the higher limit.
You say I have been running 2200 of your [i.e. BHG's] bare urls. You rejected my suggestion (at User talk:Abductive#A_citation_bot_job) that you tackle the pages which transclude {{ Bare URL inline}}, and its transclusion count has dropped by only 84 in the last 25 days, so I doubt you were tackling that.
Your holding off claims seem to me to be bogus. -- BrownHairedGirl (talk) • ( contribs) 19:07, 28 August 2021 (UTC)
Requesting individual runs of articles is a major purpose of the bot... but your rapid requests of hundreds of individual articles within a few hours while you are already running a max-size batch is in effect running a second batch, and thereby WP:GAMING the bot's limits. If that is not clear to you, then we have a big problem.
Your claim that Just about any random category run these days will only achieve 30% is demonstrably false in two respects:
You say My runs are in no way impeding your efforts. Again, nonsense: your routine use of two channels slows down the bot, and impedes other editors from even starting new jobs.
You say ...so you should not be at all concerned about my use of the bot. This is a severe case of WP:IDHT.
You claim to have Reconfigured to make the bot use its 4 channels more effectively. All you have done is to selfishly and disruptively grab 2 of the 4 channels for yourself. This makes the bot work no faster, and use its time no more effectively; your WP:GAMING of the limits just gives you a bigger slice of the bot's time and denies other editors access to the bot.
Please find another place to argue. Thanks. -- Izno ( talk) 16:32, 29 August 2021 (UTC)
In this edit https://en.wikipedia.org/?title=Doctor_Who&type=revision&diff=1057375628&oldid=1057016709 the bot added |date=21 December 1963. To the best of my knowledge, the IMDB was not active in 1963. Obviously the bot is confusing the date of the TV show with the date the entry was added to IMDB. It made the same addition a week earlier, which I reverted. Is there an easy way to make the bot avoid doing this? Stepho talk 11:47, 27 November 2021 (UTC)
Names in the form last1=Alpha, Bravo are not expanded to last1=Alpha | first1=Bravo. Ignoring the fact that the bot edit was purely cosmetic, could it actually have done something useful? Here is the (updated) citation:
This might usefully have been changed to last1=Chai |first1=Yan | last2=Guo |first2=Ting etc.
Is this (a) easy to implement and (b) worth the effort? -- John Maynard Friedman ( talk) 12:02, 28 November 2021 (UTC)
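The comma-splitting rule asked about above can be sketched as follows (illustrative only, not the bot's code), guarded so that single-word organisations or multi-comma strings are left untouched rather than mangled:

```python
def split_author(value):
    """Return (last, first) for a clean 'Last, First' value, or None
    when both halves are not clearly present, so the caller skips it."""
    parts = [p.strip() for p in value.split(",")]
    if len(parts) != 2 or not all(parts):
        return None
    return parts[0], parts[1]
```

The guard matters: returning None for anything that is not unambiguously "Last, First" is what keeps the change safe to automate.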
The bot edit was not purely cosmetic. It added a date and modified an ISBN. -- BrownHairedGirl (talk) • ( contribs) 17:13, 28 November 2021 (UTC)
Changed web to news and newspaper at revision 1058214656 by Citation bot ( talk) at Valencia, California Fettlemap ( talk) 04:59, 2 December 2021 (UTC)
I'm tired of having to undo edits like this, which stealthily remove paywall and related access indicators. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 22:34, 2 December 2021 (UTC)
The bot changed magazine=[[Billboard (magazine)]] to magazine=[[Billboard (magazine)|Billboard]]. If there are similar cases with journal=[[Foobar (journal)]], those too should be updated. Headbomb { t · c · p · b} 07:24, 3 December 2021 (UTC)
Ping @ AManWithNoPlan: please can you try to fix this? -- BrownHairedGirl (talk) • ( contribs)
Handling optional whitespace (e.g. \{\{\s*cite\s+web) is for me a basic piece of encoding that I do even in short AWB runs. Are there really bots which are sophisticated enough to be allowed to modify citations, but so crude that they don't handle multiple spaces? I wrote \{\{\a*cite\s+web) above, but that should of course have been \{\{\s*cite\s+web).
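For comparison, the corrected pattern in action, as a Python sketch of the same regex:

```python
import re

# \s* tolerates "{{ cite web"; \s+ tolerates "cite   web"
CITE_WEB = re.compile(r"\{\{\s*cite\s+web", re.IGNORECASE)

def is_cite_web(text):
    return bool(CITE_WEB.match(text))
```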
The bot should not change |via=NPR to |newspaper=NPR. It should either leave |via=NPR alone, change it to |website=NPR, or change to cite news but use something more appropriate like |work=NPR.
The bot won't fill refs to YouTube.com. This is a nuisance, because in the 20211120 database dump, there are 10,060 articles with bare links to YouTube. That is about 4% of all remaining pages with WP:Bare URLs, so filling them would make a big dent in the backlog.
I tested the bot on the following pages with bare LINK refs to YouTube.com (from this search): Chris Daughtry, Petrol engine, James Mason, Model–view–controller, CBBC (TV channel), House of Gucci, Luke Combs, Flute, Josh Peck, Bloodhound Gang, and Pauly Shore.
So far as I can see from the bot's output, the zotero returns no info for YouTube links, which is obviously outside the bot's control. However, I wonder if it would be possible for the bot to do a direct lookup on those pages? Even if the bot just filled the cite param |title= with the data from the YouTube page's <meta name="title"> tag, that would be a useful step forward. -- BrownHairedGirl (talk) • ( contribs) 19:45, 26 November 2021 (UTC)
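The direct-lookup fallback suggested above could be as simple as this sketch (hypothetical helper; YouTube's actual markup may differ and would need checking before relying on it):

```python
import re

def meta_title(html):
    """Extract the content of a <meta name="title" ...> tag from raw
    page HTML; return None when absent, mirroring the zotero's empty result."""
    m = re.search(r'<meta\s+name="title"\s+content="([^"]*)"', html)
    return m.group(1) if m else None
```

A real implementation would fetch the page first and also HTML-unescape the extracted title before writing it into |title=.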
(twitter\.com|google\.com/search|ned\.ipac\.caltech\.edu|pep\-web\.org|ezproxy|arkive\.org|worldcat\.org|kyobobook\.co\.kr|facebook\.com|leighrayment\.com|scholarlycommons\.pacific\.edu\/euler\-works|miar\.ub\.edu\/issn|britishnewspaperarchive\.co\.uk|pressreader\.com|ebooks\.adelaide\.edu\.au). -- BrownHairedGirl (talk) • ( contribs) 18:36, 27 November 2021 (UTC)
{{ fixed}} AManWithNoPlan ( talk) 15:53, 5 December 2021 (UTC)
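A sketch of how such a blocklist alternation is typically applied (illustrative only; the bot's actual implementation is in PHP, and only a few entries from the list above are reproduced here):

```python
import re

# A few entries from the alternation quoted above
ZOTERO_BLOCKLIST = re.compile(
    r"(twitter\.com|google\.com/search|facebook\.com|worldcat\.org"
    r"|britishnewspaperarchive\.co\.uk|pressreader\.com)"
)

def zotero_allowed(url):
    """False for URLs the zotero should never be asked about."""
    return not ZOTERO_BLOCKLIST.search(url)
```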
Only if that's the full name of the journal. Journal of Bioscience shouldn't be capitalized that way. Headbomb { t · c · p · b} 20:56, 4 December 2021 (UTC)
Since the author of the source I cited is an organization (Japanese 大阪市立衛生試験所), not a person, I avoided first-last pairing parameters and put it in "editor=", so this bot edit was unnecessary to begin with. But if the bot still wants to fill parameters in the "author=" class, then it should at least take care of easily imaginable and most probable cases such as single words (organizations), over three words, languages with no spacings, "anonymous" in various abbreviations or non-English languages, multiple persons in one line, etc. Note that I am not asking the bot to determine whether the string is a person or a non-person, which should be impossible. I'm just asking that if the bot can't fill both parts of a first-last pair, then to stop. Wotheina ( talk) 06:02, 5 December 2021 (UTC)
The bot will happily accept a |last= value, even when there's no |first= value. Vice-versa, as you've observed, doesn't work. You say the author of the source I cited is an organization, so you entered it as an editor. I don't like using organisations or "Staff" for author parameters, but if you've got an author, say it's the author. If what you have is an editor, use the editor parameters. Don't try to outsmart the templates, because it messes up the metadata. — JohnFromPinckney ( talk / edits) 08:06, 5 December 2021 (UTC)
This is an archive of past discussions. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page. |
Archive 25 | Archive 26 | Archive 27 | Archive 28 | Archive 29 | Archive 30 | → | Archive 35 |
Some observations of Citation bot's template-changing behaviour from today:
These are from only the past handful of edits that popped up on my watchlist. I find these kinds of changes to be arbitrary and, given the inconsistency and the fact that these template changes do not actually affect the rendered article, the bot should not perform these changes unless there is a clear point in doing so. IceWelder [ ✉] 08:31, 1 October 2021 (UTC)
My overnight batch job of 2,199 articles (" Vehicles, part 2 of 3") stalled at 08:10 UTC. This can be seen in the latest set of bot edits, where the last edit to this set is item #140, an edit [1] to Driver (series).
I tried to remedy this at about 08:40 by using https://citations.toolforge.org/kill_big_job.php, which promptly responded Existing large job flagged for stopping
.
But over an hour later, attempts to start a new job (with
a new list) using
https://citations.toolforge.org/linked_pages.php get still get a response of Run blocked by your existing big run
.
Meanwhile, the bot was happy to process an individual page request from me: see [2] at 09:49. -- BrownHairedGirl (talk) • ( contribs) 10:05, 2 October 2021 (UTC)
{{
fixed}} the underly bug.
AManWithNoPlan (
talk)
16:01, 4 October 2021 (UTC)
My overnight batch of 2,198 articles (" Vehicles, part 3 of 3") was dropped at 08:00 UTC this morning, after this edit [4] to National Tyre Distributors Association, which was #661 in the list. See the bot contribs for that period, where that edit is #126 in the contribs list.
When I spotted it, I was able to start a new batch at 10:48 [5], so the overnight batch wasn't stuck like yesterday.
This is a bit tedious. -- BrownHairedGirl (talk) • ( contribs) 11:14, 3 October 2021 (UTC)
Run blocked by your existing big run
in response to my actions to resume the run.Run blocked by your existing big run
.{{ fixed}} the underly bug. AManWithNoPlan ( talk) 16:01, 4 October 2021 (UTC)
After my big overnight job of 2,198 pages was dropped (see above #Job dropped), I ran another small job of 100 pages. That was processed successfully.
I then resumed the overnight job " Vehicles, part 3 of 3 resumed" (1,537 pages) ... but it has stalled.
See the latest bot contribs: that batched started processing at 11:41 [6], but stalled after its fifth edit [7], at 11:42.
At 12:14 I used https://citations.toolforge.org/kill_big_job.php to try to kill this stalled job. Now 15 minutes later, I still can't start a new job: the response is Run blocked by your existing big run
. --
BrownHairedGirl
(talk) • (
contribs)
12:31, 3 October 2021 (UTC)
Run blocked by your existing big run
.
--
BrownHairedGirl
(talk) • (
contribs)
19:25, 3 October 2021 (UTC) ~Renamed "magazine" -> "journal"
~Renamed "journal" -> "magazine"
{{ fixed}} the infinite loop and added a test that will make that does not happen again. AManWithNoPlan ( talk) 16:01, 4 October 2021 (UTC)
My batch job " Food, part 1 of 6" (595 pages) has been stalled for over an hour, since this edit [12] to Bill Knapp's.
@ AManWithNoPlan, please can you take a peek? -- BrownHairedGirl (talk) • ( contribs) 20:31, 5 October 2021 (UTC)
So, for example, in the
Category:Human rights in Saudi Arabia there are 7 articles and 4 categories, The bot counts these as 11, and reports back for each category as if it was an article, for instance:
No changes needed. Category:Saudi Arabian human rights activists
Presumably if there was a citation with correctable errors on the subcategory page the bot would make an edit. But I have never seen a category page with a citation in it, and I have never seen the bot edit a category page. Even though the bot quickly runs the category pages it is presented with, it must take a little time to do nothing to each one, and in the aggregate this wastes bot time. If possible, could the bot be instructed to ignore subcategory pages? Abductive ( reasoning) 18:08, 7 October 2021 (UTC)
Something went wrong at Fire of Moscow (1812). I cannot figure out what it is, please take a look. Taksen ( talk) 06:14, 8 October 2021 (UTC)
My big overnight batch " South Asia, part 5 of 6" (2,196 pages) stalled after this 09: 23 edit [13] to page 2162/2196: see edit #135 on this set of bot contribs.
I left it until about 09:48 before trying to start a new batch (" Food, part 3 of 6", 593 pages). The bot made its first edit to that batch at 09:49 [14]. I had not needed to kill the first job.
I then set about working on the remaining 34 pages. Run Citation bot via the toolbar, let the page finish, run it in the next page ... then do manual followup on each page.
The first of those missed pages on which I invoked the bot was #2163 of "
South Asia, part 5 of 6":
Sambalpur (Lok Sabha constituency). That stalled, so I went on and processed the next nine. After more than an hour, the bot request on
Sambalpur (Lok Sabha constituency) timed out as 502 Bad Gateway
It seems that in batch mode, the bot drops the stalled page more promptly. However, it should not also kill the batch, since the next 9 pages were fine.
I know that @ AManWithNoPlan has recently put a lot of work into these stalling issues, but it's not quite fixed yet. -- BrownHairedGirl (talk) • ( contribs) 11:35, 8 October 2021 (UTC)
See the latest bot contribs: my batch job food, part 5 of 6 (590 pages) stalled after this edit [15] 266/590 to Rodrick Rhodes, which is 16 in that contribs list.
i can't start a new batch. -- BrownHairedGirl (talk) • ( contribs) 21:25, 8 October 2021 (UTC)
Has the ability to run this bot offline (with sufficient API rate limiting, both for the mediawiki API and the data provider APIs) been considered? That's one way to solve the discussions over capacity. It seems strange to me that in this day and age a computing resource like this has limited capacity. Enterprisey ( talk!) 02:15, 14 September 2021 (UTC)
Neither of those are accessed when run in slow modeas saying that the APIs are not accessed during slow mode, but that doesn't make total sense to me. Enterprisey ( talk!) 23:32, 15 September 2021 (UTC)
flag as {{ fixed}} since it seems to already exist as a feature. AManWithNoPlan ( talk) 14:49, 15 October 2021 (UTC)
I understand from the earlier arguments that the bot wastes much time looking up the metadata for citations that it already had processed in earlier runs, especially when run on batches of pages. How much (if any) storage space is available to the bot? If it doesn't already, could it cache the metadata of citations it processes (or the resulting template code, or maybe just a hash by which to recognize a previously-seen citation), so that it can waste less time when encountering an already-processed citation? — 2d37 ( talk) 03:04, 16 September 2021 (UTC)
Citation bot made some edits to an article with the comment "Removed URL that duplicated unique identifier. Removed accessdate with no specified URL.". Some of the journal citations did indeed have PMC identification numbers, but the fulltext URL links were not to the PMC fulltext, so they weren't straight duplicates. I'm not sure it's safe to assume that the PMC fulltext is always the best available online copy; in some cases another site might have a better scan of the same article. I'm not sure we should implicitly ban access-date parameters from any source that is on PMC, either. In one case, the article was not on PMC; there was a PubmedID, and a link to the publisher's site for the fulltext. In this case, the automated edit had the effect of concealing the existence of a publicly-available fulltext. I suspect this may not be the intended behaviour; perhaps the tool was just expected to prevent there being two links to the same PMC page in a single citation?
Separately, I'm uneasy in giving precedence to PMC links over other links, as PMC and Pubmed contain third-party tracking content from data brokers, currently including Google and Qualtrics. I wrote to the NIH some years back about this, pointing out that it could give these actors sensitive medical information if people looked up things they or their friends had been diagnosed with. They did not want to engage on the topic. One of the links deleted was to the Europe PMC page, which admittedly looks no better, European data regulations aside. This is a complex question, and it might be a good idea to discuss it at Wikipedia Talk:MED. HLHJ ( talk) 23:55, 10 October 2021 (UTC)
Template:Inconsistent citations has been nominated for deletion. You are invited to comment on the discussion at the entry on the Templates for discussion page. * Pppery * it has begun... 03:12, 17 October 2021 (UTC)
10.1073/pnas DOIs are only free after a set number of years, and the path through the code looks for free status before it adds the year. I will see if it is easily fixable.
AManWithNoPlan ( talk) 17:44, 17 October 2021 (UTC)
Looking through the bot's contributions just now, it looks like I accidentally ran Citation bot through Category:Terrorism twice in a row (at least it found some more things to fix the second time round, so a bit yay, I guess?), rather than once; this was not intentional on my part. I think what happened is I started the bot running on the category, thinking it had already finished going through Category:Sexism, only to see a few minutes later that it wasn't quite done with the earlier category (note: when running Citation bot through big jobs like these, I rely on monitoring the bot's contributions page, as the Citation bot console invariably errors out with a 502 or 504 gateway error well before the bot finishes with the job; as the bot's actions only show up in the contributions log when it actually finds something to correct, it often takes some degree of guesswork when I'm trying to figure out if it's finished a run or not). Upon finding this out, I waited somewhat longer for Citation bot to completely finish with Category:Sexism, and then, assuming (somewhat stupidly in retrospect) that the first attempt had been blocked by the still-going earlier run (and, thus, hadn't taken), I went back to the Citation bot console a second time and started it on Category:Terrorism again - only the first attempt hadn't been blocked after all, and the bot proceeded to queue up the second terrorist run right behind the first (and did not, as I'd assumed would happen in this kind of situation, block the second attempt). Oops.🤦♀️ Anyone whose runs didn't go through because of this, feel free to give me a well-deserved trouting right now. Whoop whoop pull up Bitching Betty ⚧ Averted crashes 00:13, 18 October 2021 (UTC)
I accidentally rebooted the bot on a list it already ran through [26]. Could you kill my run, and save ~842 mostly pointless article attempts? Headbomb { t · c · p · b} 01:28, 19 October 2021 (UTC)
CB continues to make conversions like what I reported recently, for example here, here, here, and here. I am still under the impression that CB should not perform such changes; they are arbitrary, cosmetic, and often improper. IceWelder [ ✉] 13:16, 13 October 2021 (UTC)
Changing |work= to |journal= is definitely incorrect. In fact, RPS has only ever had a website; a "conversion to cite magazine" should not occur. IceWelder [ ✉] 15:49, 13 October 2021 (UTC)
Is there a list somewhere of the most common websites for linking out? I would like to add a bunch of websites to the is_magazine, is_journal, etc lists AManWithNoPlan ( talk) 12:34, 20 October 2021 (UTC)
The bot is filling |title= with the site's registration-page boilerplate, Register | British Newspaper Archive, e.g. {{Cite web|url=https://www.britishnewspaperarchive.co.uk/viewer/bl/0000425/18440304/002/0001|title=Register | British Newspaper Archive}}, which renders as: https://www.britishnewspaperarchive.co.uk/viewer/bl/0000425/18440304/002/0001. {{cite web}}: Missing or empty |title= (help)
Thanks for the prompt fix, @ AManWithNoPlan. -- BrownHairedGirl (talk) • ( contribs) 20:09, 24 October 2021 (UTC)
This seems obvious, and maybe you already considered it before, but in this example would it make sense to include |website=www.comedy.co.uk when it is a cite web? -- GreenC 00:38, 27 October 2021 (UTC)
(ghostarchive[.]org|conifer[.]rhizome[.]org|newspaperarchive|webarchiv[.]cz|digar[.]ee|bib-bvb[.]de|webcache[.]googleusercontent[.]com|timetravel[.]mementoweb|webrecorder[.]io|nla[.]gov[.]au)
.. and new ones will become known. The mapping of domains to names would be nice. One method: extract existing domains from URLs and determine how each is most frequently written in |work=/|website=, and use that as the default, e.g. [[Time (magazine)|Time]] = www.time.com as the most common usage. Such a table might be useful for other purposes as well. Almost like a separate program: build the table, download the data on occasion, and incorporate it into the bot locally. -- GreenC 18:35, 27 October 2021 (UTC)
{{ fixed}}
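GreenC's table could be prototyped offline. A hedged Python sketch (the harvesting of (url, work) pairs from a dump is assumed and not shown; the function name is hypothetical):

```python
from collections import Counter, defaultdict
from urllib.parse import urlparse

def build_domain_map(citations):
    """citations: iterable of (url, work_value) pairs harvested from existing
    |work=/|website= usage. Returns each domain's most common written form."""
    tallies = defaultdict(Counter)
    for url, work in citations:
        # Normalise the domain so www.time.com and time.com tally together.
        domain = urlparse(url).netloc.lower().removeprefix("www.")
        tallies[domain][work] += 1
    # Keep the most frequently used spelling per domain as the default.
    return {domain: counter.most_common(1)[0][0]
            for domain, counter in tallies.items()}
```

Run against a dump on occasion, the resulting table could then be shipped to the bot as static data, as suggested.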
The bot removed the |publisher= parameter from two uses of {{ cite news}} ( verify). I do not see that parameter in the list of deprecated/removed parameters at Template:Cite news.
When the publisher is basically the same as the work parameter, then it should not be included. Also, for most publications the publisher is not very important, such as academic journals, where the publishers generally have no control since the editors run the show. AManWithNoPlan ( talk) 13:49, 29 October 2021 (UTC)
Newest case of incorrect cite type is here. Edge is a magazine and (formerly) a website; the website is cited. CB made it a newspaper. IceWelder [ ✉] 12:12, 29 October 2021 (UTC)
"This Citation Style 1 template is used to create citations for news articles in print, video, audio or web." Izno ( talk) 18:47, 29 October 2021 (UTC)
|newspaper= is wrong in such cases either way. IceWelder [ ✉] 19:04, 29 October 2021 (UTC)
I have removed the journals.lww.com code. They shut that website down a while back, and obviously it is alive again. Weird. AManWithNoPlan ( talk) 14:53, 29 October 2021 (UTC)
I am not sure what you are talking about. Please explain. AManWithNoPlan ( talk) 14:28, 1 November 2021 (UTC)
|journal=
|series=CUNY Academic Works
the discussion at User talk:Citation bot/Archive_28#Adding_website_field was archived too quickly, but in its brief appearance I bookmarked https://github.com/ms609/citation-bot/pull/3790/files to scrutinise the HOSTNAME_MAP array.
The issue I was looking for is websites which host more than one newspaper. The three examples I checked are:
In each case, HOSTNAME_MAP appears to be unaware of the Sunday variation. BrownHairedGirl (talk) • ( contribs) 13:13, 31 October 2021 (UTC)
{{ fixed}}
Expanding {{cite journal |doi=10.1163/1570-6699_eall_EALL_COM_vol3_0247 }} with Citation bot does not result in a normal book citation but somehow only gets the chapter title. I had to hand-edit it into {{cite book |first1=Kimary N. |last1=Shahin |chapter=Palestinian Arabic |title=Encyclopedia of Arabic Language and Linguistics |editor1-first=Lutz |editor1-last=Edzard |editor2-first=Rudolf |editor2-last=de Jong |doi=10.1163/1570-6699_eall_EALL_COM_vol3_0247 }}. There are a lot more examples of DOIs for the same book at Levantine Arabic.
DOIs consist of both metadata and URL redirection. Some DOI providers do not provide any metadata, and some only a few bits. That is all they provided: https://api.crossref.org/v1/works/10.1163/1570-6699_eall_eall_com_vol3_0247 AManWithNoPlan ( talk) 15:58, 5 November 2021 (UTC)
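To illustrate what "only a few bits" means in practice: the bot can only harvest the fields that the Crossref works API actually returns in its message object. A Python sketch; the sample response below is abridged and hypothetical, not the real payload for this DOI:

```python
import json

def usable_fields(crossref_json):
    """List the bibliographic fields a bot could actually harvest
    from a Crossref /v1/works response body."""
    message = json.loads(crossref_json)["message"]
    wanted = ("title", "container-title", "author", "editor",
              "ISBN", "issued", "publisher")
    # Empty lists and missing keys are equally useless to the bot.
    return [field for field in wanted if message.get(field)]

# Abridged, hypothetical response: only a chapter title is present.
sample = '{"message": {"title": ["Palestinian Arabic"], "author": [], "ISBN": []}}'
```

When a registrant deposits only the chapter title, a check like this comes back with a single usable field, which matches the behaviour reported above.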
This is beyond our ability to fix. https://en.wikipedia.org/api/rest_v1/data/citation/mediawiki/http%3A%2F%2Fwww.jpbox-office.com%2Ffichfilm.php%3Fid%3D8806 AManWithNoPlan ( talk) 20:07, 8 November 2021 (UTC)
Is there a reason why I'm getting a '502 Bad Gateway' error? I've been trying to use the bot for the article Nicole Kidman, but it keeps giving me this error twice now. Is the error occurring from my end, my internet or something? Or is something wrong with the page or tool? Is it perhaps because there are too many mistakes to fix that it overwhelms the system? Any suggestions on what to do? — Film Enthusiast ✉ 17:15, 3 November 2021 (UTC)
No longer overloaded. {{ fixed}} AManWithNoPlan ( talk) 22:31, 10 November 2021 (UTC)
The bot added |chapter= to a {{ cite journal}} template; |chapter= is not a supported parameter in that template. See this explanation for more information. TeemPlayer ( talk) 23:38, 6 November 2021 (UTC)
Cite error: <ref> tag is missing the closing </ref> (see the help page). → <ref>{{Cite web|url=http://www.conceptcarz.com/article/article.aspx?articleID=3548|title = An Error Has Occured!}}</ref>
Also, went back and fixed a dozen pages with such bad titles - about half from refill and that other old bot. AManWithNoPlan ( talk) 14:29, 13 November 2021 (UTC)
Please wait an hour at least and try again. But first make sure the bot did not actually run already. AManWithNoPlan ( talk) 17:31, 17 November 2021 (UTC)
The bot changed {{ cite journal}} to {{ cite document}}, a redirect to {{cite journal}}, so a more-or-less pointless exercise. The citation is actually for a book chapter (with |chapter= and |isbn= as clues), so the template should have been changed to {{ cite book}}. Separately, |issue= and |number= in {{ cite journal}} are exact aliases of each other. In this case the value assigned to |number= appears to be incorrect while the value that the bot assigned to the new |issue= seems to be correct. Because both are present, and because only one is allowed, Module:Citation/CS1 emits the redundant parameter error message.
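The redundant-alias condition described above is mechanical to detect. A minimal Python sketch, with the alias table abbreviated to the single pair under discussion (the real CS1 module covers many more alias groups):

```python
# Aliased parameter groups in {{cite journal}}; abbreviated to the pair at issue.
ALIAS_GROUPS = [("issue", "number")]

def redundant_aliases(params):
    """Return alias groups where more than one member is set, the condition
    under which Module:Citation/CS1 emits a 'redundant parameter' error."""
    return [group for group in ALIAS_GROUPS
            if sum(1 for name in group if params.get(name)) > 1]
```

A bot consolidating parameters could run such a check before saving, and either drop the value it believes is wrong or skip the edit for human review.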
That was obscure and rare. Thank you for the report. AManWithNoPlan ( talk) 15:32, 23 November 2021 (UTC)
This is also part of the wider problem that the bot needs much more capacity, and also that a lot of its time is taken up by speculative trawls through wide sets of articles which have not been identified as needing bot attention and which often produce little change. Huge categories are being fed to the bot, which changes little over 10% of them, and most of those changes are trivia (type of quote mark in title) or have no effect at all on output (removing redundant parameters or changing template type). It would help a lot if those speculative trawls were given a lower priority. -- BrownHairedGirl (talk) • ( contribs) 22:54, 9 August 2021 (UTC)
It seems that the low-return speculative trawls have re-started. @ Abductive has just run a batch job of Category:Venerated Catholics by Pope John Paul II; 364 pages, of which only 29 pages were actually edited by the bot, so 92% of the bot's efforts on this set were wasted. The lower category limit has helped, because this job is 1/10th of the size of similar trawls by Abductive before the limit was lowered ... but it's still not a good use of the bot. How can this sort of thing be more effectively discouraged? -- BrownHairedGirl (talk) • ( contribs) 11:57, 27 August 2021 (UTC)
A search for Abductive | #UCB_webform matches 370 edits (a batch job of 2200 pages), and a search for Abductive | #UCB_toolbar matches a further 114 pages. AManWithNoPlan cut the limit on category batches twice in response to Abductive's abuse of the higher limit.
You claim to "have been running 2200 of your [i.e. BHG's] bare urls". You rejected my suggestion (at User talk:Abductive#A_citation_bot_job) that you tackle the pages which transclude {{ Bare URL inline}}, and its transclusion count has dropped by only 84 in the last 25 days, so I doubt you were tackling that.
So the "holding off" claim seems to me to be bogus. -- BrownHairedGirl (talk) • ( contribs) 19:07, 28 August 2021 (UTC)
Requesting individual runs of articles is a major purpose of the bot... but your rapid requests of hundreds of individual articles within a few hours while you are already running a max-size batch is in effect running a second batch, and thereby WP:GAMING the bot's limits. If that is not clear to you, then we have a big problem.
Your claim that "Just about any random category run these days will only achieve 30%" is demonstrably false in two respects:
"My runs are in no way impeding your efforts". Again, nonsense: your routine use of two channels slows down the bot, and impedes other editors from even starting new jobs.
"so you should not be at all concerned about my use of the bot". This is a severe case of WP:IDHT.
"Reconfigured to make the bot use its 4 channels more effectively". All you have done is to selfishly and disruptively grab 2 of the 4 channels for yourself. This makes the bot work no faster, and use its time no more effectively; your WP:GAMING of the limits just gives you a bigger slice of the bot's time and denies other editors access to the bot.
Please find another place to argue. Thanks. -- Izno ( talk) 16:32, 29 August 2021 (UTC)
In this edit https://en.wikipedia.org/?title=Doctor_Who&type=revision&diff=1057375628&oldid=1057016709 the bot added |date=21 December 1963. To the best of my knowledge, IMDb was not active in 1963. Obviously the bot is confusing the date of the TV show with the date the entry was added to IMDb. It made the same addition a week earlier, which I reverted. Is there an easy way to make the bot avoid doing this? Stepho talk 11:47, 27 November 2021 (UTC)
Citations where authors are given as last1=Alpha, Bravo are not expanded to last1=Alpha | first1=Bravo.
Ignoring the fact that the bot edit was purely cosmetic, could it actually have done something useful? Here is the (updated) citation:
This might usefully have been changed to last1=Chai |first1=Yan |last2=Guo |first2=Ting etc.
Is this (a) easy to implement and (b) worth the effort? -- John Maynard Friedman ( talk) 12:02, 28 November 2021 (UTC)
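For what it's worth, the suggested expansion is straightforward where the "Last, First" assumption holds. A hedged Python sketch (the comma heuristic is unsafe for values that legitimately contain commas, such as "Jr." suffixes or corporate names, which may be why the bot is cautious here):

```python
import re

def split_last_first(params):
    """Expand {'last1': 'Chai, Yan', ...} into separate lastN/firstN pairs.
    Values without exactly one comma are left untouched."""
    out = {}
    for key, value in params.items():
        m = re.fullmatch(r"last(\d+)", key)
        if m and value.count(",") == 1:
            last, first = (part.strip() for part in value.split(","))
            out[key] = last
            out[f"first{m.group(1)}"] = first
        else:
            # Not a lastN parameter, or the value is ambiguous: pass through.
            out[key] = value
    return out
```

So the mechanics are easy; the hard part is deciding when the heuristic is safe to apply.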
The edit was not "purely cosmetic": it added a date and modified an ISBN. -- BrownHairedGirl (talk) • ( contribs) 17:13, 28 November 2021 (UTC)
Changed web to news and newspaper at revision 1058214656 by Citation bot ( talk) at Valencia, California Fettlemap ( talk) 04:59, 2 December 2021 (UTC)
I'm tired of having to undo edits like this, which stealthily remove paywall and related access indicators. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 22:34, 2 December 2021 (UTC)
magazine=[[Billboard (magazine)]] should be converted to magazine=[[Billboard (magazine)|Billboard]]. If there are similar cases with journal=[[Foobar (journal)]], those too should be updated. Headbomb { t · c · p · b} 07:24, 3 December 2021 (UTC)
Ping @ AManWithNoPlan: please can you try to fix this? -- BrownHairedGirl (talk) • ( contribs)
Allowing for flexible whitespace (e.g. \{\{\s*cite\s+web) is for me a basic piece of encoding that I do even in short AWB runs. Are there really bots which are sophisticated enough to be allowed to modify citations, but so crude that they don't handle multiple spaces? (I originally wrote \{\{\a*cite\s+web, but that should of course have been \{\{\s*cite\s+web.)
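The whitespace point can be demonstrated in a few lines of Python:

```python
import re

# Naive pattern: only matches the template written with a single space.
naive = re.compile(r"\{\{cite web", re.IGNORECASE)
# Whitespace-tolerant pattern, as suggested above.
robust = re.compile(r"\{\{\s*cite\s+web", re.IGNORECASE)

samples = [
    "{{cite web|url=x}}",       # canonical spacing
    "{{ Cite  web |url=x}}",    # leading space, doubled internal space
    "{{cite\nweb|url=x}}",      # newline inside the template name
]
```

All three samples are valid transclusions of the same template, but the naive pattern catches only the first.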
The bot should not change |via=NPR to |newspaper=NPR. It could change |via=NPR to |website=NPR, or change to cite news but use something more appropriate like |work=NPR.
The bot won't fill refs to YouTube.com. This is a nuisance, because in the 20211120 database dump, there are 10,060 articles with bare links to YouTube. That is about 4% of all remaining pages with WP:Bare URLs, so filling them would make a big dent in the backlog.
I tested the bot on the following pages with bare LINK refs to YouTube.com (from this search): Chris Daughtry, Petrol engine, James Mason, Model–view–controller, CBBC (TV channel), House of Gucci, Luke Combs, Flute, Josh Peck, Bloodhound Gang, and Pauly Shore.
So far as I can see from the bot's output, the Zotero service returns no info for YouTube links, which is obviously outside the bot's control. However, I wonder if it would be possible for the bot to do a direct lookup on those pages? Even if the bot just filled the cite param |title= with the data from the YouTube page's <meta name="title"> element, that would be a useful step forward. BrownHairedGirl (talk) • ( contribs) 19:45, 26 November 2021 (UTC)
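A direct lookup along the lines BHG suggests could start from a minimal parser for the <meta name="title"> content. This is only a sketch: fetching the live page, character-encoding detection, and API rate limiting are all left out, and real YouTube markup may differ from this simplified shape:

```python
from html.parser import HTMLParser

class MetaTitleParser(HTMLParser):
    """Collect the content= value of the first <meta name="title"> tag."""
    def __init__(self):
        super().__init__()
        self.title = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and a.get("name") == "title" and self.title is None:
            self.title = a.get("content")

def meta_title(html_text):
    parser = MetaTitleParser()
    parser.feed(html_text)
    return parser.title
```

The returned string could then populate |title= in an otherwise bare {{cite web}} ref, which is exactly the "useful step forward" described above.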
(twitter\.com|google\.com/search|ned\.ipac\.caltech\.edu|pep\-web\.org|ezproxy|arkive\.org|worldcat\.org|kyobobook\.co\.kr|facebook\.com|leighrayment\.com|scholarlycommons\.pacific\.edu\/euler\-works|miar\.ub\.edu\/issn|britishnewspaperarchive\.co\.uk|pressreader\.com|ebooks\.adelaide\.edu\.au). BrownHairedGirl (talk) • ( contribs) 18:36, 27 November 2021 (UTC)
{{ fixed}} AManWithNoPlan ( talk) 15:53, 5 December 2021 (UTC)
Only if that's the full name of the journal. Journal of Bioscience shouldn't be capitalized that way. Headbomb { t · c · p · b} 20:56, 4 December 2021 (UTC)
Since the author of the source I cited is an organization (Japanese 大阪市立衛生試験所), not a person, I avoided first-last pairing parameters and put it in "editor=", so this bot edit was unnecessary to begin with. But if the bot still wants to fill parameters in the "author=" class, then it should at least take care of easily imaginable and most probable cases such as single words (organizations), over three words, languages with no spacing, "anonymous" in various abbreviations or non-English languages, multiple persons in one line, etc. Note that I am not asking the bot to determine whether the string is a person or a non-person, which should be impossible. I'm just asking that if the bot can't fill both parts of a first-last pair, then it should stop. Wotheina ( talk) 06:02, 5 December 2021 (UTC)
The templates accept a |last= value even when there's no |first= value. Vice-versa, as you've observed, doesn't work. You say "the author of the source I cited is an organization", so you entered it as an editor. I don't like using organisations or "Staff" for author parameters, but if you've got an author, say it's the author. If what you have is an editor, use the editor parameters. Don't try to outsmart the templates, because it messes up the metadata. — JohnFromPinckney ( talk / edits) 08:06, 5 December 2021 (UTC)