Operator: Nemo_bis for this task; Pintoch as main owner and author of the bot
Time filed: 13:52, Thursday, July 25, 2019 (UTC)
Function overview: Add and maintain supported identifiers in citation templates (mostly {{cite journal}}), including related metadata such as access level but excluding the |url= parameter.
Automatic: A queue of edits is created automatically (the run is triggered manually), its contents are given a cursory manual review to exclude anomalies, and selected items are then moved to a queue for the bot to perform automatically. Edits are then sampled for manual checks, and in the few hours or days following a bot run the operators manually fix the pages which ended up in Category:CS1 maintenance (typically fewer than one in a thousand).
Programming language(s): Python
Source code available: https://github.com/dissemin/oabot / phabricator:tag/oabot/ (relying on https://github.com/dissemin/dissemin/ and https://github.com/Impactstory/oadoi )
Links to relevant discussions (where appropriate): Wikipedia talk:OABOT, Help_talk:Citation_Style_1#RfC_on_linking_title_to_PMC and similar for the desirability of identifiers and precise information on them.
Edit period(s): Once every few weeks or months.
Estimated number of pages affected: Less than 20k for the first steps; more than 300k overall considering all articles with DOIs.
Namespace(s): 0
Exclusion compliant (Yes/No): Yes
Adminbot (Yes/No): No
Function details: Following the success of task OAbot 2, we're proposing to extend the functionality of the bot to all identifiers. The addition of arxiv and PMC identifiers (about 25k edits) has been a success: it has encountered few mistakes and the bot has been made more robust in response (for instance we are now stricter in matching publications).
The first step will be to add |hdl= identifiers and |hdl-access= status on about 2k articles. Those handles typically point to an institutional repository like https://ntrs.nasa.gov/ or https://deepblue.lib.umich.edu/ (the most common in the queue is https://quod.lib.umich.edu/ for now). Citation bot is also able to add such identifiers, but does so more slowly and does not (yet) set access status, while we now do (T228632): example edit [1].
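To illustrate the kind of edit described above, here is a minimal sketch in Python (the bot's language) of appending |hdl= and |hdl-access= to a single citation template. It is not taken from the oabot source: the function name `add_template_params` and the handle value are hypothetical, and the parsing is deliberately naive (it assumes one flat template with no nested templates or piped wikilinks).

```python
import re


def add_template_params(template: str, new_params: dict) -> str:
    """Append parameters to one flat {{...}} template, skipping names already set.

    Hypothetical helper for illustration only; a real bot would use a proper
    wikitext parser rather than regular expressions.
    """
    if not (template.startswith("{{") and template.endswith("}}")):
        raise ValueError("expected a single {{...}} template")
    body = template[2:-2]
    # Collect parameter names already present, e.g. "doi" from "|doi=10.1000/x".
    existing = {
        m.group(1).strip().lower()
        for m in re.finditer(r"\|\s*([^=|]+?)\s*=", body)
    }
    for name, value in new_params.items():
        if name.lower() not in existing:
            body = body.rstrip() + f" |{name}={value}"
    return "{{" + body + "}}"


# Made-up citation and handle, used only to show the shape of the edit.
cite = "{{cite journal |title=Example |journal=J |year=2019}}"
updated = add_template_params(cite, {"hdl": "2027.42/12345", "hdl-access": "free"})
print(updated)
```

Existing parameters are never overwritten in this sketch, which matches the cautious intent of the task but is an assumption about behaviour, not a description of the deployed bot.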
After this is done, other identifiers will be handled depending on demand and volumes. The most consequential work will be to eventually add |doi-access=free to all relevant citations (an estimated 200k DOIs): this functionality was part of the original request (and not challenged by anybody) but later dropped when the bot became a user-triggered tool, as the number of required edits is incompatible with human editing.
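As a rough sketch of what a |doi-access=free pass could look like, the Python below scans page wikitext for {{cite journal}} templates and flags those whose DOI appears in a set of DOIs known to be free to read. The function name `mark_free_dois`, the sample DOIs and the idea of passing in a precomputed set are assumptions made for illustration; how the bot actually determines access status is not shown here.

```python
import re

# Flat (non-nested) {{cite journal ...}} templates only; a simplification.
TEMPLATE_RE = re.compile(r"\{\{\s*cite journal\b[^{}]*\}\}", re.IGNORECASE)
DOI_RE = re.compile(r"\|\s*doi\s*=\s*([^|}\s]+)")
ACCESS_RE = re.compile(r"\|\s*doi-access\s*=")


def mark_free_dois(wikitext: str, free_dois: set) -> str:
    """Add |doi-access=free to citations whose DOI is in free_dois.

    Templates without a DOI, or with |doi-access= already set, are left alone.
    """
    free = {d.lower() for d in free_dois}  # DOIs are case-insensitive

    def fix(match):
        tpl = match.group(0)
        doi = DOI_RE.search(tpl)
        if doi is None or ACCESS_RE.search(tpl):
            return tpl
        if doi.group(1).lower() not in free:
            return tpl
        return tpl[:-2].rstrip() + " |doi-access=free}}"

    return TEMPLATE_RE.sub(fix, wikitext)


# Made-up page text and DOIs for demonstration.
page = (
    "A.<ref>{{cite journal |title=T1 |doi=10.1000/free}}</ref> "
    "B.<ref>{{cite journal |title=T2 |doi=10.1000/closed}}</ref>"
)
result = mark_free_dois(page, {"10.1000/FREE"})
print(result)
```

Running the function a second time on its own output changes nothing, which is a useful property for a task that is re-run every few weeks or months.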
Expected improvements in the near future, if this task is approved, include:
|url= parameter at all using the bot account. I'll note however that WP:SAYWHERE specifically states that «You do not have to specify how you obtained and read it. So long as you are confident that you read a true and accurate copy, it does not matter [...]». Nemo 16:24, 25 July 2019 (UTC)
the bot won't add identifiers without some kind of procedure to reject unsuitable identifiers, as we've had copyright problems and disputes about them. I am not sure if the problem was resolved, though. Jo-Jo Eumerus (talk, contributions) 14:20, 30 July 2019 (UTC)
Sounds great, thanks for the update. – SJ + 18:38, 16 August 2019 (UTC)
|url=http://citeseerx.ist.psu.edu/viewdoc/summary?doi=... → |citeseerx=... is fine), then I see little that is objectionable. So let's see a trial at least, with an explicit list of identifiers covered, and we'll have a better idea what's in store.
|doi-access=, but it misses several more [6]. Any way to catch/report those? Headbomb {t · c · p · b} 15:14, 28 September 2019 (UTC)