Just a warning that I have a COI in relation to one of the papers prominently discussed in this essay. However, the results of the paper are fully reproducible from the source data + code, independent of me, and the peer reviews of the paper are public (and the official peer-reviewed version of the paper is CC-BY). Moreover, the essay is not a Wikipedia article. Feel free to edit. Boud ( talk) 23:32, 16 September 2021 (UTC)
I got into a similar discussion on the French Wikipedia after the BBC's fact checkers showed Turkey's numbers to be phony ('700 border alerts' were presented as '700 border attacks'). The source of the statement was authoritative, but deeply involved in the matter and with a COI. French :fr:WP:SPR states:
I don't know if a similar statement is in Wikipedia:Citing_sources. Yug (talk) 🐲 15:35, 22 January 2022 (UTC)
1. ... that have been reputably published may be used in Wikipedia, but only with care, because it is easy to misuse them. Have all the national health agencies (in some cases, sub-national) been checked to be reliable publishers? The words "reputably published" link back to WP:RS. The closest thing to the topic is WP:RS/MC, but a governmental health agency is not a peer-reviewed systematic review article.
3. A primary source may be used on Wikipedia only to make straightforward, descriptive statements of facts that can be verified by any educated person with access to the primary source but without further, specialized knowledge. This could hypothetically be valid if government health agencies published their full individual PCR test reports, full of private details that are certainly illegal to publish in the EU, together with full lists of the medical labs, each of which published all its reports with successive numbers so that someone checking would know if any were missing; but even in this unrealistic hypothetical case, an educated person would have to do a huge amount of work to add up the numbers. So 1 and 3 could quite likely, if applied bureaucratically, be used to remove all the COVID-19 data from en.Wikipedia on the grounds that the primary source is not acceptable for this purpose. The JHU CSSE collection of the data might count as WP:SECONDARY, since it is an extra step removed from the data collection, but it includes the dubious official data along with the more credible official data, so it's not any more reliable in practice than the Wikipedia data collected directly from the government agencies. So my point here is that guidelines or a policy for how to handle open govt data will have to evolve from discussion and consensus in particular cases. But I agree that a useful starting point is something like "official is not necessarily reliable". Your 'in a nutshell' summary is useful. :) Boud ( talk) 03:39, 7 February 2022 (UTC)
See Wikipedia:Reliability of open government data/Intralexical's Response. Thanks for your comments :). Here are point-by-point responses.
... where controversy or shortcomings in accuracy may exist ... coverage about its reliability (such as in a dedicated, prominent section) should also be included where it is most relevant.
That means that a noticeboard for "official sources" would probably serve to ascertain reliability, which may be redundant with existing mechanisms for assessing the reliability of sources.
Depending on a likely insular group of people with rather specialized skills risks undermining that.
Are they to be synthesized using the data from multiple peer-reviewed sources, or do they just report assessments made by the peer-reviewed sources?
Or it may just cover up a flawed process with the appearance of objectivity.
The Autocratic Republic has confirmed that 1,234 people
Overall, I tend to agree that there could be a problem handling this issue due to (at least initially) having too few people with sufficient basic computational skills to double-check the peer-reviewed, open-access, reproducible research (and the approaches to reproducibility vary), with the risk that decision-making power on "knowledge" ends up de facto concentrated in too few hands. On the other hand, the amount of open government data is going to keep increasing, and the pressure to include it in Wikipedia will keep increasing. And the domination of the infobox in the mobile view of Wikipedia articles, the automatic feeding through of Wikipedia infobox data to search engines, and the fact that we have many nice graphs (for the COVID-19 pandemic) mean that the prose discussion of reliability adds no nuance at all to these forms of information distribution. Splitting off official data reliability as a different type of reliability assessment from the current WP:RS/N still seems preferable to me (provided that people are actually willing to do it).
Boud ( talk) 16:51, 8 April 2022 (UTC)
I have the feeling that there are some things such as an official data template that could be started immediately by someone who likes writing templates, and tentatively experimented with to see how the community responds. See Wikipedia:Reliability of open government data#Templates.
Also, I guess that someone could take the government health agencies named in the peer-reviewed, open-access, reproducible (to varying levels) papers below to WP:RS/N and see at what level people wish to rate these, but the descriptions of the ratings would still not quite make sense: would we have to remove the Algerian and Belarusian COVID-19 data and graphs if the decision for both were "deprecated"? We would really need something like "generally-reliable-but-OK-as-official-data" and "deprecated-but-OK-as-official-data". The RS ratings symbols and definitions (e.g. listed at WP:RSP) already have a lot of nuances, and the cognitive burden on editors working through these is already significant. It's not as simple as yes/no. This is again why I don't think that official sources of open government data would quite fit into WP:RS/N; they would require making the ratings system even more nuanced/complicated. Boud ( talk) 17:08, 8 April 2022 (UTC)
Copied from User talk:Jsamwrites#Open government data for convenience: "This is an interesting problem. The Wikidata community is trying to tackle this problem to a certain extent for scholarly articles. Take, for example, Wikidata has properties that help to track certain disputed statements or even the retraction of scientific articles: 'is retracted by' or 'statement disputed by'. Some external applications use 'is retracted by' to highlight that a scholarly article (already on Wikidata) has an associated retraction. Though, I am not sure whether these have been used for open data(sets), which are tracked by 'external data available at' or 'open data portal'. This requires some more study and concrete examples." Jsamwrites ( talk) 15:30, 22 May 2022 (UTC)
@ Jsamwrites: I took the liberty of pasting your comments here, because other people might potentially be interested (e.g. Mike Peel once the Board election is over...). To continue from your comments, here's a sketch of what might be doable in the case of the SARS-CoV-2 infection count data from WP C19CCTF:
I'm unlikely to get into Wikidata to try something like this, especially as I have a COI, but this could open the way for similar analyses of open medical data, of open electoral data, or of other open government data. This could help avoid the Manichean view of open government data that risks evolving: either "it's all nonsense" or "it's all true because it's official". Boud ( talk) 23:25, 2 September 2022 (UTC)