Thoughts on these, or any other ideas? I think whatever we decide, it should be objective, to reduce arguments about inclusion.
Levivich 20:05, 15 August 2022 (UTC)
Are we not including articles on the subject published in reputable peer-reviewed journals for this purpose? In other words, this is solely for topics covered in three publications, these being either books or entries in major general encyclopedias?
BD2412 T 20:19, 15 August 2022 (UTC)
I would allow monographs and encyclopedic entries to add up (e.g., one monograph and two entries would make it).
I would also specifically allow more specialized scholarly encyclopedias (e.g., Dictionary of National Biography, Encyclopaedia of Islam), though these must be of excellent academic reputation (perhaps 'scholarly encyclopedias of high repute'?).
Yes, we should also count papers in academic journals or chapters in edited volumes in some way: perhaps we can let three such papers or chapters count for one monograph/encyclopedic entry? That would include everything which has nine papers/chapters, even if there is no monograph/encyclopedic entry. Or is it better to require twelve, making four count for one? ☿
Apaugasma (talk ☉) 20:31, 15 August 2022 (UTC)
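A minimal sketch of the counting rule proposed above, assuming a threshold of three "units", where a monograph or encyclopedic entry counts as one unit and a journal paper or edited-volume chapter counts as one third (or one quarter under the stricter variant). The source-type labels are hypothetical; `Fraction` keeps the arithmetic exact so nine papers land on exactly three units.

```python
from fractions import Fraction

# Hypothetical source-type labels. Weights follow the proposal:
# monograph/encyclopedic entry = 1 unit, paper/chapter = 1/papers_per_unit.
def units(sources, papers_per_unit=3):
    """Return the total weight of a topic's sources as an exact fraction."""
    total = Fraction(0)
    for kind in sources:
        if kind in ("monograph", "encyclopedia_entry"):
            total += 1
        elif kind in ("journal_article", "chapter"):
            total += Fraction(1, papers_per_unit)
        else:
            raise ValueError(f"unknown source type: {kind!r}")
    return total

def meets_threshold(sources, papers_per_unit=3, required=3):
    """True if the sources add up to at least `required` units."""
    return units(sources, papers_per_unit) >= required

print(meets_threshold(["monograph", "encyclopedia_entry", "encyclopedia_entry"]))
print(meets_threshold(["journal_article"] * 9))                     # 9 x 1/3 = 3
print(meets_threshold(["journal_article"] * 9, papers_per_unit=4))  # 9 x 1/4 < 3
```

Switching `papers_per_unit` from 3 to 4 is the "require twelve" variant: nine papers then fall short, twelve still qualify.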
If we're going to go that route, let's just have a points system. Ten points = inclusion. A monograph or encyclopedia entry counts for four points.
BD2412 T 20:39, 15 August 2022 (UTC)
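The points system could be written as a lookup table. The proposal fixes only two numbers (ten points to qualify, four per monograph or encyclopedia entry); the one point per journal article or chapter below is a placeholder assumption, not part of the proposal.

```python
# Points per source type. Only the 4-point values and the threshold of 10
# come from the proposal; the 1-point values are placeholder assumptions.
POINTS = {
    "monograph": 4,
    "encyclopedia_entry": 4,
    "journal_article": 1,  # assumption
    "chapter": 1,          # assumption
}
THRESHOLD = 10

def score(sources, points=POINTS):
    """Sum the points for a list of source-type labels."""
    return sum(points[kind] for kind in sources)

def included(sources, points=POINTS, threshold=THRESHOLD):
    """True if the topic reaches the inclusion threshold."""
    return score(sources, points) >= threshold

print(score(["monograph", "encyclopedia_entry", "journal_article", "chapter"]))
print(included(["monograph", "monograph"]))  # 8 points: not enough
```

With these placeholder weights, three monographs or entries (12 points) always qualify and two alone (8 points) do not; the weight given to journal articles is the knob that would encode the "three papers per monograph" idea discussed above.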
What about just WP:GNG, except the sources must be academic? That would include academic journals, academic books, encyclopedias, etc. ("Tier 1" sources), as long as they're WP:SIGCOV. Or, going further than SIGCOV, must the topic be the subject of the source?
Levivich 20:41, 15 August 2022 (UTC)
I was also thinking of a point system, because it seems wrong to give as much weight to a journal article as to a monograph. But perhaps it's better to emulate WP:GNG and leave the exact number of sources needed vague in the first place. We could just state that sources must be WP:TIER1, though monographs, literature/systematic reviews, or entries in scholarly encyclopedias should be given more weight than single peer-reviewed articles or chapters in edited volumes (the latter are still missing from WP:TIER1).
WP:SIGCOV says "it does not need to be the main topic of the source material". For our purposes, I would change that to: it must be the main topic, or one of the main topics, of the source material. On the one hand, we need much more than a few paragraphs or pages; on the other, a source often has more than one main topic, and demanding that it be the main topic would be too strict and would make it hard to establish whether something truly is the main topic of a source. ☿
Apaugasma (talk ☉) 21:17, 15 August 2022 (UTC)
(I would include "chapters in edited volumes" as part of TIER1's "books published by university presses", but maybe TIER1 should be clarified.)
The question I have is whether we really need to weigh academic sources against other academic sources for the purpose of determining whether a topic is in scope of this project (as opposed to, say, for determining whether something is WP:DUE in an article). I feel like a topic either "makes the cut" or doesn't "make the cut".
Say, for example, we decide that a topic should be supported by at least three academic sources to be in-scope (with the topic being a main topic of the source material, but not necessarily the main topic or sole main topic). Does it matter to us if the three sources are three books, three journal articles, three encyclopedia entries, or one of each? Do we want to say that topics that are the subject of three journal articles are not in scope? I'm not sure what my answer is to that question.
For practical purposes, we could start strict and expand the scope later. After all, if we're talking about a scope that includes hundreds of thousands of articles (most likely), we'll need to start with a smaller batch anyway (when it comes to adding talk page banners and such).
Levivich 18:08, 16 August 2022 (UTC)
Yes, start strict. That's why a threshold of three journal articles, which in my experience covers a lot more topics than three monographs/encyclopedic entries, should probably not be enough, at least not to start with. A simple statement that articles in academic journals contribute a little less to academic notability should suffice, though. Note that if we differentiate there, chapters in edited volumes should also be differentiated from whole books (by the way, such chapters are, as a very general rule, of lower quality than both journal papers and monographs). That's my view, but I would like to hear from others. ☿
Apaugasma (talk ☉) 19:28, 16 August 2022 (UTC)
How about significant coverage in independent academic sources, and then we define the terms:
OK, I guess that's workable for now. Which brings me to my next question: how to populate the list of articles in scope, which I'll start a separate thread for below.
Levivich 15:01, 18 August 2022 (UTC)
@Moxy: I hope you're still interested/available? :-) We have a preliminary list of ~17k mainspace pages (Wikipedia:WikiProject Core Content/Articles) that I think would grow, maybe significantly (>100k), if this WikiProject moved forward, but a list of 17k seems like enough to start with for now.
What's the usual procedure for "I want to add banners to 17k talk pages"? Is there a bot that does this already? Do I need to create my own and apply for BRFA for this task?
Hi @Levivich: I suppose this could just be done via AutoWikiBrowser: someone would obtain the list, prepend "Talk:" to each entry, and use the software to insert the banner. According to WP:ASSISTED, though, consensus is needed first, and a BRFA should be opened if there are any doubts. But if the WikiProject gets created, I would see that as clear consensus for making those edits.
0xDeadbeef 16:26, 2 September 2022 (UTC)
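The AutoWikiBrowser workflow described above comes down to two text transformations: map each article title to its talk-page title, then add the banner unless the page already carries it. A minimal offline sketch, assuming a hypothetical banner template named `{{WikiProject Core Content}}` (no template name was settled in this thread); it makes no edits itself.

```python
import re

BANNER = "{{WikiProject Core Content}}"  # hypothetical template name

def talk_title(article):
    """Map a mainspace article title to its talk-page title (mainspace only)."""
    return "Talk:" + article.strip()

def add_banner(wikitext, banner=BANNER):
    """Prepend the banner to talk-page wikitext unless it is already there."""
    name = banner.strip("{}").strip()
    # Look for "{{Name}}" or "{{Name|...". Case-insensitive for simplicity,
    # although MediaWiki only ignores case in the first letter.
    already = re.search(r"\{\{\s*" + re.escape(name) + r"\s*(\||\}\})",
                        wikitext, flags=re.IGNORECASE)
    if already:
        return wikitext
    return banner + "\n" + wikitext

titles = ["Photosynthesis", "Roman Empire"]
print([talk_title(t) for t in titles])
print(add_banner("{{WikiProject Biology}}\n== Old thread =="))
```

Because `add_banner` is idempotent, re-running it over the same list (for example after the scope grows) will not stack duplicate banners.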
Thanks, Deadbeef! I've looked around and there are a number of bots that already have approved tasks for WikiProject banner tagging (User:Legobot, User:EarwigBot, User:AnomieBOT, User:Hazard-Bot, User:KiranBOT). I'm thinking we'll probably want to fork one of those rather than asking to use them directly, because of the large number of pages and the likelihood that we'll want multiple runs over time (we don't want to keep bugging a bot operator, nor take over their bot). And we probably want to establish the WikiProject before seeking a BRFA: I'm not sure tagging 50k+ pages is justified if only a few people are interested.
Levivich 17:50, 2 September 2022 (UTC)
@Levivich and 0xDeadbeef: I just saw the ping to my bot's account. A couple of years back, I tried to revive WikiProject Organised Crime and WikiProject Espionage; they are somewhat better than before. If the WikiProject is going to target a lot of articles, then it is better to have some sort of consensus (even if informal) somewhere. I think the village pump would be a good place for that, and we could also get further suggestions there. From a technical point of view, project banners are a little trickier than expected: AWB and AWB-based bots rely primarily on categories for that task, so the lowest-level category (one with no subcategories) should contain only the articles that are meant to be tagged. Please feel free to ping me if you have any questions about the bot, or about WikiProjects in general. I will participate if I find it interesting and if time permits. At the least, I will always be available for discussions :-)
—usernamekiran (talk) 17:16, 5 September 2022 (UTC)
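Since category-based tagging depends on exactly which pages sit directly in a category, it may help to list them before tagging. A sketch using the MediaWiki API's `list=categorymembers` module: `cmtype=page` returns only pages placed directly in the category (subcategories are not descended into), and the loop follows API continuation. The `fetch` callable and the example category are assumptions so the sketch runs offline; in practice `fetch` would wrap a GET to `https://en.wikipedia.org/w/api.php`.

```python
def category_pages(category, fetch):
    """Yield mainspace titles placed directly in `category` (not in its
    subcategories), following MediaWiki API continuation.

    `fetch` is a stand-in for a GET to /w/api.php that returns parsed JSON.
    """
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmtype": "page",     # pages only; subcategories are not descended into
        "cmnamespace": "0",   # mainspace
        "cmlimit": "max",
        "format": "json",
    }
    while True:
        data = fetch(params)
        for member in data["query"]["categorymembers"]:
            yield member["title"]
        if "continue" not in data:
            break
        # Merge the continuation tokens into the next request.
        params = {**params, **data["continue"]}

# Offline stand-in for the API, returning two batches of results.
_FAKE = {
    None: {"query": {"categorymembers": [{"ns": 0, "title": "Alpha"}]},
           "continue": {"cmcontinue": "page|BETA|2", "continue": "-||"}},
    "page|BETA|2": {"query": {"categorymembers": [{"ns": 0, "title": "Beta"}]}},
}

def fake_fetch(params):
    return _FAKE[params.get("cmcontinue")]

print(list(category_pages("Category:Example", fake_fetch)))
```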
Creating the list of articles
How do we generate a list of articles within the scope of this WikiProject? My thoughts:
I'm not as sure about WP:VA4 and WP:VA5, because they cover non-academic topics (e.g. pop culture). For example, I'm not sure that Sean Connery, a VA4 topic, is the subject of significant scholarly coverage (though undoubtedly he is the subject of significant non-scholarly coverage). Perhaps there are certain VA4 and VA5 categories that we can "automatically import", while not automatically importing other VA categories (like "People")? If we do this for VA5, it'll bring the article count into the tens of thousands. That'll be a good place to start (in terms of setting up an RC feed, etc.), but we'll also be duplicating VA at this "level".
Is every topic that is an entry in Encyclopedia Britannica in scope? I believe there are a couple hundred thousand entries in Britannica. The 1911 and 1922 versions, though old, are available on WikiSource. With a bit of technical magic, we could generate a list of every entry in those two versions. No doubt that will not include modern (post-1922) topics, but it's a start. This would bring us to a six-figure article count, and include articles that are outside the scope of VA.
Alternatively, we could cross-reference Britannica with another encyclopedia or two, and try to come up with a list of entries that are in multiple encyclopedias. I'm not quite sure how to get a list of topics for another encyclopedia (including modern Britannica). Maybe a web crawler? I don't know.
I don't really trust our category system, but maybe taking all the articles that are linked one or two category levels below Category:Main topic classifications? My thought is that would grab all the top-level or broad-topic articles... but I really don't trust our category system. For example, two levels down is Category:Communication studies, which you'd think would hold top-level communications articles, but it also has Making Chastity Sexy and Pastel QAnon, which strike me as out-of-scope. So I'm not sure this one is a good approach.
Another thought I had is pulling from a list of academic journals. If there's an entire academic journal on a topic, that topic is very likely to be in scope, right? (And I mean real journals, not the predatory ones.) This would probably be the same as a list of academic disciplines, and we already have a category for that (Category:Academic disciplines), so maybe this approach won't get us far.
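The cross-referencing idea (keep only topics that appear in several encyclopedias) is a counting problem once each encyclopedia's entry list is in hand; obtaining those lists is the open question and is assumed solved here. Titles are normalized crudely (whitespace collapsed, case-folded), which is an assumption; real entry titles would need far more careful matching. The sample lists are made up.

```python
from collections import Counter

def normalize(title):
    """Crude normalization: collapse whitespace and case-fold."""
    return " ".join(title.split()).casefold()

def in_multiple(entry_lists, min_sources=2):
    """Return normalized titles that appear in at least `min_sources` lists."""
    counts = Counter()
    for entries in entry_lists:
        # Use a set so each encyclopedia counts at most once per topic.
        counts.update({normalize(t) for t in entries})
    return sorted(t for t, n in counts.items() if n >= min_sources)

list_a = ["Astronomy", "Roman  Empire", "Quadrant"]        # made-up sample
list_b = ["roman empire", "Astronomy", "Quantum mechanics"]  # made-up sample
print(in_multiple([list_a, list_b]))
```

Raising `min_sources` trades coverage for confidence, which fits the "start strict" approach discussed earlier.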
Just a quick note that I personally won't be working on this for the foreseeable future. Sincerely appreciate the efforts, and hope that others will step in. ☿
Apaugasma (talk ☉) 15:52, 18 August 2022 (UTC)
Cacti, I heart your enthusiasm :-) I have been looking at places where we can pull together a list of articles. Because let's face it, if we are to do something that is not just duplicating VA, we're talking about way more than 50k articles, probably something like 250k-500k articles. Generating a list of 250k-500k topics is not possible manually; it'll have to be done by combining some other lists somehow. It's very hard, or at least I think it's very hard, to figure out a way to generate a list of 500k topics that have significant academic coverage.
Anyway, I've been looking at the 1911 Encyclopedia Britannica, which has a handy list of topics at WikiSource: wikisource:1911 Encyclopædia Britannica/Classified List of Articles. I had thought, well, OK, it'll be an outdated list and won't have modern topics, like WWI and WWII, but it's a start, right?
Well, I'm not so sure. After looking at the list of 1911 EB topics, I see they include wonderful entries such as "Quadroon" and "Mandingo". That makes me rather uncomfortable about just importing that list and saying "it's academic" because it's in an encyclopedia. I naively forgot how ridiculously racist the Western world was 100 years ago (like even more than today, amazingly).
So that's 17k. There's also Wikipedia:WikiProject Missing encyclopedic articles/Hot, a list of 72k encyclopedia topics from multiple encyclopedias. Plus Vital Articles, that would bring the list over 100k. For VA, I was thinking all of WP:VA3 and all of WP:VA5 except the "People" and "Everyday life" categories... I think the other categories are all academic in nature, but those two need some more careful scrutiny, maybe by sub-category. Any thoughts?
Levivich 00:17, 27 August 2022 (UTC)
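The import plan above (all of VA3, all of VA5 minus the "People" and "Everyday life" categories, plus the Hot list) amounts to a filtered, de-duplicated union. A minimal sketch, assuming each VA level has already been loaded as a mapping from category name to article titles; that data shape and the sample data are made up for illustration.

```python
def build_scope(va3, va5, extra_lists, excluded=("People", "Everyday life")):
    """Union of all VA3 titles, VA5 titles outside `excluded` categories,
    and any extra title lists, keeping the first occurrence of each title."""
    seen = set()
    scope = []

    def add(titles):
        # Skip titles that an earlier source already contributed.
        for title in titles:
            if title not in seen:
                seen.add(title)
                scope.append(title)

    for titles in va3.values():           # all of VA3
        add(titles)
    for category, titles in va5.items():  # VA5 minus excluded categories
        if category not in excluded:
            add(titles)
    for titles in extra_lists:            # e.g. the Hot list pages
        add(titles)
    return scope

# Made-up sample data, for illustration only.
va3 = {"Science": ["Physics", "Biology"]}
va5 = {"Science": ["Physics", "Optics"], "People": ["Example person"]}
hot = [["Optics", "Example topic"]]
print(build_scope(va3, va5, hot))
```

De-duplicating matters here because the sources overlap heavily, so the final count is smaller than the sum of the individual list sizes.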
I just want to say that having the RC feed is really great. I already made my first revert, by sheer coincidence on an article related to the very topic I'm writing about these days.
Yes, the list from the Missing encyclopedic articles project looks okay (though a spot-check seemed to reveal a lot of articles that may not meet WP:ANG), and yes, better leave out some subcategories from VA (VA5 Cities, and to a lesser extent VA5 Culture, also look problematic).
In general, though, I would model the inclusion process on the general editing process: anyone can boldly add an article, anyone can boldly remove it, and if two or more editors disagree about it they are expected to discuss. Once enough editors are aware of the process, articles will soon be added and removed all the time. We need a start to get it going, but it doesn't need to be 100% accurate. ☿
Apaugasma (talk ☉) 01:28, 27 August 2022 (UTC)
Unfortunately for both these lists, editors removed links as they turned blue, so what's left is a tiny proportion of their starting counts. I am not going to import WP:List of encyclopedia topics at all, because it seems to have too many non-academic topics (e.g. Acid Head, Auchlochan, Bass Brook). The Hot list looks solid, but I think the import count will sadly be <6k from the current revisions of the pages. I might later see about pulling more topics from the page histories, but there are dozens of sub-pages and each one has been filled and culled multiple times, so it's not a quick/easy thing. Alas, I no longer think we'll get to 100k quite so quickly and easily.
Levivich 00:09, 28 August 2022 (UTC)
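Recovering topics from the page histories means pulling every wikilink target out of each old revision and taking the union. A rough sketch, assuming the revision wikitext has already been downloaded (for example via the API's `prop=revisions` module). The regex handles `[[Target]]`, `[[Target|label]]` and `[[Target#Section|label]]`, and skips links with a few common namespace prefixes; that prefix list is a simplification, not a complete namespace check.

```python
import re

# Captures Target from [[Target]], [[Target|label]] or [[Target#Section|label]].
LINK = re.compile(r"\[\[\s*([^\[\]|#]+?)\s*(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")
# Simplified list of non-article prefixes to skip (lowercase).
SKIP = ("category:", "file:", "image:", "template:", "wikipedia:", "wp:",
        "help:", "user:", "talk:", "portal:", "draft:")

def normalize_title(t):
    """MediaWiki-style: underscores become spaces, first letter uppercased."""
    t = " ".join(t.replace("_", " ").split())
    return t[:1].upper() + t[1:]

def link_targets(wikitext):
    """Return the set of (probably mainspace) link targets in one revision."""
    out = set()
    for raw in LINK.findall(wikitext):
        raw = raw.lstrip(":")  # [[:Category:X]] links to the category page
        if raw.lower().startswith(SKIP):
            continue
        out.add(normalize_title(raw))
    return out

def targets_across_revisions(revisions):
    """Union of link targets over every revision's wikitext."""
    topics = set()
    for text in revisions:
        topics |= link_targets(text)
    return topics

old = "* [[Astronomy]]\n* [[Roman_Empire#History|Rome]]\n[[Category:Lists]]"
new = "* [[optics]]"
print(sorted(targets_across_revisions([old, new])))
```

The union would still need to be checked against current article titles, since many recovered links will have been renamed or redirected since they were listed.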
Done: I added the current version of the Hotlist pages (about another 5k articles), all of VA3, and all of VA5 except the "People" and "Everyday life" categories (these two categories need a closer look), about 31k; we're now at 53,496 articles. I think that's all the mass importing I plan to do for now; if anyone else needs help with a large list import, feel free to let me know.
Levivich 19:21, 28 August 2022 (UTC)
@CactiStaccingCrane: Absolutely! The recent changes feed (the link on the WikiProject page) automatically draws from the /Articles subpage, so as soon as articles are added to that subpage, the RC feed is automatically updated. So right now the RC feed should be monitoring all 53k articles in /Articles.
Levivich 16:08, 2 September 2022 (UTC)
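A feed that "draws from" a list page behaves like MediaWiki's Special:RecentChangesLinked (Related changes), which shows recent edits to the pages linked from a given page; that the project's link is built this way is an assumption. A sketch that builds such a URL for the /Articles subpage; the `days` and `limit` values are arbitrary choices.

```python
from urllib.parse import urlencode

def related_changes_url(list_page, days=7, limit=500,
                        base="https://en.wikipedia.org/w/index.php"):
    """URL for Special:RecentChangesLinked over the pages linked from `list_page`."""
    query = urlencode({
        "title": "Special:RecentChangesLinked",
        "target": list_page,
        "days": days,
        "limit": limit,
    })
    return f"{base}?{query}"

print(related_changes_url("Wikipedia:WikiProject Core Content/Articles"))
```

Because the target is the list page itself, adding titles to /Articles extends what the feed covers without any change to the link.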