This essay discusses how to review an article for plagiarism. Plagiarism has come up before at Dispatches where the general features of plagiarism and avoiding plagiarism were discussed.
I reviewed South American dreadnought race for MILHIST A Class. To check for copyvios, I first determined how I should approach the issue.
Firstly, in either technique, read the article history and look at the change logs. It is mostly one editor's work (unlikely to be ad hoc plagiarism then), so I then look at the first version of the page.
The revision history shows the article grew at a steady pace in byte count for 14 days, then levelled off (obviously undergoing copyediting!). This is a good sign. Steady growth indicates normal authorial work. So do sudden spurts of growth, provided each spurt is about the same size. Sudden increases in size which are out of pattern can indicate things being lifted whole.
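The byte-count check described above can be roughed out in code. The following is only a minimal sketch, not a tool used in this review: the function name, the `factor` threshold and the sample byte counts are all made up for illustration. It flags any revision whose growth is several times the median growth, so regular spurts of a similar size pass and only out-of-pattern jumps are reported.

```python
from statistics import median


def flag_unusual_jumps(sizes, factor=3.0):
    """Return indices of revisions whose byte-count growth is out of pattern.

    sizes:  article size in bytes after each revision, oldest first.
    factor: hypothetical threshold; a revision is flagged when its growth
            exceeds `factor` times the median positive growth.
    """
    # Change in size introduced by each revision after the first.
    deltas = [after - before for before, after in zip(sizes, sizes[1:])]
    growth = [d for d in deltas if d > 0]
    if not growth:
        return []
    typical = median(growth)
    # deltas[i] belongs to revision i + 1.
    return [i + 1 for i, d in enumerate(deltas) if d > factor * typical]


# Made-up byte counts: steady growth, one large jump, then levelling off.
sizes = [1000, 1800, 2500, 3300, 9300, 10000, 10050]
print(flag_unusual_jumps(sizes))  # [4]
```

A flagged revision is only a prompt to go and read that diff; a large jump may just as easily be a draft pasted in from a sandbox.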
Finally, glancing over the final revision for review:
To conduct automated testing I used http://toolserver.org/~earwig/cgi-bin/copyvio.py and http://en.wikipedia.org/wiki/User:CorenSearchBot/manual on the article. Earwig showed clear. CorenBot showed clear. This is only the start, however, as close paraphrase and "do the sources support their conclusions" need to be checked using this method. As Full text reading proceeds to individual source checking, I'll deal with Full text reading next.
Full text reading is the process of closely reading the style and expression of a work, to look for jarring changes in style, very unusual verbs, verbal clauses or adjectival constructions, material worded far more poorly than average, material worded much better than average, and styles which appear to be academic or journalistic (etc.) rather than encyclopaedic. So I started at the top of SAdr. For example:
By this stage I've determined the editor's own prose style, and have read the rest of the article, not noticing any sudden stylistic changes. Thus I need to move to spot checking.
Spot checking relies on picking sources, footnotes, or sentences which are likely to be close paraphrase:
Now, saying this doesn't mean that editors acting in such ways are plagiarising; but these are the signs which I have found when dealing with Humanities encyclopaedia articles. When I see these signs, I concentrate spot checking on sources and sentences which display this behaviour. If a source is particularly relied upon in this way I check every usage of that source.
Consider, "The United States' Fore River Shipbuilding Company tendered the lowest bid—in part due to the high availability of cheap steel—and was awarded the contract.[32]" Endnote 32: "Livermore, "Battleship Diplomacy," 39." Bibliography: "Livermore, Seward W. "Battleship Diplomacy in South America: 1905–1925." The Journal of Modern History 16: no. 1 (1944), 31–44. JSTOR 1870986. ISSN 0022-2801. OCLC 62219150."
I then repeat this for every citation of Livermore.
So I reported it in my review:
- Spotcheck for copyvio, plagiarism, close paraphrase and citations supporting facts: issues, close paraphrase
- Earwig copyvio: clear
- CorenBot: clear
- Sudden stylistic changes and random unique turns of phrase manually checked: clear
- Spotchecked as Livermore; issues
- Endnote [17b] Article: "The Argentine government made a last-ditch attempt to preclude an arms race by offering to purchase one of the Brazilian ships, but when this was rebuffed, they sent a naval delegation to Europe to solicit tenders from armament companies to build warships for Argentina." ; Source: "Argentina made a final effort to secure naval parity with Brazil by offering to purchase one of the dreadnoughts; when this proposal was rejected, an Argentine naval commission sailed for Europe to receive tenders for the construction of two dreadnoughts and a number of destroyers." ; phrase ordering is the same, "Argentina" "avoid" "ship buy" "refusal" "naval" "to Europe" "build" "plural vessels" ; this appears as close paraphrase: The main point of difference is that preclusion of an arms race is fundamentally different in meaning from securing naval parity. However, in all other aspects the expression follows that of the original. How close is close? I'm particularly concerned by the similarity of the clause "by offering to purchase one of the"
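The shared clause in the Endnote [17b] comparison above is the kind of thing a simple script can surface. What follows is a hedged sketch only, not how the review was actually done: the helper names are invented, and it finds nothing more than the longest run of consecutive identical words between two passages. It cannot detect matching order of presentation or reworded clauses, which still need a human reader.

```python
import re


def words(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def longest_shared_run(a, b):
    """Return the longest run of consecutive words common to both texts."""
    wa, wb = words(a), words(b)
    best_len, best_end = 0, 0
    prev = [0] * (len(wb) + 1)
    for i in range(1, len(wa) + 1):
        cur = [0] * (len(wb) + 1)
        for j in range(1, len(wb) + 1):
            if wa[i - 1] == wb[j - 1]:
                # Extend the run that ended at the previous word pair.
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return " ".join(wa[best_end - best_len:best_end])


# The two sentences quoted in the Endnote [17b] comparison.
article = (
    "The Argentine government made a last-ditch attempt to preclude an arms "
    "race by offering to purchase one of the Brazilian ships, but when this "
    "was rebuffed, they sent a naval delegation to Europe to solicit tenders "
    "from armament companies to build warships for Argentina."
)
source = (
    "Argentina made a final effort to secure naval parity with Brazil by "
    "offering to purchase one of the dreadnoughts; when this proposal was "
    "rejected, an Argentine naval commission sailed for Europe to receive "
    "tenders for the construction of two dreadnoughts and a number of "
    "destroyers."
)
print(longest_shared_run(article, source))
# by offering to purchase one of the
```

A seven-word shared run inside a single sentence is a reason to look harder at that source, not a verdict in itself.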
Now that I've found close paraphrase I need to overcome my sadness and run a detailed check of Livermore, thorough and suspicious.
Livermore was cited 20 times. On two occasions close paraphrase occurred. In both cases it was where a single sentence in the text displayed the same information that a single sentence in the source displayed. In both cases the verb clause remained identical. In both cases the order of presentation was sufficiently similar. This appears to be accidental close paraphrase, and not a matter of style or habit for the editor. The editor's work is fantastic, but they need to heed WP:Close paraphrase on internalisation and re-expression.
The first close paraphrase appeared in the initial article version; the second must have appeared later. This clearly indicates that these were natural slip-ups and not a matter of fundamentally bad habits.
I then repeat this for two or three other sources. I chose to check the "weakest" sources, because looking through Livermore exhausted me; and, the style review passed clearly.
I saved my report, and let the author know that they need to watch when they write single article sentences from single source sentences.
It is also obvious that we need automated tools which identify verb clauses and extensively search Google Scholar. Fifelfoo (talk) 02:32, 2 June 2011 (UTC)
It took me 90 minutes (including writing this up) to run the automated checks, style read, search Google and Google Scholar for distinctive turns of phrase, and close read Livermore and two web citations. I estimate that documenting the process accounted for about 25-50% of that time. I estimate I spent 60 minutes reading Livermore closely. Thus, I'd estimate the cost of spotchecking a FACable article to be approximately 60 minutes.
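The arithmetic behind that estimate can be laid out explicitly. This is only a back-of-envelope sketch using the figures given above (90 minutes in total, 25-50% of it spent on documentation); the function name is invented for illustration.

```python
def checking_time_range(total_minutes, doc_share_low, doc_share_high):
    """Return (low, high) minutes left for checking once documentation is removed."""
    low = total_minutes * (1 - doc_share_high)
    high = total_minutes * (1 - doc_share_low)
    return low, high


low, high = checking_time_range(90, 0.25, 0.50)
print(low, high)  # 45.0 67.5
```

Removing the documentation overhead leaves somewhere between 45 and 67.5 minutes of actual checking, which is consistent with the rounded figure of about 60 minutes.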