This is an archive of past discussions. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.
The Text available under a free license section seems to be redundant, unclear and possibly even incorrect/out-of-date. Are editors already aware of this, or discussing this somewhere else on this page? If not, I can list my concerns here. Abecedare ( talk) 00:26, 17 June 2009 (UTC)
←I have updated. I'll leave any redundancy issues for further discussion. :) -- Moonriddengirl (talk) 12:31, 17 June 2009 (UTC)
I have just removed "which means that it is imperative that their work is distinguishable from the prose of the Wikipedia article". I have no idea what this means, but it appears to imply that we could not take CC-BY-SA text from another source and incorporate it into a WP article. That implication is incorrect. — Carl ( CBM · talk) 16:29, 17 June 2009 (UTC)
Kaldari has now removed the entire "Because articles normally evolve..." sentence with the edit summary: You do not "need" to retain an anchor to the original text (attribution will suffice), plus it is not always possible or helpful to link to the original text. [1] Perhaps the entire "Text available under a free license" section needs to be rolled up into the "Public domain or free license text" section above (this is perhaps the result of new editors coming along and just writing in their own sections to express their own ideas). I have some concerns with the removal, though:
I have no problem with any of the above, if that is what the community wishes. Franamax ( talk) 23:37, 17 June 2009 (UTC)
The sentence "Because articles normally evolve through incremental changes, it is important to retain an anchor to the originally copied text so that subsequent changes can be traced" needs to be deleted or rewritten. Attribution is important, but specifying that it must be through an anchor link is absurd. There are plenty of physical books, zines, journals, and other media that are under free licenses that you cannot create "anchor links" to. Believe it or not, not all media is on the web. Since we already discuss attribution ad nauseam, I think the sentence should just be deleted. Kaldari ( talk) 16:01, 18 June 2009 (UTC)
Is anyone attached to this intro?
I think it should go. "Thinking of it and writing it all on your own" is not an option for Wikipedians (at least not one that we want to encourage). I think we should not even go there, evoking the university setting in this manner.
We should just tell editors what an appropriate use of a source is, and what isn't. We could just keep the sentiment of the last sentence:
The following two paragraphs are, again, too "academic" for my liking:
Basically, we seem to be telling editors, "No one really knows what plagiarism is, but we think it is a big problem". We should just stick to what we want editors to do (or not to do), so the guideline gives them a sense of confidence and certainty ("Right ... I see ... this is what I have to do ..."), rather than a sense of uncertainty and doubt ("Gee, this plagiarism thing is kind of involved ... what does it all mean? Does it even matter? People aren't even agreed it is plagiarism. I'm not trying to deceive anyone, I just want to tell people what the sources I've cited say!").
Views? JN 466 23:40, 17 June 2009 (UTC)
I strongly agree with Jayen466's sentiments. The definitions are worse than useless. Not only are they largely inapplicable to Wikipedia (since in some cases it is actually appropriate to copy content into Wikipedia), our analysis of them leads nowhere and basically says "we don't actually know how to define plagiarism in the context of Wikipedia." Just ax the whole section and keep to giving editors specific instructions, not meandering inquiries into the nature of plagiarism. Kaldari ( talk) 16:11, 18 June 2009 (UTC)
I think the present text is OK. There is no agreement in the real world about what "plagiarism" means; we cannot resolve that. The present text gives a summary of the situation in practical terms relevant to Wikipedia. — Carl ( CBM · talk) 00:25, 19 June 2009 (UTC)
I like the new text, Jayen466. It is unambiguous, to the point, and isn't totally confusing and meandering like the old version. Kaldari ( talk) 16:12, 19 June 2009 (UTC)
To illustrate some of my concerns, the following examples are from the text and sources of 2007 Samjhauta Express bombings, a Featured Article.
Source | Article |
---|---|
Witnesses said they saw people screaming and struggling to get out. The injured were pulled out of the burning carriages onto the trackside by fellow passengers, and local residents rushed to help. | Witnesses claim to have seen passengers screaming and attempting to escape […] The injured were pulled out of the burning carriages and onto the track by fellow passengers and local residents. |
Indian Prime Minister Manmohan Singh, expressing "anguish and grief" at the loss of life, vowed that the culprits would be caught. | Prime Minister Manmohan Singh expressed "anguish and grief" at the loss of life, and vowed that the culprits would be caught. |
Musharraf called for a full investigation by the Indian authorities | Musharraf also said that there must be a full Indian investigation of the attack. |
Inside one, an electronic timer encased in clear plastic was packed next to more than a dozen plastic bottles containing a cocktail of fuel oils and chemicals. | Inside one of the suitcases containing the undetonated IEDs, a digital timer encased in transparent plastic was packed alongside a dozen plastic bottles containing fuel oils and chemicals. |
Officials said about 30 of the bodies were charred beyond recognition. | … many of the bodies were charred beyond recognition |
The rest of the train, which had been carrying around 600 passengers, continued to the border town of Attari where passengers were transferred to a Pakistani train. | The rest of the train, which was left undamaged by the attack, continued on to the border town of Attari, before being transferred to a Pakistani train that took passengers to their destination in Lahore |
The letter of our guideline would lay the authors of this FA open to the charge of plagiarism. However, I don't think it would be fair, especially when reading the article as a whole. I do not think that a phrase like "bodies were charred beyond recognition" should be rephrased as "the dead were so badly burnt that they could not be identified" (which actually might be wrong; perhaps they were identified using DNA analysis, etc.), nor do I think that a six-word phrase like "the bodies were charred beyond recognition" should be put in quotation marks. These are all non-creative, factual expressions, remarkable more for the information they convey than for the formulations used to convey it.
Note that in this case, there is likely no great POV dispute in the article that would set editors against each other. But imagine this being an article on Eastern European history, or homeopathy. I can just see editors saying, "What you have just inserted about the "bodies being charred beyond recognition" is plagiarism. I've deleted it." How can we make the difference clear between the acceptable use of the most straightforward way of saying something and this sort of thing, which is egregious plagiarism?
Note also that in the last source/article pair, the article writing is inferior to the source (the "rest of the train" was not "transferred to a Pakistani train", the surviving passengers were). In the first pairing, the article is inaccurate – "onto the trackside" is not the same as "onto the track". JN 466 08:27, 18 June 2009 (UTC)
Sources and text are from the FA 2000 Sri Lanka cyclone.
Source | Article |
---|---|
At least nine people are dead | At least nine people died |
Eight fishermen are missing, feared dead. | Eight people were left missing and feared dead. |
a street protest took place in Trincomalee on December 27 over the lack of aid | A street protest occurred in Trincomalee due to lack of aid. |
The families of those who died will receive 15,000 rupees ($US183) in compensation and those whose homes have been damaged or destroyed will receive just 10,000 rupees. | the families of those who died received $183 [...] in compensation. The government also gave $122 [...] to those whose houses were damaged or destroyed |
Ten roofing sheets were distributed to 1,720 families in six Districts ... In addition, 3,000 families were selected to receive one set of cooking utensils each, two bedsheets and two sleeping mats. | the Red Cross distributed 10 roofing sheets each to 1,720 families, and also sent a set of cooking utensils, bed sheets, and sleeping mats to 3,000 families. |
Is any of this plagiarism? For example, would it make sense to change "set of cooking utensils", "bed sheets" or "sleeping mats" to synonyms, to avoid substantial similarity with the source? Is the re-use of these words in this FA indicative of laziness, an intent to deceive, or a desire for precision? Should "set of cooking utensils", "bed sheets" and "sleeping mats" be put in quotation marks in the article? JN 466 11:26, 18 June 2009 (UTC)
If these two are to be the first of many specific examples, then, may I suggest that a subpage would be appropriate with a pointer from here? While it's probably a good idea to centralize conversation about the principles, this is but one point of this guideline, and lengthy tables and conversations about specific examples may overwhelm and distract from developing other points. As to the general, it seems revisiting the legal aspects of this might be useful. There are two factors to consider here. Close paraphrasing of free sources is a plagiarism concern. Whether or not it's allowed is up to consensus. Close paraphrasing of non-free sources is a copyright concern. Whether or not it's allowed is down to policy based on US law. Some of the examples you give above are "fragmented literal similarity", which is what it is called when literal duplication occurs, but copying is not comprehensive. Close paraphrasing may occur even in the absence of such fragments, if the structure of a source is copied but the language completely changed. Yes, you can violate US copyright law and be legally sanctioned without using a single word from your source if you rise to the level of "comprehensive non-literal similarity". The incorporation of literal similarity in such cases simply serves to strengthen the evidence against you, since it is pretty hard to defend against a charge of copying when evidence is clear that you have read the source and copied it. From a copyright standpoint, the dividing line between how much is too much (when we reach the point that a court says, "This is serious enough for us to care") is not firmly defined by legal code. We don't take chances...not only for our own use, but for that of our contributors. To refer back to WP:C, "If in doubt, write the content yourself, thereby creating a new copyrighted work which can be included in Wikipedia without trouble." (Close paraphrasing is derivative work, which is allowed only by the copyright holder.) 
There are a good many cases illustrating these copyright issues in action, but I'm going to quote a bit from Salinger v. Random House (we really need an article on that), since it seems particularly relevant to some of your points above. In that case, the court characterized the problem succinctly, noting that facts are not copyrighted but that "'vividness of description' is precisely an attribute of the author's expression that he is entitled to protect....The copier is not at liberty to avoid 'pedestrian' reportage by appropriating his subject's literary devices." The court also noted that "Though a cliche or an 'ordinary' word-combination by itself will frequently fail to demonstrate even the minimum level of creativity necessary for copyright protection..., such protection is available for the 'association, presentation, and combination of the ideas and thought which go to make up the [author's] literary composition.'...as we have more recently stated, 'What is protected is the manner of expression, the author's analysis or interpretation of events, the way he structures his material and marshals facts, his choice of words and the emphasis he gives to particular developments.'...The 'ordinary' phrase may enjoy no protection as such, but its use in a sequence of expressive words does not cause the entire passage to lose protection. And though the "ordinary" phrase may be quoted without fear of infringement, a copier may not quote or paraphrase the sequence of creative expression that includes such a phrase." (citations omitted; http://www.law.cornell.edu/copyright/cases/811_F2d_90.htm) Again, determining when such has risen to a legally actionable level is very complex. Courts consider many factors in determining if the fragmented similarity meets "fair use." Wikipedia has deliberately chosen to follow a more strict standard than fair use in order to make our content as reusable as possible. 
Hence, this guideline is not going to make close paraphrasing of copyright protected materials suddenly problematic, because close paraphrasing of copyright protected materials is already problematic. If you wish to pursue refining the application of the concept of close paraphrasing to free materials, please be sure to separate that out from non-free materials. If you do wish to consider its application to copyright protected materials, please remember that close paraphrasing reflects far more than occasional duplication of language. It also refers to lifting the structure of the material and the perspective and emphasis to facts/details/events. -- Moonriddengirl (talk) 13:12, 18 June 2009 (UTC)
← WP:C says, "There are some circumstances under which copyrighted works may be legally utilized without permission; see Wikipedia:Non-free content for specific details on when and how to utilize such material." I've already quoted what NFC says. Multiple policies and guidelines note that Wikipedia has deliberately chosen a more narrow road than "fair use." Verbatim quotations are permitted by policy. If you're suggesting that proper citation of close paraphrase makes a "lesser" taking, then I believe you may be misunderstanding the courts' position, but I'd be interested in seeing support for that. The operative words in your quote are "too-close paraphrasing." Wikipedia:Close paraphrasing is incorporated by reference in the "see also" section at the bottom. It does address situations where little room for originality in language exists. -- Moonriddengirl (talk) 22:23, 18 June 2009 (UTC)
The lead should sum up what is said in the body of the page and be stand alone. As such I have a problem with this second paragraph:
Plagiarism is the incorporation of someone else's work without providing adequate credit. Even if you have cited a source, make sure that your wording does not duplicate that of the source unless you note duplication by quotation marks or some other acceptable method (such as block quotations).[1] This applies even if your source is not copyrighted.
I think it confuses plagiarism with copyright. It is not just the wording that is plagiarism, but also claiming that someone else's idea is one's own (a big sin in academia).
However, I am not so hung up on the first sentence. Where I have a problem is the second and third sentences, as they imply that copying 1911EB is unacceptable, although many, many articles include 1911EB text and text from other copyright expired sources. If all that text were placed in quotations, then it could not be edited to update the style and information content within the quotes. The whole point of including chunks from 1911EB and similar sources is to put in place a seedbed of information on a topic that, through the usual Wikipedia process, can gradually be altered into a completely new and useful work, with some parts trimmed out, others added, and the sentences altered so that they read as a contemporary work. Given 20 years or so, the text will probably look nothing like the initial 1911EB text, but if it is in quotes this can never happen. If, of course, the original author of a copyright expired text makes a statement that is a point of view about something, there is no reason why that specific point of view should not be included as a quote or otherwise attributed to the source (in the usual way that is done for such text under copyright), but there is no reason why the text in general should not be incorporated into the Wikipedia article. If text is copied verbatim from a PD source, I think it is a good idea to include one of the templates in Category:Attribution templates, but I do not think it is essential.
So I think this second paragraph needs to be broken into two and expanded, so that there is an explanation of how text that is in copyright should be presented within an article and another paragraph on what to do if the text is copyright expired. This would roughly speaking cover the differences mentioned in detail in the sections of the page. -- PBS ( talk) 13:59, 18 June 2009 (UTC)
Re Doncram (07:12, 19 June 2009): We have been importing free content for years; see Category:Attribution templates. EB1911 was just one largish project, there are others as well. So any preliminary discussion of this would have happened very early in the project's history. But it's possible that other people, like me, simply view this as a natural part of the free content philosophy.
I was hoping that my post here might spark some discussion. I want to ensure that our attribution system meets CZ's requirements, and gives credit to all authors of an article, whether they are WP editors or not. And I am very willing to discuss how to do that. But I do not want to change our current practice, which would allow us, for example, to take a CZ article on a topic we do not have and copy that entire article to WP without changing a single character, apart from adding attribution that our text came from CZ. — Carl ( CBM · talk) 13:14, 19 June 2009 (UTC)
Re Philip Baird Shearer (13:59, 18 June 2009): I agree that the second paragraph is worded in a way that contradicts our usual practices for incorporating free content, and should be changed. — Carl ( CBM · talk) 13:29, 19 June 2009 (UTC)
Since we are talking about the lede section, the text just needs to be a faithful summary of the content lower on the page. I split the troublesome sentence into two, and made it refer directly to the full discussion lower down. — Carl ( CBM · talk) 13:37, 19 June 2009 (UTC)
In my opinion, the proposed prohibition against direct copying is completely inappropriate. Plagiarism is use of material without citing the source, period. If you cite the source, then copying is not plagiarism.
In an academic setting, things are very different. When a student submits a paper, there is a presumption that the wording itself, and not just the facts and ideas, is the work of the student whose name appears on the paper as the author. In general, the entire object of the paper is to demonstrate to the teacher that the student can do original work: this is the presumptive meaning of the student's name on the paper.
The situation at Wikipedia bears no relationship whatsoever to the student paper. There is no author's name on the Wikipedia article, and the reader has no a priori expectation of authorship. Our standards are explicitly at variance with those of a student paper: we consider original research, synthesis, and even creativity to be detrimental.
A casual Wikipedia reader is looking for information and has no expectation of authorship at all. We serve this reader best by providing the best possible encyclopedic content we can. A more serious reader will look deeper, and will want to know more about the sources, but a serious reader will almost immediately become aware that Wikipedia is a collaborative work and that any wording may have come from any of a huge collection of editors or other sources. An even more serious reader will quickly learn how to find out exactly where any wording in the article came from. The goal of our plagiarism policy should be to ensure that someone who wants to know where the wording came from will be able to do so.
So much for our obligation to the reader. But what about our obligation to the original author? This comes in two parts: legal obligation and moral obligation. Our legal obligations are embodied in copyright law and are beyond the scope of the plagiarism guideline. Our moral obligation to the author is to provide acknowledgment and recognition. But we also have a moral obligation to the reader, to provide an easily readable article, and the original author has a moral obligation to us, as members of civilization. We must balance these obligations. For my part, I believe that we have no more (or less) moral obligation to an author than we have to any Wikipedia editor, but since that author is known, we can and should cite the original in the edit summary.
In many cases, (e.g., the DNB) useful encyclopedic articles in the public domain were written by authors as work for hire, and were at most minimally attributed in the originals. For such sources, in my opinion we have no moral obligation to the publisher whatsoever: the publisher is a soulless legal entity, not an individual. The individual author already conveyed the copyright to the publisher, and I feel no particular obligation to the author, either. In (most) other cases the author of a work which is no longer in copyright has no expectation of further ownership of the work in the moral sense. We, as that author's cultural heirs, have a right to use the work as we see fit. This was the norm for more than a thousand years, and only changed when the printing press suddenly added monetary value to the ability to control the copying of a work. That is, it's all about money, not "moral rights" at all.
Please, do not artificially restrict our ability to use our cultural heritage based on some mistaken analogy with grading student papers. - Arch dude ( talk) 19:53, 19 June 2009 (UTC)
<-- As can be seen in the history, I rewrote the introduction with help from Moonriddengirl, and I hope it reflects what the majority of editors consider acceptable. However, as with all edits, I expect it will be edited unmercifully. From what you wrote above, Arch dude, I think you are basically happy with the alteration, but you are not comfortable with the paragraph that starts "Some external works that are copyright expired..." because it does not say that an editor MUST include attribution. I think the way to deal with this is not to add it to the lead (as it does not have to be done for legal reasons), but to describe it in a section called "copyright expired": if text is copied from a copyright expired source and no attribution has been added, then rather than delete the text, editors are encouraged to add attribution. That brings me to the next point: I think that the subsection "Attributing text copied from other sources" should be rewritten to reflect the different types of sources that I have touched upon in the introduction. -- PBS ( talk) 10:34, 23 June 2009 (UTC)
Am I the only person who finds this section ludicrous? Allow me to quote the lead (attribution at the project page history section):
Wikipedia draws clear distinctions between work submitted by Wikipedia editors as their own work (which can be "edited mercilessly"), work marked as a quotation (which must be properly credited and left essentially untouched), work described as a paraphrase of another source (which can be edited as long as the original sense is not lost), and direct copying of large blocks of free content written by other people (which should also be credited). In quotations, editorial notes and minor changes are sometimes useful, but must be clearly marked as such. See WP:MOSQUOTE for details.
The section then goes on to discuss, erm, none of the above cases. What follows instead is a discussion of the copyright status of various sources, and not a very good one at that.
Wikipedia policy requires the citation of sources. It also allows the restricted use of direct quotations under the Non-free content criteria, one of which is that the source is correctly attributed (another is that the source has been previously published outside of Wikipedia). Now this use of properly attributed textual quotations is not controversial: allow me to quote article 10 of the Berne Convention:
- It shall be permissible to make quotations from a work which has already been lawfully made available to the public, provided that their making is compatible with fair practice, and their extent does not exceed that justified by the purpose, including quotations from newspaper articles and periodicals in the form of press summaries.
- It shall be a matter for legislation in the countries of the Union, and for special agreements existing or to be concluded between them, to permit the utilization, to the extent justified by the purpose, of literary or artistic works by way of illustration in publications, broadcasts or sound or visual recordings for teaching, provided such utilization is compatible with fair practice.
- Where use is made of works in accordance with the preceding paragraphs of this Article, mention shall be made of the source, and of the name of the author if it appears thereon.
U.S. fair use law is very similar with regards to text, although many of the key cases date from before the accession of the U.S.A. to the Berne Convention.
I am well known for saying that this whole exercise is a waste of everybody's time: it is more so if you don't look at the copyright restrictions (and hence the existing WP policies) which also cover this area. Physchim62 (talk) 13:32, 22 June 2009 (UTC)
I suggest that this section be restructured so that we have sections on the different types of text that can be copied into an article. It is based on the structure in the new introduction, and obviously there is a lot of text in the current "Attributing text copied from other sources" section which should also be included, but it separates out the different ways that text is used.
suggested structure:
- Attributing text copied from other sources
A mention here that if an editor wishes to incorporate text from another source and is not sure which category the text falls into, then they should ask on the talk page of the Wikipedia article, or ask at Wikipedia:Reliable sources/Noticeboard, before copying any text into a Wikipedia article, as it is better to sort this out before a mess is created rather than afterwards.
- Sources under copyright
- Sources under copyleft
- Public domain sources
This is more complicated than copyright expired because I think there must be a clear statement of how the text happens to be in the public domain, and if there is any doubt about a text's status, then it should be treated as under copyright.
- Copyright expired sources
- Compliance with the content policies
This might be better incorporated into the first section after the "when in doubt" paragraph.
-- PBS ( talk) 11:02, 23 June 2009 (UTC)
In this edit, the statement that "When copying material within Wikipedia, from one article to another, attribution is also required" has been marked with the {{ dubious}} template, asking for discussion. Here is the discussion:
Material contributed to en:wiki is done under the GFDL and CC-BY-SA licenses. Both unambiguously require that attribution to the original author(s) is mandatory. Is the dubious part about the case where someone copies only a quotation or external free text here and there? In that case, attribution is still required, but to the original author(s), not the wiki editor who copied it in. In any other case, GFDL requires a History page to indicate the Authors of the new Document. If the contributing editor is copying from another en:wiki page, they are not the Author of that text, rather they are incorporating the work of previous Authors, whose contribution must be acknowledged. Franamax ( talk) 01:17, 26 June 2009 (UTC)
(unindent)I don't think there is any real controversy, it has always been common practice to assume that not preserving the history in one way or another does violate the GFDL. It does get fuzzy because wikipedia never followed the GFDL to the letter, choosing to ignore some sections. But it never ignored the attribution requirements. Note that all work here was able to be dual licensed, even old work, because Wikipedia is such a large user of GFDL that they convinced FSF to designate the CC-SA license as the new version of the GFDL, in a tricky move that was somewhat controversial. Gigs ( talk) 01:45, 30 June 2009 (UTC)
I clarified the sentence in question. Regardless of any arguments about what the licenses actually say, there is a long-established practice here that we do not require in-article attribution when we copy text from one Wikipedia article to another, just an edit summary (see WP:MERGE). The wisdom of that policy can be debated, but it is not an issue that can be resolved on this page. I'd suggest the village pump. — Carl ( CBM · talk) 03:00, 30 June 2009 (UTC)
- Importing text:
- If you want to import text that you have found elsewhere or that you have co-authored with others, you can only do so if it is available under terms that are compatible with the CC-BY-SA license. You do not need to ensure or guarantee that the imported text is available under the GNU Free Documentation License. Furthermore, please note that you cannot import information which is available only under the GFDL. In other words, you may only import text that is (a) single-licensed under terms compatible with the CC-BY-SA license or (b) dual-licensed with the GFDL and another license with terms compatible with the CC-BY-SA license.
- If you import text under a compatible license which requires attribution, you must, in a reasonable fashion, credit the author(s). Where such credit is commonly given through page histories (such as Wikimedia-internal copying), it is sufficient to give attribution in the edit summary, which is recorded in the page history, when importing the text. Regardless of the license, the text you import may be rejected if the required attribution is deemed too intrusive.
-- PBS ( talk) 08:50, 30 June 2009 (UTC)
Granted, today is a hot day here, but even on a cooler day I think I would struggle to get all the way to the end of this lead. It is too dense, presents too much detail too soon. JN 466 16:36, 30 June 2009 (UTC)
This is not clear - when editor copies & pastes large chunks of text but attributes them (with inline ref or such) are we dealing with plagiarism or copyvio (or both)? Which policy (policies) should be cited in a warning? -- Piotr Konieczny aka Prokonsul Piotrus| talk 06:44, 14 July 2009 (UTC)
Views? JN 466 23:40, 17 June 2009 (UTC)
I strongly agree with Jayen466's sentiments. The definitions are worse than useless. Not only are they largely inapplicable to Wikipedia (since in some cases it is actually appropriate to copy content into Wikipedia), our analysis of them leads nowhere and basically says "we don't actually know how to define plagiarism in the context of Wikipedia." Just ax the whole section and keep to giving editors specific instructions, not meandering inquiries into the nature of plagiarism. Kaldari ( talk) 16:11, 18 June 2009 (UTC)
I think the present text is OK. There is no agreement in the real world about what "plagiarism" means; we cannot resolve that. The present text gives a summary of the situation in practical terms relevant to Wikipedia. — Carl ( CBM · talk) 00:25, 19 June 2009 (UTC)
I like the new text, Jayen466. It is unambiguous, to the point, and isn't totally confusing and meandering like the old version. Kaldari ( talk) 16:12, 19 June 2009 (UTC)
To illustrate some of my concerns, the following examples are from the text and sources of 2007 Samjhauta Express bombings, a Featured Article.
Source | Article |
---|---|
Witnesses said they saw people screaming and struggling to get out. The injured were pulled out of the burning carriages onto the trackside by fellow passengers, and local residents rushed to help. | Witnesses claim to have seen passengers screaming and attempting to escape […] The injured were pulled out of the burning carriages and onto the track by fellow passengers and local residents. |
Indian Prime Minister Manmohan Singh, expressing "anguish and grief" at the loss of life, vowed that the culprits would be caught. | Prime Minister Manmohan Singh expressed "anguish and grief" at the loss of life, and vowed that the culprits would be caught. |
Musharraf called for a full investigation by the Indian authorities | Musharraf also said that there must be a full Indian investigation of the attack. |
Inside one, an electronic timer encased in clear plastic was packed next to more than a dozen plastic bottles containing a cocktail of fuel oils and chemicals. | Inside one of the suitcases containing the undetonated IEDs, a digital timer encased in transparent plastic was packed alongside a dozen plastic bottles containing fuel oils and chemicals. |
Officials said about 30 of the bodies were charred beyond recognition. | … many of the bodies were charred beyond recognition |
The rest of the train, which had been carrying around 600 passengers, continued to the border town of Attari where passengers were transferred to a Pakistani train. | The rest of the train, which was left undamaged by the attack, continued on to the border town of Attari, before being transferred to a Pakistani train that took passengers to their destination in Lahore |
The letter of our guideline would lay the authors of this FA open to the charge of plagiarism. However, I don't think it would be fair, especially when reading the article as a whole. I do not think that a phrase like "bodies were charred beyond recognition" should be rephrased as "the dead were so badly burnt that they could not be identified" (which actually might be wrong; perhaps they were identified using DNA analysis, etc.), nor do I think that a six-word phrase like "the bodies were charred beyond recognition" should be put in quotation marks. These are all non-creative, factual expressions, remarkable more for the information they convey than for the formulations used to convey it.
Note that in this case, there is likely no great POV dispute in the article that would set editors against each other. But imagine this being an article on Eastern European history, or homeopathy. I can just see editors saying, "What you have just inserted about the "bodies being charred beyond recognition" is plagiarism. I've deleted it." How can we make the difference clear between the acceptable use of the most straightforward way of saying something and this sort of thing, which is egregious plagiarism?
Note also that in the last source/article pair, the article writing is inferior to the source (the "rest of the train" was not "transferred to a Pakistani train", the surviving passengers were). In the first pairing, the article is inaccurate – "onto the trackside" is not the same as "onto the track". JN 466 08:27, 18 June 2009 (UTC)
Sources and text are from the FA 2000 Sri Lanka cyclone.
Source | Article |
---|---|
At least nine people are dead | At least nine people died |
Eight fishermen are missing, feared dead. | Eight people were left missing and feared dead. |
a street protest took place in Trincomalee on December 27 over the lack of aid | A street protest occurred in Trincomalee due to lack of aid. |
The families of those who died will receive 15,000 rupees ($US183) in compensation and those whose homes have been damaged or destroyed will receive just 10,000 rupees. | the families of those who died received $183 [...] in compensation. The government also gave $122 [...] to those whose houses were damaged or destroyed |
Ten roofing sheets were distributed to 1,720 families in six Districts ... In addition, 3,000 families were selected to receive one set of cooking utensils each, two bedsheets and two sleeping mats. | the Red Cross distributed 10 roofing sheets each to 1,720 families, and also sent a set of cooking utensils, bed sheets, and sleeping mats to 3,000 families. |
Is any of this plagiarism? For example, would it make sense to change "set of cooking utensils", "bed sheets" or "sleeping mats" to synonyms, to avoid substantial similarity with the source? Is the re-use of these words in this FA indicative of laziness, an intent to deceive, or a desire for precision? Should "set of cooking utensils", "bed sheets" and "sleeping mats" be put in quotation marks in the article? JN 466 11:26, 18 June 2009 (UTC)
Extended content
|
---|
If these two are to be the first of many specific examples, then, may I suggest that a subpage would be appropriate with a pointer from here? While it's probably a good idea to centralize conversation about the principles, this is but one point of this guideline, and lengthy tables and conversations about specific examples may overwhelm and distract from developing other points. As to the general, it seems revisiting the legal aspects of this might be useful. There are two factors to consider here. Close paraphrasing of free sources is a plagiarism concern. Whether or not it's allowed is up to consensus. Close paraphrasing of non-free sources is a copyright concern. Whether or not it's allowed is down to policy based on US law. Some of the examples you give above are "fragmented literal similarity", which is what it is called when literal duplication occurs, but copying is not comprehensive. Close paraphrasing may occur even in the absence of such fragments, if the structure of a source is copied but the language completely changed. Yes, you can violate US copyright law and be legally sanctioned without using a single word from your source if you rise to the level of "comprehensive non-literal similarity". The incorporation of literal similarity in such cases simply serves to strengthen the evidence against you, since it is pretty hard to defend against a charge of copying when evidence is clear that you have read the source and copied it. From a copyright standpoint, the dividing line between how much is too much (when we reach the point that a court says, "This is serious enough for us to care") is not firmly defined by legal code. We don't take chances...not only for our own use, but for that of our contributors. To refer back to WP:C, "If in doubt, write the content yourself, thereby creating a new copyrighted work which can be included in Wikipedia without trouble." (Close paraphrasing is derivative work, which is allowed only by the copyright holder.) 
There are a good many cases illustrating these copyright issues in action, but I'm going to quote a bit from Salinger v. Random House (we really need an article on that), since it seems particularly relevant to some of your points above. In that case, the court characterized the problem succinctly, noting that facts are not copyrighted but that "'vividness of description' is precisely an attribute of the author's expression that he is entitled to protect....The copier is not at liberty to avoid 'pedestrian' reportage by appropriating his subject's literary devices." The court also noted that "Though a cliche or an 'ordinary' word-combination by itself will frequently fail to demonstrate even the minimum level of creativity necessary for copyright protection..., such protection is available for the 'association, presentation, and combination of the ideas and thought which go to make up the [author's] literary composition.'...as we have more recently stated, 'What is protected is the manner of expression, the author's analysis or interpretation of events, the way he structures his material and marshals facts, his choice of words and the emphasis he gives to particular developments.'...The 'ordinary' phrase may enjoy no protection as such, but its use in a sequence of expressive words does not cause the entire passage to lose protection. And though the "ordinary" phrase may be quoted without fear of infringement, a copier may not quote or paraphrase the sequence of creative expression that includes such a phrase." (citations omitted; http://www.law.cornell.edu/copyright/cases/811_F2d_90.htm) Again, determining when such has risen to a legally actionable level is very complex. Courts consider many factors in determining if the fragmented similarity meets "fair use." Wikipedia has deliberately chosen to follow a more strict standard than fair use in order to make our content as reusable as possible. 
Hence, this guideline is not going to make close paraphrasing of copyright protected materials suddenly problematic, because close paraphrasing of copyright protected materials is already problematic. If you wish to pursue refining the application of the concept of close paraphrasing to free materials, please be sure to separate that out from non-free materials. If you do wish to consider its application to copyright protected materials, please remember that close paraphrasing reflects far more than occasional duplication of language. It also refers to lifting the structure of the material and the perspective and emphasis to facts/details/events. -- Moonriddengirl (talk) 13:12, 18 June 2009 (UTC)
|
Extended content
|
---|
← WP:C says, "There are some circumstances under which copyrighted works may be legally utilized without permission; see Wikipedia:Non-free content for specific details on when and how to utilize such material." I've already quoted what NFC says. Multiple policies and guidelines note that Wikipedia has deliberately chosen a more narrow road than "fair use." Verbatim quotations are permitted by policy. If you're suggesting that proper citation of close paraphrase makes a "lesser" taking, then I believe you may be misunderstanding the courts' position, but I'd be interested in seeing support for that. The operative words in your quote are "too-close paraphrasing." Wikipedia:Close paraphrasing is incorporated by reference in the "see also" section at the bottom. It does address situations where little room for originality in language exists. -- Moonriddengirl (talk) 22:23, 18 June 2009 (UTC)
|
The lead should sum up what is said in the body of the page and be stand alone. As such I have a problem with this second paragraph:
Plagiarism is the incorporation of someone else's work without providing adequate credit. Even if you have cited a source, make sure that your wording does not duplicate that of the source unless you note duplication by quotation marks or some other acceptable method (such as block quotations).[1] This applies even if your source is not copyrighted.
I think it confuses plagiarism with copyright. It is not just the wording that is plagiarism, but also claiming that someone else's idea is one's own (a big sin in academia).
However I am not so hung up of the first sentence. Where I have a problem is the second and third sentence as they imply that copying 1911EB is unacceptable although many many articles include 1911EB text, and other copyright expired sources. If all that text was to be placed in quotations then it could not be edited to update the style and information content within the quotes. The whole point of including chunks from 1911EB and similar is to put in place a seedbed of information on a topic that through the usual Wikipedia process can gradually be altered into a completely new and useful work, with some parts trimmed out and others added and the sentences altered so that they read as a contemporary work. Given 20 years or so the text will probably look nothing like the initial 1911EB text, but if it is in quotes this can never happen. If of course the original author of a copyright expired text makes a statement that is a point of view about something there is no reason why that specific point of view should not be included as a quote or otherwise attributed to the source -- in the usual way that is done for such text under copyright -- but there is no reason why the text in general should not be incorporated into the Wikipedia article. If text is copied verbatim from a PD source I think it is a good idea to include one of the templates in Category:Attribution templates, but I do not think it is essential.
So I think this second paragraph needs to be broken into two and expanded, so that there is an explanation of how text that is in copyright should be presented within an article and another paragraph on what to do if the text is copyright expired. This would roughly speaking cover the differences mentioned in detail in the sections of the page. -- PBS ( talk) 13:59, 18 June 2009 (UTC)
Re Doncram (07:12, 19 June 2009): We have been importing free content for years; see Category:Attribution templates. EB1911 was just one largish project, there are others as well. So any preliminary discussion of this would have happened very early in the project's history. But it's possible that other people, like me, simply view this as a natural part of the free content philosophy.
I was hoping that my post here might spark some discussion. I want to ensure that our attribution system meets CZ's requirements, and gives credit to all authors of an article, whether they are WP editors or not. And I am very willing to discuss how to do that. But I do not want to change our current practice, which would allow us, for example, to take a CZ article on a topic we do not have and copy that entire article to WP without changing a single character, apart from adding attribution that our text came from CZ. — Carl ( CBM · talk) 13:14, 19 June 2009 (UTC)
Re Philip Baird Shearer (13:59, 18 June 2009): I agree that the second paragraph is worded in a way that contradicts our usual practices for incorporating free content, and should be changed. — Carl ( CBM · talk) 13:29, 19 June 2009 (UTC)
Since we are talking about the lede section, the text just needs to be a faithful summary of the content lower on the page. I split the troublesome sentence into two, and made it refer directly to the full discussion lower down. — Carl ( CBM · talk) 13:37, 19 June 2009 (UTC)
In my opinion, the proposed prohibition against direct copying is completely inappropriate. Plagiarism is use of material without citing the source, period. If you cite the source, then copying is not plagiarism.
In a academic setting, Things are very different. When a student submits a paper, There is a presumption that the wording itself, and not just facts and ideas, are the work of the student whose name appears on the paper as the author. In general, the entire object of the paper is to demonstrate to the teacher that the student can do original work: This is the presumptive meaning of the student's name on the paper.
The situation at Wikipedia bears no relationship whatsoever to the student paper. There is no author's name on the Wikipedia article, and the reader has no a priori expectation of authorship. Our standards are explicitly at variance with those of a student paper: we consider original research, synthesis, and even creativity to be detrimental.
A casual Wikipedia reader is looking for information and has no expectation of authorship at all. We serve this reader best by providing the best possible encyclopedic content we can. A more serious reader will look deeper, and will want to know more about the sources, but a serious reader will almost immediately become aware that Wikipedia is a collaborative work and that any wording may have come from any of a huge collection of editors or other sources. An even more serious reader will quickly learn how to find out exactly where any wording in the article came from. The goal of our plagiarism policy should be to ensure that someone who wants to know where the wording came from will be able to do so.
So much for our obligation to the reader. But what about our obligation to the original author? This comes in two parts: legal obligation, and moral obligation. Our legal obligations are embodied in copyright law and are beyond the scope of the plagiarism guideline. Our moral obligation to the author is to provide acknowledgment and recognition. But we have a moral obligation to the reader, to provide an easily readable article, and the original author has a moral obligation to us, as members of civilization. We must balance these obligations. For me, I believe that we have no more (or less) moral obligation to an author than we have to any Wikipedia editor, but Since that author is known, we can an should cite the original in the edit summary.
In many cases, (e.g., the DNB) useful encyclopedic articles in the public domain were written by authors as work for hire, and were at most minimally attributed in the originals. For such sources, in my opinion we have no moral obligation to the publisher whatsoever: the publisher is a soulless legal entity, not an individual. The individual author already conveyed the copyright to the publisher, and I feel no particular obligation to the author, either. In (most) other cases the author of a work which is no longer in copyright has no expectation of further ownership of the work in the moral sense. We, as that author's cultural heirs, have a right to use the work as we see fit. This was the norm for more than a thousand years, and only changed when the printing press suddenly added monetary value to the ability to control the copying of a work. That is, it's all about money, not "moral rights" at all.
Please, do not artificially restrict our ability to use our cultural heritage based on some mistaken analogy with grading student papers. - Arch dude ( talk) 19:53, 19 June 2009 (UTC)
<-- As can be seen in the history, I rewrote the introduction, with a help from Moonriddengirl and I hope it reflects what the majority of editors consider acceptable. However as with all edit I expect it will be edited unmercifully. From what you wrote above Arch dude I think you are basically happy with the alteration but you are not comfortable with the paragraph that starts "Some external works that are copyright expired..." because it does not say that an editor MUST include attribution. I think that the way to deal with this is rather than adding it to the lead, (as it does not have to be done for legal reasons) it should be described in a section called "copyright expired" that if text is copied from a copyright expired source and no attribution has been added, rather than delete the text editors are encouraged to add attribution. That brings me to the next point, I think that the subsections "Attributing text copied from other sources" should be rewritten to reflect the different types of sources that I have touched upon in the introduction. -- PBS ( talk) 10:34, 23 June 2009 (UTC)
Am I the only person that find that this section is ludicrous? Allow me to quote the lead (attribution at the project page history section):
Wikipedia draws clear distinctions between work submitted by Wikipedia editors as their own work (which can be "edited mercilessly"), work marked as a quotation (which must be properly credited and left essentially untouched), work described as a paraphrase of another source (which can be edited as long as the original sense is not lost), and direct copying of large blocks of free content written by other people (which should also be credited). In quotations, editorial notes and minor changes are sometimes useful, but must be clearly marked as such. See WP:MOSQUOTE for details.
The section then goes on to to discuss, erm, none of the above cases. What follows instead is a discussion of the copyright status of various sources, and not a very good one at that.
Wikipedia policy requires the citation of sources. It also allows the restricted use of direct quotations under the Non-free content criteria, one of which is that the source is correctly attributed (another is that the source has been previously published outside of Wikipedia). Now this use of properly attributed textual quotations is not controversial: allow me to quote article 10 of the Berne Convention:
- It shall be permissible to make quotations from a work which has already been lawfully made available to the public, provided that their making is compatible with fair practice, and their extent does not exceed that justified by the purpose, including quotations from newspaper articles and periodicals in the form of press summaries.
- It shall be a matter for legislation in the countries of the Union, and for special agreements existing or to be concluded between them, to permit the utilization, to the extent justified by the purpose, of literary or artistic works by way of illustration in publications, broadcasts or sound or visual recordings for teaching, provided such utilization is compatible with fair practice.
- Where use is made of works in accordance with the preceding paragraphs of this Article, mention shall be made of the source, and of the name of the author if it appears thereon.
U.S. fair use law is very similar with regards to text, although many of the key cases date from before the accession of the U.S.A. to the Berne Convention.
I am well known for saying that this whole exercise is a waste of everybody's time: it is more so if you don't look at the copyright restrictions (and hence the existing WP policies) which also cover this area. Physchim62 (talk) 13:32, 22 June 2009 (UTC)
I suggest that this section is restructured so we have sections on the different types of text that can be copied into an article. It is based on the structure in the new introduction and obviously there is a lot of text in the current "Attributing text copied from other sources" which should also be included, but it seperates out the different way that text is used
suggested structure:
- Attributing text copied from other sources
A mention here that if an editor wishes to incorporate text from another source and is not sure which category the text fall into then they should ask on the talk page of the Wikipedia article, or ask at Wikipedia:Reliable sources/Noticeboard before copying any text into a Wikipedia article. As it is better to sort this out before a mess is created rather than afterwards.
- Sources under copyright
- Sources under copyleft
- Public domain sources
This is more complicated than copyright expired because I think there must be a clear statement of how the text happens to be in the public domain, and if there is a doubt then any text's status then it should be treated as under copyright.
- Copyright expired sources
- Compliance with the content policies
This might be better incorporated into the fist section after the when in doubt paragraph.
-- PBS ( talk) 11:02, 23 June 2009 (UTC)
In this edit, the statement that "When copying material within Wikipedia, from one article to another, attribution is also required" has been marked with the {{ dubious}} template, asking for discussion. Here is the discussion:
Material contributed to en:wiki is done under the GFDL and CC-BY-SA licenses. Both unambiguously require that attribution to the original author(s) is mandatory. Is the dubious part about the case where someone copies only a quotation or external free text here and there? In that case, attribution is still required, but to the original author(s), not the wiki editor who copied it in. In any other case, GFDL requires a History page to indicate the Authors of the new Document. If the contributing editor is copying from another en:wiki page, they are not the Author of that text, rather they are incorporating the work of previous Authors, whose contribution must be acknowledged. Franamax ( talk) 01:17, 26 June 2009 (UTC)
(unindent)I don't think there is any real controversy, it has always been common practice to assume that not preserving the history in one way or another does violate the GFDL. It does get fuzzy because wikipedia never followed the GFDL to the letter, choosing to ignore some sections. But it never ignored the attribution requirements. Note that all work here was able to be dual licensed, even old work, because Wikipedia is such a large user of GFDL that they convinced FSF to designate the CC-SA license as the new version of the GFDL, in a tricky move that was somewhat controversial. Gigs ( talk) 01:45, 30 June 2009 (UTC)
I clarified the sentence in question. Regardless of any arguments about what the licenses actually say, there is a long-established practice here that we do not require in-article attribution when we copy text from one Wikipedia article to another, just an edit summary (see WP:MERGE). The wisdom of that policy can be debated, but it is not an issue that can be resolved on this page. I'd suggest the village pump. — Carl ( CBM · talk) 03:00, 30 June 2009 (UTC)
- Importing text:
- If you want to import text that you have found elsewhere or that you have co-authored with others, you can only do so if it is available under terms that are compatible with the CC-BY-SA license. You do not need to ensure or guarantee that the imported text is available under the GNU Free Documentation License. Furthermore, please note that you cannot import information which is available only under the GFDL. In other words, you may only import text that is (a) single-licensed under terms compatible with the CC-BY-SA license or (b) dual-licensed with the GFDL and another license with terms compatible with the CC-BY-SA license
- If you import text under a compatible license which requires attribution, you must, in a reasonable fashion, credit the author(s). Where such credit is commonly given through page histories (such as Wikimedia-internal copying), it is sufficient to give attribution in the edit summary, which is recorded in the page history, when importing the text. Regardless of the license, the text you import may be rejected if the required attribution is deemed too intrusive.
-- PBS ( talk) 08:50, 30 June 2009 (UTC)
Granted, today is a hot day here, but even on a cooler day I think I would struggle to get all the way to the end of this lead. It is too dense, presents too much detail too soon. JN 466 16:36, 30 June 2009 (UTC)
This is not clear - when editor copies & pastes large chunks of text but attributes them (with inline ref or such) are we dealing with plagiarism or copyvio (or both)? Which policy (policies) should be cited in a warning? -- Piotr Konieczny aka Prokonsul Piotrus| talk 06:44, 14 July 2009 (UTC)