This page is an archive of past discussions. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.
This article is Windows-centric. I'll go so far as to say it's Windows-myopic. It defies the imagination why time and again people who have no clue whatsoever come in to write authoritative articles. And of course get everything horribly wrong. There should be expert review before anything of a technical nature is published here. You're misleading a lot of people and that is simply unforgivable.
And no, C does not stand for common. Seriously. Or maybe it stands for other four letter words not mentioned here? Seriously: who cares? Why don't you concentrate on getting the technical aspects of this article right instead? And leave your urban legends to another place, another time? And of course if you don't have the expert knowledge to correct this article then you shouldn't be here in the first place.
The Weird World of Wikipedia. Disgusting. —Preceding unsigned comment added by 81.50.44.156 ( talk) 19:10, 20 March 2009 (UTC)
john 07:21, 26 Sep 2004 (UTC)
The article should be descriptive of usage, not prescriptive. The acronym CSV has obviously been interpreted as "Character Separated Values" in many circumstances, certainly in my experience, despite usually meaning "Comma Separated Values". Since there is no standard technical definition, I don't see how there can be an authority beyond evidence of that usage. I'm not sure what counts as a good source, but hopefully we can find something better than the following tidbits and then fix the ugly misinformation: http://acronyms.thefreedictionary.com/Character+Separated+Values http://www.acronymfinder.com/Character-Separated-Values-(data-format)-(CSV).html 129.67.45.74 ( talk) 16:40, 8 June 2015 (UTC)
I think that the general guidelines of a CSV should be explained in the Formal Specifications section rather than in the Example section (see note 1 below), stating clearly that those are not the standard, but perhaps the most widely used ones. I propose the Creativyst guidelines to be used (already linked from mentioned section). It would also be good to note the differences between the last and the RFC 4180. Juan Loman. 00:44, 13 December 2005 (PST)
Note 1: The guidelines in the "Example" section are good for an example, but should not be the only ones in the entire document.
Does anyone else think that the format shown in the example CSV data is a poor poor choice to show other people, with spaces between the comma and the text qualifier?
1997,Ford,E350,"ac, abs, moon",3000.00
1999,Chevy,"Venture ""Extended Edition""",,4900.00
1996, Jeep, Grand Cherokee, "air, moon roof, loaded MUST SELL!", 4799.00
Umm... The text says the example illustrates "a space before and after delimiter commas may not be trimmed". But I don't see how it illustrates that... RobertII ( talk) 21:30, 30 December 2009 (UTC)
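For anyone trying to see what the spaces in the third record actually do, here is a minimal sketch using Python's standard `csv` module. It is only one parser's behaviour, not a statement about what "CSV" requires; the helper name `parse` and the variable names are made up for this illustration.

```python
import csv
import io

# The three records from the example above, one per line.
SAMPLE = (
    '1997,Ford,E350,"ac, abs, moon",3000.00\n'
    '1999,Chevy,"Venture ""Extended Edition""",,4900.00\n'
    '1996, Jeep, Grand Cherokee, "air, moon roof, loaded MUST SELL!", 4799.00\n'
)

def parse(text, **options):
    """Parse CSV text into a list of rows using the given reader options."""
    return list(csv.reader(io.StringIO(text, newline=""), **options))

# Default options: a space after the comma is kept as field data, so the
# quote that follows it no longer opens a quoted field.
default_rows = parse(SAMPLE)

# skipinitialspace=True drops whitespace right after each delimiter,
# so the quoted field in the third record is recognised again.
lenient_rows = parse(SAMPLE, skipinitialspace=True)

for row in default_rows:
    print(len(row), row)
for row in lenient_rows:
    print(len(row), row)
```

With the default options the third record splits at the commas inside the quotation marks, which is the practical cost of putting a space between the comma and the text qualifier.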
There was a huge unwieldy section which basically was a list of everyone (me included) promoting their own little CSV tool. I've changed this into a slightly less unwieldy table. Richard W.M. Jones 15:59, 21 October 2005 (UTC)
I would suggest condensing the application support, programming tools and utilities sections into a single paragraph that explains how CSV is an extremely widely supported and implemented file format. You can then point to another article for that section and title it something like "Comma-separated values (implementations)". That way, the information can stay in the encyclopedia, the list can grow and it stays out of the main article where most people probably don't want it anyway. I think this article could use a lot of cleanup. -- MattWright ( talk) 23:47, 28 January 2006 (UTC)
Is there any maximum limit on the number of records that a CSV file can have? Does the max limit for Excel which is around 65536 apply for the CSV file also? Pls post a reply. Thanks
Filename extension |
.csv |
---|
Maybe add an infobox?
"CSV file format is a delimited data format that has fields separated by the comma character and records separated by newlines." Maybe it would be clearer to say a CSV file is a "flat file" that contains tabular data? Probably flat file isn't recognized outside the relational database world, but everyone is familiar with the idea of tabular data. (Maybe some text along the lines of "... like an Excel table.") I'm not saying the article shouldn't explain how CSVs are structured, but the introduction shouldn't be intimidating, even to someone from outside the IT world. ForrestCroce 01:45, 20 December 2006 (UTC)
This article used to be pretty simple and clear but it's not so simple or clear anymore.
First, someone pandering Delimeter-separated_values has taken an interest in this page. I've never heard of delimiter separated values and searching the web for it comes up with basically nothing. I think someone is trying to organize concepts at the expense of history. We know "comma separated values" is a misnomer but the fact is that's what people have been calling this format forever. You can't invent things on Wikipedia.
Second, the examples with bullets of notes is odd. It's in the Specification section and starts "The basic rules are". That whole sequence of examples with bullets of notes should be in the Example section. The Specification section should only cite specifications. I think the old simple example and "The basic rules are" sequence should be merged. Specifically the old example should be a quick and simple example at the top of the examples section. The bullets from the old example should be replaced with "The basic rules are" sequence.
-- Miallen 02:39, 17 May 2007 (UTC)
There are several problems here:
Therefore, merging CSV and Delimiter separated values (or whatever you want to call it) sounds like a bad idea because the potential for confusion is already high, given that these articles talk about concepts with either poorly-chosen (but well-established) names, or no well-established name at all.
This article still could use some cleanup, no doubt about that, but that does not warrant a merge of loosely-related articles, especially since the potential for confusion is high. dr.ef.tymac 14:51, 7 July 2007 (UTC)
I concur with dr.ef.tymac's remarks. -- Crath 21:05, 7 July 2007 (UTC)
Negative MFNickster. Please do not redirect to "Delimiter separated values". CSV is the term people recognise. Let's not get carried away with semantic details pls. -- Miallen 21:23, 26 September 2007 (UTC)
So user -- Miallen 02:39, 17 May 2007 (UTC)-- wrote: "You can't invent things on Wikipedia." Au contraire. It's done all the time. Even well-cited passages (with still functioning links!) get gaffled with alarming frequency. Which is why I now no longer contribute, except occasionally on the "Talk Tab" for cathartic purposes :) <sigh> I feel better now. — Preceding unsigned comment added by 159.121.119.134 ( talk)
Our article says: "Leading and trailing spaces or tabs, adjacent to commas, are trimmed" opposed to RFC 4180 stating: "Spaces are considered part of a field and should not be ignored". This leads imho to the following question: What is the purpose of this article? To define 'CSV according to WP', 'CSV according to the creativyst article', 'CSV according to RFC' or -what I favor- information on all (notable) styles? Tierlieb 10:53, 12 June 2007 (UTC)
I agree, the WP article should document the various (notable) CSV styles in use. May I suggest the addition of a section to the article that documents the effect various major applications have had upon CSV; e.g., Excel's dominance as a spreadsheet has caused many to understand the CSV format only as Excel understands it. -- Crath 21:11, 7 July 2007 (UTC)
Note that RFCs are only informational and must be evaluated for relevance. Unfortunately, in the case of RFC 4180 it sounds like it is not terribly relevant. The "specification" for CSV is defined by Microsoft Excel's CSV import / export code. I suspect that might be an unpopular idea to some but AFAIC any CSV emitted by an application MUST be completely compatible with Excel because of that application's long continuing history of support for CSV. If someone wants to do an RFC that's fine but I think it should go as far as to actually state that it is simply formalizing observed behavior of the Microsoft Excel spreadsheet application. -- Miallen 21:43, 26 September 2007 (UTC)
US bias: I know for Germany that Excel as standard uses ; not ,. Maybe this is true for more local editions since a lot of countries use , as a decimal separator; IIRC there is also an international agreement on that. The RFC is not very relevant for CSV; CSV has existed much longer. This notable separator issue was removed here http://en.wikipedia.org/?title=Comma-separated_values&diff=54321682&oldid=54291230 UnLoCode ( talk) 14:31, 2 April 2008 (UTC)
With all these discussions about allowing use of fair-use images or not, how to avoid costly lawsuits... WHY does anyone put a copyrighted image in the article (referring to screenshot of import window of MS Access) if there is so much free software around? -- Ben T/ C 08:49, 27 September 2007 (UTC)
IIRC OpenOffice Base 2.3 does _not_ support csv import. Not tested 2.4, but tired of that testing with every edition. UnLoCode ( talk) 14:34, 2 April 2008 (UTC)
I have noticed that, in Windows at least, if you have the comma set as your decimal separator (which many countries use) then Excel will export a "comma delimited" CSV file with semicolons instead of commas. This is something to watch out for if you are importing or exporting data from your own applications with the intention that people will be able to load it into Excel. I ran into this problem with a colleague in Europe. Does anybody know if this is common in other applications or if it's just an Excel thing? Also are there any common alternatives other than comma and semicolon? Wjousts ( talk) 19:36, 3 April 2008 (UTC)
I ran into a similar issue with a European colleague and was told by him that CSV files always (in his experience, of course) used a semicolon (ie not because of Windows nor Excel) —Preceding unsigned comment added by 91.85.197.128 ( talk) 11:13, 21 July 2008 (UTC)
Yes this is because of the language settings and stupid windows which is not compatible with itself (tools). So excel for example in Finland uses the country settings ',' (comma) and in Denmark it uses ';'. So that breaks the compatibility. That's intolerable but that's excel ;D 192.100.124.218 ( talk) 11:44, 15 September 2008 (UTC)
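For readers who have to consume such locale-dependent exports in their own code, a minimal sketch in Python follows. The sample data, the variable names and the helper `decimal_comma_to_float` are hypothetical; the only library feature relied on is the `delimiter` argument of the standard `csv.reader`. It assumes you already know the file uses semicolons and decimal commas.

```python
import csv
import io

# Hypothetical export from a locale that uses a decimal comma:
# fields are separated by ';' so that ',' can appear inside numbers.
EXPORT = "item;price\nwidget;3,50\ngadget;12,00\n"

# Tell the reader explicitly which delimiter the file uses.
rows = list(csv.reader(io.StringIO(EXPORT, newline=""), delimiter=";"))

def decimal_comma_to_float(text):
    """Convert '3,50' to 3.5; assumes no thousands separators are present."""
    return float(text.replace(",", "."))

# Skip the header row and build a name -> price mapping.
prices = {name: decimal_comma_to_float(price) for name, price in rows[1:]}
print(prices)
```

Passing the delimiter explicitly avoids guessing; whether automatic detection is reliable enough depends on the data, so it is left out of this sketch.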
Does anyone besides me think that the edit applied to this entry today, where "values" was replaced with "volume", is inappropriate? I've been working in and around computers for 25 years and I have never before heard CSV referred to as Comma Separated Volume. Rather than simply revert the edit, I thought I'd see what others think of the edit. Christopher Rath ( talk) 19:34, 17 April 2008 (UTC)
Article mentions:
"Each record is one line terminated by a line feed (ASCII/LF=0x0A) or a carriage return and line feed pair (ASCII/CRLF=0x0D 0x0A), however, line-breaks can be embedded."
RFC4180:
"Each record is located on a separate line, delimited by a line break (CRLF)." but it then says
escaped = DQUOTE *(TEXTDATA / COMMA / CR / LF / 2DQUOTE) DQUOTE
which (correctly?) implies any field that contains a CR, LF or CRLF should be all enclosed in a double quoted field.
Interestingly my version of Excel will accept either CR or CRLF to delimit a record but CRLF when it's embedded in a field enclosed in double quotes, results in a line break together with the "square" character representing a character that cannot be displayed, in this case the CR. ie only a single LF works correctly if part of a field.
Article then links to [ [6]], which adds in a lone CR to the mix.
Similar problems in the article referring to:
"leading and trailing spaces or tabs, adjacent to commas, are trimmed. This practice is contentious and in fact is specifically prohibited by RFC 4180, which states, "Spaces are considered part of a field and should not be ignored.""
At severe danger of confusing spaces with whitespace here - eg does one need to enclose a field in double quotes, if it has leading or trailing tabs?
Any one want to tackle these points? - I don't! —Preceding unsigned comment added by 124.191.116.29 ( talk) 00:26, 13 June 2009 (UTC)
It has been my experience that some versions of some Microsoft products, as well as third-party products, will fail on a CSV file delimited by LF characters. They only work if the file is delimited by the CRLF sequence. -- Jym ( talk) 21:05, 23 March 2010 (UTC)
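To make the line-ending points in this thread concrete, here is a small sketch with Python's standard `csv` module. It shows one implementation's behaviour only and says nothing about what Excel does; the sample data and variable names are invented for the illustration.

```python
import csv
import io

# Mixed record terminators (CRLF and a bare LF) plus a CRLF embedded
# inside a double-quoted field, as permitted by the RFC 4180 grammar.
DATA = 'id,note\r\n1,"line one\r\nline two"\r\n2,plain\n'

# newline="" stops the text layer from translating line endings, which the
# csv documentation asks for so embedded line breaks are parsed correctly.
rows = list(csv.reader(io.StringIO(DATA, newline="")))
print(rows)

# Writing the rows back: the default dialect terminates every record with
# CRLF and quotes any field that contains a line break.
out = io.StringIO(newline="")
csv.writer(out).writerows(rows)
print(repr(out.getvalue()))
```

The reader accepts both terminators on input, while the writer emits CRLF only, which matches the "be liberal on input, strict on output" approach.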
The section on Pilcrow support in applications should be moved to CSV application support. —Preceding unsigned comment added by Paddy3118 ( talk • contribs) 05:29, 31 July 2009 (UTC)
Sorry, but after reading the introduction, no "normal" person will know any more than before reading it. What's all this confusion about? A "table of lists form"? What does that even mean? And "where each associated item (member) in a group is in association with others also separated by the commas of its set." - wow, I've never read anything more confusing!
What about "A comma-separated values (CSV) file is used for the digital storage of tabular data, where each table row is stored as one line in a text file, with the individual columns separated by commas."??? Or is that not "programmer-like" enough? —Preceding unsigned comment added by Intrr ( talk • contribs) 15:42, 23 May 2010 (UTC)
It would be better if lines didn't have the commas all aligned. This would accentuate the fact that it is the commas and not some horizontal index in the line, that is the field delimiter. -- Paddy ( talk) 06:18, 31 January 2011 (UTC)
I recently discovered (and read about on the web) that .csv inherently supports ANSI and not Unicode. Should that be put in? http://support.microsoft.com/kb/172727 69.136.72.16 ( talk) 02:58, 7 February 2011 (UTC)
While I've heard of (and used) text data files divided by tabs, I've never seen them referred to as CSV files. Does anyone have any source for this terminology? 130.88.108.187 ( talk) 13:20, 10 September 2013 (UTC)
Agreed. This article - about Comma-separated values - talks about other formats that are completely uninteresting to me. The whole section about how some people see CSV as being anything other than comma-separated values, and the seemingly meticulous avoidance of mentioning commas as being the separator, make this article awkward to read, and leave me puzzled. It reads as if the article used to be about other formats, but was trimmed down to be about C(omma)SV, but the smell of earlier text remains.
Just create a separate page describing various delimiter-separated values file formats, and include all of the confusion and (non-)controversy about the delimiter on that page. It makes sense for this page to mention: a) other formats are similar but use other delimiters, and b) some CSV applications use colons as separators when commas can cause confusion such as for some date formats. But that should be about all.
And the suggestion that CSV is anything but comma-separated values is ludicrous. I've located resources written as far back as 2004 (eg https://repositories.tdl.org/ttu-ir/bitstream/handle/2346/17115/31295019801124.pdf) that use the full term "Comma Separated Values". Anything else is a clumsy attempt at a backronym. Jlaidman ( talk) 01:40, 22 October 2015 (UTC)
As is described in the article there is currently no real CSV standard. However, there are various moves afoot to change this, not least the W3C CSV on the web working group. Also today The National Archives has released a CSV Schema language and CSV Validator, more info at http://blog.nationalarchives.gov.uk/blog/csv-validator-new-digital-preservation-tool/ - by me hence why I'm only adding on the talk page, not into the article so others can decide on the significance so far as the article is concerned. David Underdown ( talk) 11:25, 15 July 2014 (UTC)
Why does the CSV article have a huge disclaimer at the top? The article seems pretty good to me. You are causing people to wonder if the article is accurate. I think it is (and I didn't have anything to do with it). Dtaviation ( talk) 14:47, 16 May 2015 (UTC)Dave
All reputable sources that I can find say the common file extension for a comma-separated values format is .csv, so the infobox should reflect this generalisation. The .txt extension is widely viewed as a text file extension, which could contain comma-separated values, but it is not a common file extension for a comma-separated values file format. +mt 22:18, 16 March 2016 (UTC)
There are two commonly used text file formats: Delimited text files (.txt), in which the TAB character (ASCII character code 009) typically separates each field of text. Comma separated values text files (.csv), in which the comma character (,) typically separates each field of text.
.txt, but that fails to note that there is a structure to CSVs. The infobox is not a place to discuss boundary cases, the text is the place to explain that the standard is not always followed. The quoted example of unicode in Excel doesn't really address the CSV standard, it is suggesting a workaround for a regional deficiency in the program. It is also important to distinguish between input and output formats. Good coding practice is to accept as many input variants as possible but to output only according to standard. Martin of Sheffield ( talk) 13:08, 17 March 2016 (UTC)
MIME media type name: text
MIME subtype name: csv
File extension(s): CSV
The article currently says
"According to RFC 4180, spaces outside quotes in a field are not allowed".
I think that is wrong. The spec does not say that. Instead, the spec explicitly allows them with
record = field *(COMMA field)
field = (escaped / non-escaped)
non-escaped = *TEXTDATA
TEXTDATA = %x20-21 / %x23-2B / %x2D-7E
In there, %x20 is the space character. So the space character is allowed in fields that aren't quoted.
If anything, we could interpret the current wording as "if the field is quoted, then there must not be spaces outside the quotes", but currently it reads much more like "if the field contains a space, it must be quoted". Even if the former was the intent, the wording should be clarified, and it should be explicitly stated that quotes are not required for fields that include spaces. Then we can also remove the current "however, the RFC also says ..." wording, which indicates that there's a contradiction when there isn't one.
Further references: My comment on stackoverflow, My comment on superuser Issue for Haskell's CSV library
Nh2-wiki ( talk) 21:55, 1 June 2016 (UTC)
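The TEXTDATA argument can be checked mechanically. The sketch below transcribes the RFC 4180 rule `non-escaped = *TEXTDATA` into a Python regular expression; the names `NON_ESCAPED` and `is_non_escaped` are made up for this illustration, and it covers only the non-escaped production, not a full RFC 4180 parser.

```python
import re

# TEXTDATA = %x20-21 / %x23-2B / %x2D-7E   (RFC 4180)
# i.e. printable ASCII including space (0x20), but excluding
# the double quote (0x22) and the comma (0x2C).
NON_ESCAPED = re.compile(r"[\x20-\x21\x23-\x2B\x2D-\x7E]*")

def is_non_escaped(field):
    """Return True if the field matches the RFC 4180 non-escaped production."""
    return NON_ESCAPED.fullmatch(field) is not None

print(is_non_escaped(" foo bar "))   # spaces are TEXTDATA
print(is_non_escaped('say "hi"'))    # DQUOTE is not TEXTDATA
print(is_non_escaped("a,b"))         # COMMA is not TEXTDATA
```

Under this reading a field with leading, trailing or interior spaces needs no quoting, while any field containing a double quote or a comma has to use the escaped form.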
"foo", "bar"
non-escaped field. *TEXTDATA does permit leading spaces (and makes them significant as field data), but DQUOTE is not an element of TEXTDATA, so a non-escaped field may not have a DQUOTE anywhere within. That means the premise of spaces outside the quotation marks cannot be met.
escaped fields. Those fields begin and end with DQUOTE, so there are no spaces outside the quotation marks.
TEXTDATA is
TEXTDATA = %x20-21 / %x23-2B / %x2D-7E
46°20'48"N. Some implementations treat DQUOTE as a normal character if the field did not start with a DQUOTE; those implementations allow quoted fields (e.g., "Acme Products, Inc.") and GIS coordinates.
A heads up to those watching this article that may not also be tracking the linked CSV application support page. Some uninterested parties are proposing that the CSV application support page be deleted. To participate in that discussion, please see Wikipedia:Articles for deletion/CSV application support Christopher Rath ( talk) 12:58, 30 December 2018 (UTC)
@ Crath: – mine was a revert of an IP user, you should follow WP:BRD not simply re-revert. The opening of that paragraph now reads "RFC 4180 proposes a specification for the CSV format, and this is the definition commonly used. As a result, in practice ..."; "as a result" of what? It implies that because the RFC proposes a definition in practice it is ignored, something I'm sure we both agree is wrong. I'm recasting the whole opening of the paragraph to make the ambiguities clearer. Regards, Martin of Sheffield ( talk) 15:13, 20 January 2019 (UTC)
@ Martin of Sheffield:, my apologies for not reading the change log closely enough. Your rewording looks very good. Thanks! Christopher Rath ( talk) 02:46, 22 January 2019 (UTC)
The section "Application support" says, "Many utility programs on Unix-style systems can operate on CSV files", then lists as examples cut, paste, join, sort, uniq, emacs, awk. With the exception of emacs, I think that is incorrect. Those programs can of course split a string with a comma separator, but I believe they cannot natively handle commas within quotation marks, which for me is what qualifies it as a CSV parser. At the least it would need to be amended to "... can operate on some CSV files", but I would rather remove it altogether. Adpete ( talk) 22:59, 3 March 2019 (UTC)
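The difference between splitting on commas and actually parsing CSV is easy to demonstrate. The sketch below uses Python rather than the Unix tools themselves, so it is an analogy for what a plain field splitter does, not a test of cut or awk; the variable names are invented for the illustration.

```python
import csv

# One record with a quoted field that contains commas.
LINE = '1997,Ford,E350,"ac, abs, moon",3000.00'

# Naive approach: split on every comma, as a plain field splitter would.
naive_fields = LINE.split(",")

# Quote-aware approach: the standard csv module honours the quotation marks.
parsed_fields = next(csv.reader([LINE]))

print(len(naive_fields), naive_fields)
print(len(parsed_fields), parsed_fields)
```

The naive split breaks the quoted field into three pieces and leaves the quotation marks in the data, which is why "can operate on some CSV files" is the most that can be said for tools that only split on a separator.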
There's this odd sentence in the section General functionality:
Similarly, CSV cannot naturally represent hierarchical or object-oriented databases or other data.
I think the highlighted words should be deleted, or is there some special meaning there? BroVic ( talk) 08:22, 8 March 2019 (UTC)
This page is an archive of past discussions. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page. |
This article is Windows-centric. I'll go so far as to say it's Windows-myopic. It defies the imagination why time and again people who have no clue whatsoever come in to write authoritative articles. And of course get everything horribly wrong. There should be expert review before anything of a technical nature is published here. You're misleading a lot of people and that is simply unforgivable.
And no, C does not stand for common. Seriously. Or maybe it stands for other four letter words not mentioned here? Seriously: who cares? Why don't you concentrate on getting the technical aspects of this article right instead? And leave your urban legends to another place, another time? And of course if you don't have the expert knowledge to correct this article then you shouldn't be here in the first place.
The Weird World of Wikipedia. Disgusting. —Preceding unsigned comment added by 81.50.44.156 ( talk) 19:10, 20 March 2009 (UTC)
john 07:21, 26 Sep 2004 (UTC)
The article should be descriptive of usage, not prescriptive. The acronym CSV has obviously been interpreted as "Character Separated Values" in many circumstances, certainly in my experience, despite usually meaning "Comma Separated Values". Since there is no standard technical definition, I don't see how there can be an authority beyond evidence of that usage. I'm not sure what counts as a good source, but hopefully we can find something better than the following tidbits and then fix the ugly misinformation: http://acronyms.thefreedictionary.com/Character+Separated+Values http://www.acronymfinder.com/Character-Separated-Values-(data-format)-(CSV).html 129.67.45.74 ( talk) 16:40, 8 June 2015 (UTC)
I think that the general guidelines of a CSV should be explained in the Formal Specifications section rather than in the Example section (see note 1 below), stating clearly that those are not the standard, but perhaps the most widely used ones. I propose the Creativyst guidelines to be used (already linked from mentioned section). It would also be good to note the differences between the last and the RFC 4180. Juan Loman. 00:44, 13 December 2005 (PST)
Note 1: The guidelines in the "Example" section are good for an example, but should not be the only ones in the entire document.
Does anyone else think that the format shown in the example CSV data is a poor poor choice to show other people, with spaces between the comma and the text qualifier?
1997,Ford,E350,"ac, abs, moon",3000.00 1999,Chevy,"Venture ""Extended Edition""",,4900.00 1996, Jeep, Grand Cherokee, "air, moon roof, loaded MUST SELL!", 4799.00
Umm... The text says the example illustrates "a space before and after delimeter commans may not be trimmed". But I don't see how it illustrates that... RobertII ( talk) 21:30, 30 December 2009 (UTC)
There was a huge unwieldy section which basically was a list of everyone (me included) promoting their own little CSV tool. I've changed this into a slightly less unwieldy table. Richard W.M. Jones 15:59, 21 October 2005 (UTC)
I would suggest condensing the application support, programming tools and utilities sections into a single paragraph that explains how CSV an extremely widely supported and implemented file format. You can then point to another article for that section and title it something like "Comma-separated values (implementations)". That way, the information can stay in the encyclopedia, the list can grow and it stays out of the main article where most people probably don't want it anyway. I think this article could use a lot of cleanup. -- MattWright ( talk) 23:47, 28 January 2006 (UTC)
Is there any maximum limit on the number of records that a CSV file can have? Does the max limit for Excel which is around 65536 apply for the CSV file also? Pls post a reply. Thanks
Filename extension |
.csv |
---|
Maybe add a infobox?
"CSV file format is a delimited data format that has fields separated by the comma character and records separated by newlines." Maybe it would be more clear to say a CSV file is a "flat file" that contains tabular data? Probably flat file isn't recognized outside the relational database world, but everyone is familiar with the idea of tabular data. ( Maybe some text along the lines of "... like an Excel table." I'm not saying the article shouldn't explain how CSVs are structured, but the introduction shouldn't be intimidating, even to someone from outside the IT world. ForrestCroce 01:45, 20 December 2006 (UTC)
This article used to be pretty simple and clear but it's not so simple or clear anymore.
First, someone pandering Delimeter-separated_values has taken an interest in this page. I've never heard of delimiter separated values and searching the web for it comes up with basically nothing. I think someone is trying to organize concepts at the expense of history. We know "comma separated values" is a misnomer but the fact is that's what people have been calling this format forever. You can't invent things on Wikipedia.
Second, the examples with bullets of notes is odd. It's in the Specification section and starts "The basic rules are". That whole sequence of examples with bullets of notes should be in the Example section. The Specification section should only cite specifications. I think the old simple example and "The basic rules are" sequence should be merged. Specifically the old example should be a quick and simple example at the top of the examples section. The bullets from the old example should be replaced with "The basic rules are" sequence.
-- Miallen 02:39, 17 May 2007 (UTC)
There are several problems here:
Therefore, merging CSV and Delimiter separated values (or whatever you want to call it) sounds like a bad idea because the potential for confusion is already high, given that these articles talk about concepts with either poorly-chosen (but well-established) names, or no well-established name at all.
This article still could use some cleanup, no doubt about that, but that does not warrant a merge of loosely-related articles, especially since the potential for confusion is high. dr.ef.tymac 14:51, 7 July 2007 (UTC)
I concur with dr.ef.tymac's remarks. -- Crath 21:05, 7 July 2007 (UTC)
Negative MFNickster. Please do not redirect to "Delimiter separated values". CSV is the term people recognise. Let's not get carried away with sematic details pls. -- Miallen 21:23, 26 September 2007 (UTC)
So user -- Miallen 02:39, 17 May 2007 (UTC)-- wrote; "You can't invent things on Wikipedia." Ah, contraire. It's done all the time. Even well-cited passages (with still functioning links!) get gaffled with alarming frequency. Which is why I now no longer contribute, except occasionally on the "Talk Tab" for cathartic purposes :) <sigh> I feel better now. — Preceding unsigned comment added by 159.121.119.134 ( talk)
Our article says: "Leading and trailing spaces or tabs, adjacent to commas, are trimmed" opposed to RFC 4180 stating: "Spaces are considered part of a field and should not be ignored". This leads imho to the following question: What is the purpose of this article? To define 'CSV according to WP', 'CSV according to the creativyst article', 'CSV according to RFC' or -what I favor- information on all (notable) styles? Tierlieb 10:53, 12 June 2007 (UTC)
I agree, the WP article should document the various (notable) CSV styles in use. May I suggest the addition of a section the article that documetns the effect various major applications have had upon CSV; e.g., Excel's dominance as a spreadsheet of has caused many to understand the CSV format only as Excel understands it. -- Crath 21:11, 7 July 2007 (UTC)
Note that RFCs are only informational and must be evaluated for relevance. Unfortunately, in the case of RFC 4180 it sounds like it not terribly relevant. The "specification" for CSV is defined by Microsoft Excel's CSV import / export code. I suspect that might be an unpopular idea to some but AFAIC any CSV emitted by an application MUST be completely compatible with Excel because of that application's long continuing history of support for CSV. If someone wants to do an RFC that's fine but I think it should go as far as to actually state that it is simply formalizing observed behavior of the Microsoft Excel spreadsheet application. -- Miallen 21:43, 26 September 2007 (UTC)
US bias: I know for Germany that Excel as standard uses ; not ,. Maybe this is true for more local editions since lot of countries use , as a deciaml separator, IIRC there is also a international agreement on that. The RFC is not much relevant for CSV, CSV exists much longer. This notable separator issue was removed here http://en.wikipedia.org/?title=Comma-separated_values&diff=54321682&oldid=54291230 UnLoCode ( talk) 14:31, 2 April 2008 (UTC)
With all these discussions about allowing use of fair-use images or not, how to avoid costly lawsuits... WHY does anyone put a copyrighted image in the article (referring to screenshot of import window of MS Access) if there is so much free software around? -- Ben T/ C 08:49, 27 September 2007 (UTC)
IIRC OpenOffice Base 2.3 does _not_ support csv import. Not tested 2.4, but tired of that testing with every edition. UnLoCode ( talk) 14:34, 2 April 2008 (UTC)
I have noticed that, in Windows at least, if you have the comma set as your decimal separator (which many countries use) then Excel will export a "comma delimited" CSV file will semi-colons instead of commas. This is something to watch out for if you are importing or exporting data from your own applications with the intention that people will be able to load it into Excel. I ran into this problem with a colleague in Europe. Does anybody know if this is common in other applications or if it's just an Excel thing? Also are there any common alternatives other than comma and semi-colon? Wjousts ( talk) 19:36, 3 April 2008 (UTC)
I ran into a similar issue with a European colleague and was told by him that CSV files always (in his experience, of course) used a semicolon (ie not because of Windows nor Excel) —Preceding
unsigned comment added by
91.85.197.128 (
talk)
11:13, 21 July 2008 (UTC)
Yes this is because of the language settings and stupid windows which is not compatible with it self (tools). So excel for example in Finland uses the country settings ',' (comma) and in Denmark it uses ';'. So that breaks the compatibility. That's untolerable but that's excel ;D 192.100.124.218 ( talk) 11:44, 15 September 2008 (UTC)
Does anyone besides me think that the edit applied to this entry today, where "values" was replaced with "volume", is inappropriate? I've been working in and around computers for 25 years and I have never before heard CSV referred to as Comma Separated Volume. Rather than simply revert the edit, I thought I'd see what others think of the edit. Christopher Rath ( talk) 19:34, 17 April 2008 (UTC)
Article mentions:
"Each record is one line terminated by a line feed (ASCII/LF=0x0A) or a carriage return and line feed pair (ASCII/CRLF=0x0D 0x0A), however, line-breaks can be embedded."
RFC4180:
"Each record is located on a separate line, delimited by a line break (CRLF)." but it then says
escaped = DQUOTE *(TEXTDATA / COMMA / CR / LF / 2DQUOTE) DQUOTE
which (correctly?) implies any field that contains a CR, LF or CRLF should be all enclosed in a double quoted field.
Interestingly, my version of Excel will accept either CR or CRLF to delimit a record, but a CRLF embedded in a field enclosed in double quotes results in a line break together with the "square" character that represents a character that cannot be displayed, in this case the CR. That is, only a single LF works correctly as part of a field.
Article then links to [ [6]], which adds in a lone CR to the mix.
Similar problems in the article referring to:
"leading and trailing spaces or tabs, adjacent to commas, are trimmed. This practice is contentious and in fact is specifically prohibited by RFC 4180, which states, "Spaces are considered part of a field and should not be ignored.""
There is a severe danger of confusing spaces with whitespace here: e.g., does one need to enclose a field in double quotes if it has leading or trailing tabs?
Any one want to tackle these points? - I don't! —Preceding unsigned comment added by 124.191.116.29 ( talk) 00:26, 13 June 2009 (UTC)
It has been my experience that some versions of some Microsoft products, as well as third-party products, will fail on a CSV file delimited by LF characters. They only work if the file is delimited by the CRLF sequence. -- Jym ( talk) 21:05, 23 March 2010 (UTC)
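To make the embedded line-break case concrete, here is a small sketch using Python's standard `csv` module. It shows only how that one library behaves; it says nothing about Excel or other products, and the sample record is made up. The writer is told to terminate records with CRLF, and a field containing a CRLF is wrapped in double quotes, in line with the `escaped` rule quoted above.

```python
import csv
import io

# A made-up record whose second field contains an embedded CRLF.
rows = [["id", "note"], ["1", "first line\r\nsecond line"]]

# newline="" stops the text layer from translating line endings.
buffer = io.StringIO(newline="")
csv.writer(buffer, lineterminator="\r\n").writerows(rows)
encoded = buffer.getvalue()

# Reading it back treats the quoted CRLF as field data, not as a record break.
decoded = list(csv.reader(io.StringIO(encoded, newline="")))

print(repr(encoded))
print(decoded == rows)
```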
The section on Pilcrow support in applications should be moved to CSV application support. —Preceding unsigned comment added by Paddy3118 ( talk • contribs) 05:29, 31 July 2009 (UTC)
Sorry, but after reading the introduction, no "normal" person will know any more than before reading it. What's all this confusion about? A "table of lists form"? What does that even mean? And "where each associated item (member) in a group is in association with others also separated by the commas of its set." - wow, I've never read anything more confusing!
What about "A comma-separated values (CSV) file is used for the digital storage of tabular data, where each table row is stored as one line in a text file, with the individual columns separated by commas."??? Or is that not "programmer-like" enough? —Preceding unsigned comment added by Intrr ( talk • contribs) 15:42, 23 May 2010 (UTC)
It would be better if lines didn't have the commas all aligned. This would accentuate the fact that it is the commas and not some horizontal index in the line, that is the field delimiter. -- Paddy ( talk) 06:18, 31 January 2011 (UTC)
I recently discovered (and read about on the web) that .csv inherently supports ANSI and not Unicode. Should that be put in? http://support.microsoft.com/kb/172727 69.136.72.16 ( talk) 02:58, 7 February 2011 (UTC)
While I've heard of (and used) text data files divided by tabs, I've never seen them referred to as CSV files. Does anyone have any source for this terminology? 130.88.108.187 ( talk) 13:20, 10 September 2013 (UTC)
Agreed. This article - about Comma-separated values - talks about other formats that are completely uninteresting to me. The whole section about how some people see CSV as being anything other than comma-separated values, and the seemingly meticulous avoidance of mentioning commas as being the separator, make this article awkward to read, and leave me puzzled. It reads as if the article used to be about other formats, but was trimmed down to be about C(omma)SV, but the smell of earlier text remains.
Just create a separate page describing various delimiter-separated values file formats, and include all of the confusion and (non-)controversy about the delimiter on that page. It makes sense for this page to mention: a) other formats are similar but use other delimiters, and b) some CSV applications use colons as separators when commas can cause confusion such as for some date formats. But that should be about all.
And the suggestion that CSV is anything but comma-separated values is ludicrous. I've located resources written as far back as 2004 (eg https://repositories.tdl.org/ttu-ir/bitstream/handle/2346/17115/31295019801124.pdf) that use the full term "Comma Separated Values". Anything else is a clumsy attempt at a backronym. Jlaidman ( talk) 01:40, 22 October 2015 (UTC)
As is described in the article there is currently no real CSV standard. However, there are various moves afoot to change this, not least the W3C CSV on the web working group. Also today The National Archives has released a CSV Schema language and CSV Validator, more info at http://blog.nationalarchives.gov.uk/blog/csv-validator-new-digital-preservation-tool/ - by me hence why I'm only adding on the talk page, not into the article so others can decide on the significance so far as the article is concerned. David Underdown ( talk) 11:25, 15 July 2014 (UTC)
Why does the CSV article have a huge disclaimer at the top? The article seems pretty good to me. You are causing people to wonder if the article is accurate. I think it is (and I didn't have anything to do with it). Dtaviation ( talk) 14:47, 16 May 2015 (UTC)Dave
All reputable sources that I can find say the common file extension for a comma-separated values format is .csv, so the infobox should reflect this generalisation. The .txt extension is widely viewed as a text file extension, which could contain comma-separated values, but it is not a common file extension for a comma-separated values file format. +mt 22:18, 16 March 2016 (UTC)
There are two commonly used text file formats: Delimited text files (.txt), in which the TAB character (ASCII character code 009) typically separates each field of text. Comma separated values text files (.csv), in which the comma character (,) typically separates each field of text.
.txt, but that fails to note that there is a structure to CSVs. The infobox is not a place to discuss boundary cases, the text is the place to explain that the standard is not always followed. The quoted example of unicode in Excel doesn't really address the CSV standard, it is suggesting a workaround for a regional deficiency in the program. It is also important to distinguish between input and output formats. Good coding practice is to accept as many input variants as possible but to output only according to standard. Martin of Sheffield ( talk) 13:08, 17 March 2016 (UTC)
MIME media type name: text
MIME subtype name: csv
File extension(s): CSV
The article currently says
"According to RFC 4180, spaces outside quotes in a field are not allowed".
I think that is wrong. The spec does not say that. Instead, the spec explicitly allows them with
record = field *(COMMA field)
field = (escaped / non-escaped)
non-escaped = *TEXTDATA
TEXTDATA = %x20-21 / %x23-2B / %x2D-7E
In there, %x20 is the space character. So the space character is allowed in fields that aren't quoted.
If anything, we could interpret the current wording as "if the field is quoted, then there must not be spaces outside the quotes", but currently it reads much more like "if the field contains a space, it must be quoted". Even if the former was the intent, the wording should be clarified, and it should be explicitly stated that quotes are not required for fields that include spaces. Then we can also remove the current "however, the RFC also says ..." wording, which indicates that there's a contradiction when there isn't one.
Further references: My comment on stackoverflow, My comment on superuser Issue for Haskell's CSV library
Nh2-wiki ( talk) 21:55, 1 June 2016 (UTC)
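The reading of the grammar above can be checked mechanically. Below is a rough sketch in Python that transcribes only the `non-escaped` and `TEXTDATA` productions into a regular expression. The function name `is_non_escaped` is made up, and the sketch ignores the rest of the RFC 4180 grammar.

```python
import re

# TEXTDATA = %x20-21 / %x23-2B / %x2D-7E
# i.e. printable ASCII including space, but excluding DQUOTE (%x22) and COMMA (%x2C).
TEXTDATA = r"[\x20-\x21\x23-\x2B\x2D-\x7E]"

# non-escaped = *TEXTDATA
NON_ESCAPED = re.compile(rf"{TEXTDATA}*")

def is_non_escaped(field: str) -> bool:
    """Return True if the whole field matches the non-escaped production."""
    return NON_ESCAPED.fullmatch(field) is not None

print(is_non_escaped("hello world"))  # spaces inside an unquoted field
print(is_non_escaped("  padded  "))   # leading and trailing spaces
print(is_non_escaped('say "hi"'))     # contains DQUOTE
print(is_non_escaped("a,b"))          # contains COMMA
```

Under this transcription, fields with embedded, leading or trailing spaces match without being quoted, while fields containing a double quote or a comma do not.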
"foo", "bar"
non-escaped field. *TEXTDATA does permit leading spaces (and makes them significant as field data), but DQUOTE is not an element of TEXTDATA, so a non-escaped field may not have a DQUOTE anywhere within. That means the premise of spaces outside the quotation marks cannot be met.
escaped fields. Those fields begin and end with DQUOTE, so there are no spaces outside the quotation marks.
TEXTDATA is
TEXTDATA = %x20-21 / %x23-2B / %x2D-7E
46°20'48"N. Some implementations treat DQUOTE as a normal character if the field did not start with a DQUOTE; those implementations allow quoted fields (e.g., "Acme Products, Inc.") and GIS coordinates.
A heads up to those watching this article who may not also be tracking the linked CSV application support page. Some uninterested parties are proposing that the CSV application support page be deleted. To participate in that discussion, please see Wikipedia:Articles for deletion/CSV application support Christopher Rath ( talk) 12:58, 30 December 2018 (UTC)
@ Crath: – mine was a revert of an IP user, you should follow WP:BRD not simply re-revert. The opening of that paragraph now reads "RFC 4180 proposes a specification for the CSV format, and this is the definition commonly used. As a result, in practice ..."; "as a result" of what? It implies that because the RFC proposes a definition in practice it is ignored, something I'm sure we both agree is wrong. I'm recasting the whole opening of the paragraph to make the ambiguities clearer. Regards, Martin of Sheffield ( talk) 15:13, 20 January 2019 (UTC)
@ Martin of Sheffield:, my apologies for not reading the change log closely enough. Your rewording looks very good. Thanks! Christopher Rath ( talk) 02:46, 22 January 2019 (UTC)
The section "Application support" says, "Many utility programs on Unix-style systems can operate on CSV files", then lists as examples cut, paste, join, sort, uniq, emacs, awk. With the exception of emacs, I think that is incorrect. Those programs can of course split a string with a comma separator, but I believe they cannot natively handle commas within quotation marks, which for me is what qualifies it as a CSV parser. At the least it would need to be amended to "... can operate on some CSV files", but I would rather remove it altogether. Adpete ( talk) 22:59, 3 March 2019 (UTC)
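The difference between splitting on commas and actually parsing CSV can be shown in a few lines. This is a sketch in Python rather than with the Unix tools themselves: `str.split(",")` stands in for what a plain field splitter such as `cut -d,` does, and the sample line is made up.

```python
import csv

# A made-up record whose second field contains a comma inside quotation marks.
line = '1,"Acme Products, Inc.",Sheffield'

# Plain splitting on the delimiter, as a simple field cutter would do.
naive = line.split(",")

# A quote-aware parser keeps the quoted comma as part of the field.
parsed = next(csv.reader([line]))

print(len(naive), naive)
print(len(parsed), parsed)
```

The naive split yields four pieces and leaves the quotation marks in place, whereas the parser yields the intended three fields.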
There's this odd sentence in the section General functionality:
Similarly, CSV cannot naturally represent hierarchical or object-oriented databases or other data.
I think the highlighted words should be deleted, or is there some special meaning there? BroVic ( talk) 08:22, 8 March 2019 (UTC)