This article was nominated for deletion on 6 August 2009. The result of the discussion was Move to Comparison of data serialization formats.
This article is rated List-class on Wikipedia's content assessment scale.
This content was a long list in the main XML article. I removed the list to put it here, because the XML article is already very long. Hervegirod ( talk) 00:45, 6 August 2009 (UTC)
I don't think this article should be deleted, if it is still up for deletion. There are lots of these articles, and they are useful for finding information and comparing different things quickly.
SeanJA ( talk) 05:31, 12 September 2009 (UTC)
A common term for "data serialization format" is encoding. You may want to include in this comparison:
I second the inclusion of XDR
Jann.poppinga ( talk) 10:57, 19 March 2010 (UTC)
Shouldn't Boost Serialization be included here? — Preceding unsigned comment added by 98.171.183.235 ( talk) 00:54, 25 October 2022 (UTC)
This section should be on the XML page...
XML should only be tagged as partially human-readable. Simple, basic XML files can be human-readable, but once xmlns and XSD come into play, it quickly becomes not human-readable. Another factor is that it's not always possible to properly reformat or indent XML for readability without affecting content. 81.220.246.44 ( talk) 14:20, 24 October 2014 (UTC)
Concerning the "not human-readable" features mentioned above: XML namespace declarations ("xmlns") and XML Schema definitions (XSDs) are text just like XML, and perfectly human-readable. And you absolutely can reformat or indent XML without affecting content; explicit value delimitation via tags and attributes is the whole point of XML, and its benefit over whitespace-delimited encodings like YAML (one drawback of which is the negative effect of improper or varied indentation). — Preceding unsigned comment added by 192.91.171.42 ( talk) 17:11, 17 January 2020 (UTC)
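For anyone weighing the two comments above, here is a minimal sketch of what re-indentation does to a parsed document. It uses Python's standard xml.etree.ElementTree (ET.indent needs Python 3.9 or later); the tiny sample document is made up for illustration. It only shows that indenting adds whitespace-only text around child elements while leaving leaf text alone. Whether that added whitespace counts as "content" depends on the schema, xml:space, and whether the document uses mixed content, so it does not settle the disagreement.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, written without any indentation.
root = ET.fromstring("<a><b>x</b><c/></a>")

text_before = root.text            # nothing between <a> and <b> yet
ET.indent(root)                    # pretty-print in place (Python 3.9+)
text_after = root.text             # now a whitespace-only string
leaf_text = root.find("b").text    # text inside <b> is left untouched

print(repr(text_before), repr(text_after), repr(leaf_text))
print(ET.tostring(root, encoding="unicode"))
```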
The JSON associative array sample - {42: true, "A to Z": [1, 2, 3]} - looks wrong to me (and also to JSONLint). In JSON the property names ("keys" if you will) must be double-quoted strings. Neither numbers nor unquoted strings are valid, hence 42 cannot be a property name, although "A to Z" can, as can "42".
See JSON.org, as follows:
-- Mikepeat ( talk) 15:30, 10 January 2011 (UTC)
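The point about key quoting is easy to check with any strict JSON parser. The sketch below uses Python's standard json module as a stand-in for JSONLint; the helper name is_valid_json and the corrected sample with "42" as the key are illustrative additions, not text from the article.

```python
import json

# The sample as quoted above: the key 42 is a bare number.
invalid_sample = '{42: true, "A to Z": [1, 2, 3]}'
# The same data with the key written as a double-quoted string.
valid_sample = '{"42": true, "A to Z": [1, 2, 3]}'


def is_valid_json(text):
    """Return True if Python's json module accepts the text as JSON."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


print(is_valid_json(invalid_sample))
print(is_valid_json(valid_sample))
print(json.loads(valid_sample))
```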
Especially for the subsection about "binary formats", but also for the "Overview", I would expect some information about Apache Avro: http://en.wikipedia.org/wiki/Apache_Avro ... so far I don't know enough about it to write something myself -- 217.24.206.242 ( talk) 11:10, 18 September 2012 (UTC)
As I understand the intention of this article, Java Serialization should be a part of it. It is one of the commonly used object serialization formats (e.g. for RMI communication). — Preceding unsigned comment added by 217.18.178.110 ( talk) 12:52, 20 June 2014 (UTC)
As I understand the suggestion has been made that Java Serialization should be part of this article, what about other language-specific serialization formats, such as Python's pickle? 195.212.29.89 ( talk) 07:00, 25 September 2014 (UTC)
https://github.com/Microsoft/bond/
From the github: "Bond is a cross-platform framework for working with schematized data. It supports cross-language de/serialization and powerful generic mechanisms for efficiently manipulating data. Bond is broadly used at Microsoft in high scale services." — Preceding unsigned comment added by 82.136.100.19 ( talk) 10:11, 30 January 2015 (UTC)
To be complete, FlatBuffers ( http://google.github.io/flatbuffers/) and Cap'n Proto ( https://capnproto.org/) could be mentioned. 128.237.28.16 ( talk) 16:00, 24 February 2015 (UTC)
Agree. Cap'n Proto is very interesting and would be a good comparison. I do not have enough in-depth knowledge to create an official entry. CaliViking ( talk) 18:33, 29 September 2022 (UTC)
The term Standardized leads to a page describing National and International Standards. Many of the entries are misleadingly listed as "Standardized" when in fact they are not standardized protocols, never having been approved by a due-process ANSI or ISO approved standards development organization. For example, Apache is not an ANSI or ISO approved standards development organization and therefore Avro is not a standardized protocol unless it is submitted and approved by such a body. — Preceding unsigned comment added by Posicks ( talk • contribs) 15:45, 9 May 2015 (UTC)
Something is standardized if a useful specification is publicly available. -- 195.14.219.99 ( talk) 21:01, 3 November 2015 (UTC)
The same can be said for Protocol Buffers - the link just refers to its own documentation. — Preceding unsigned comment added by 148.80.255.144 ( talk) 20:44, 23 February 2018 (UTC)
Created for the Go programming language. https://golang.org/pkg/encoding/gob/
Created as a simple way to serialize (Python's) NumPy objects. https://www.numpy.org/devdocs/reference/generated/numpy.lib.format.html
https://github.com/edn-format/edn — Preceding unsigned comment added by 164.144.252.29 ( talk) 18:56, 11 November 2015 (UTC)
Looks very interesting. I don't have the in-depth knowledge to write a good entry. The creator of the format is very knowledgeable in the field as the primary author of Protocol Buffers version 2. See also https://capnproto.org/ , https://github.com/capnproto/ , https://groups.google.com/g/capnproto CaliViking ( talk) 18:33, 29 September 2022 (UTC)
Hello fellow Wikipedians,
I have just modified one external link on Comparison of data serialization formats. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FaQ for additional information. I made the following changes:
When you have finished reviewing my changes, you may follow the instructions on the template below to fix any issues with the URLs.
This message was posted before February 2018. After February 2018, "External links modified" talk page sections are no longer generated or monitored by InternetArchiveBot. No special action is required regarding these talk page notices, other than regular verification using the archive tool instructions below. Editors have permission to delete these "External links modified" talk page sections if they want to de-clutter talk pages, but see the RfC before doing mass systematic removals. This message is updated dynamically through the template {{source check}} (last update: 5 June 2024).
Cheers.— InternetArchiveBot ( Report bug) 16:54, 11 August 2017 (UTC)
Shouldn't "JSON Schema" be added as a schema spec language for YAML?
Arnauld ( talk) 11:32, 19 March 2019 (UTC)
Add Swagger as a comparator. 148.80.255.166 ( talk) 13:31, 29 April 2019 (UTC)
I have added a row for HJSON, which an IP removed. The reasoning is that we don’t include human-readable “user interface” formats, but we already have YAML here, which is a human-readable superset of JSON.
Not having HJSON on this page makes Wikipedia a worse place, because it deprives its readers of knowing about a perfectly useful serialization format which has some features JSON lacks (comments, multi-line strings) while not having the complexity of YAML. As per WP:NNC, this content should be restored. No, I have no relation to HJSON, except that it’s a useful data format I wish I knew about earlier. Samboy ( talk) 19:56, 13 August 2019 (UTC)
for data representation you can pick one of the following: YAML, YAMLEX, JSON, JSON5, HJSON, or even pure Python Samboy ( talk) 20:27, 13 August 2019 (UTC)
Okay, there are a lot of formats out there. I am opening this section to collect and reference them, and if they reach the notability threshold they can be included in the main article. I'm sure there are plenty. There is also some overlap between data-serialization formats, data exchange formats and configuration file formats. I do not make a distinction here since, in my opinion, they are mostly the same set, with very similar purposes and only slightly different specific attributes. -- grin ✎ 09:39, 20 February 2020 (UTC)
In what way is YAML not standardized? ---- Cowlinator ( talk) 16:02, 16 February 2021 (UTC)
Hi,
Other media types like images, audio and video are data too. The main difference might be that they usually are binary encoded, but that's okay since there are plenty of binary data formats already in the article Comparison of data-serialization formats.
I suggest referring to Media types as they're standardized.
Have a nice day :) Dun Nic ( talk) 19:22, 14 October 2022 (UTC)
Hi, I'm missing a key characteristic (at least, it's key to me).
Lacking a better name for it, we can refer to it as streaming. A data serialization format supporting streaming would mean that it supports an unlimited number of items in one data stream.
For instance:
Have a nice day :) Dun Nic ( talk) 19:39, 14 October 2022 (UTC)
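To make the streaming idea concrete, here is a minimal sketch assuming one particular convention, newline-delimited JSON (one complete item per line), which the comment above does not name. The helper iter_items is hypothetical, and io.StringIO stands in for a socket or a file that keeps growing. The consumer handles each item as it arrives and never needs the whole stream in memory, so the number of items is not bounded by the format.

```python
import io
import json


def iter_items(stream):
    """Yield one decoded item per non-empty line, reading lazily."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


# Stand-in for an open-ended source such as a socket or a log file.
stream = io.StringIO('{"id": 1}\n{"id": 2}\n{"id": 3}\n')

ids = [item["id"] for item in iter_items(stream)]
print(ids)
```

By contrast, a single JSON array holding the same items generally cannot be handed to a standard parser until its closing bracket has arrived.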
RDF seems legit to me :) Dun Nic ( talk) 19:39, 14 October 2022 (UTC)
There is also the PostScript binary format. It has the advantage that you might not need to parse all of the data to find something; each part contains the address of the sub-parts. However, it also has disadvantages such as lack of 64-bit integers, and strings cannot exceed 64K. -- Zzo38 ( talk) 01:34, 6 April 2024 (UTC)