This article is within the scope of WikiProject Digital Preservation, a collaborative effort to improve the coverage of digital preservation on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article is within the scope of WikiProject Computing, a collaborative effort to improve the coverage of computers, computing, and information technology on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
Reviewing the IETF draft and this article, I found some aspects that may be worth mentioning:
Limitations of BagIt
There is no registry of checksum algorithms and their abbreviations as used in manifest file names.
A tag manifest file cannot contain its own file name.
(Hochstenbach: this is a general remark; no manifest file can list itself.)
(Tibaut Houzanme: To explain the earlier remark: one cannot build the house one is born in. One file has to be created before another file can contain its name as part of a manifest list. However, a remedy would be for the manifest text file to contain a hash of the text that makes up the manifest list itself. Then, when a manifest is modified, not only will the manifest's hash differ, but so will the text that makes it up. Whether this would be necessary is the real question, and I posit that the data checksum, the manifest checksum and the metadata checksum together are sufficient.)
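A minimal sketch of where the circularity bites, assuming the layout described in the BagIt draft (tag files in the bag root, payload under data/, manifest lines of the form "checksum filepath"). The function name build_tag_manifest is hypothetical. It skips every tagmanifest-*.txt file, not only its own, because two tag manifests listing each other run into the same problem.

```python
import hashlib
from pathlib import Path


def build_tag_manifest(bag_dir: Path, algorithm: str = "sha256") -> str:
    """Hypothetical helper: write tagmanifest-<algorithm>.txt for a bag.

    Assumes tag files are the regular files in the bag root (the payload
    lives under data/) and uses the "checksum filepath" line form.
    """
    manifest_name = f"tagmanifest-{algorithm}.txt"
    lines = []
    for path in sorted(bag_dir.iterdir()):
        if not path.is_file():
            continue  # skips data/ and any other directory
        if path.name.startswith("tagmanifest-"):
            # A manifest cannot list itself: adding its own checksum line would
            # change its bytes and therefore the checksum. Other tag manifests
            # are skipped too, since two manifests listing each other hit the
            # same circularity.
            continue
        digest = hashlib.new(algorithm, path.read_bytes()).hexdigest()
        lines.append(f"{digest} {path.name}\n")
    text = "".join(lines)
    # write_bytes avoids platform newline translation, so lines end in LF.
    (bag_dir / manifest_name).write_bytes(text.encode("utf-8"))
    return text
```

Running it twice on the same bag yields the same manifest, which is only possible because the file never describes itself; integrity of the tag manifest itself then has to come from somewhere outside it.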
Specification issues
It is not made clear whether the order of tags in tag files may be relevant. I suggest stating explicitly in the specification if order is not relevant.
It is not made clear whether tags may be repeated (in which case order may be relevant).
Speaking of tag files: there is no common tag file format. Section 4.2 describes a key/value format, but only for bag-info.txt. I'd call it a design error that BagIt already has two tag file formats (space-separated, as in manifest files and fetch.txt, and key/value). For additional tag files not mentioned in the current draft, you know nothing but the character encoding; they could be in any format.
Tag/metadata values cannot include newline characters; on the other hand, whitespace is considered part of the value. Does line folding change the value or not?
The general form of tag/metadata labels is not specified. Can they include spaces and non-ASCII characters, such as umlauts? Obviously they cannot include colons.
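To make these open questions concrete, here is a sketch of a parser for the bag-info.txt "Label: Value" format. Everything about it is an assumption rather than what the draft mandates: parse_bag_info is a hypothetical name, it returns an ordered list of pairs (so repeated labels and their order survive), it treats lines starting with a space or tab as continuations, and it leaves the folding question open by offering two policies.

```python
def parse_bag_info(text: str, unfold: bool = True) -> list[tuple[str, str]]:
    """Hypothetical parser for "Label: Value" tag files such as bag-info.txt.

    A list (not a dict) is returned so repeated labels and their order are kept.
    Two assumed folding policies, since the draft leaves this open:
      unfold=True  -> strip the indentation and join with a single space
      unfold=False -> keep the line break and indentation verbatim
    """
    pairs: list[tuple[str, str]] = []
    for line in text.splitlines():
        if line[:1] in (" ", "\t") and pairs:
            # Continuation line: attach it to the most recent value.
            label, value = pairs[-1]
            joined = f"{value} {line.lstrip()}" if unfold else f"{value}\n{line}"
            pairs[-1] = (label, joined)
        elif ":" in line:
            # Split on the first colon only, since labels cannot contain colons.
            label, _, value = line.partition(":")
            # Drop the separator whitespace after the colon; trailing
            # whitespace is kept as part of the value.
            pairs.append((label.strip(), value.lstrip(" \t")))
        elif line.strip():
            raise ValueError(f"not a tag line: {line!r}")
    return pairs
```

The two policies give different values for the same bytes, which is exactly why the specification should say which one applies.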
It is probably worth noting that issues with the specification are best discussed in the digital-curation Google Group instead of on Wikipedia proper. The people who are responsible for editing the specification probably aren't paying attention to this talk page.
Edsu (talk) 16:53, 14 October 2010 (UTC)
Related systems
BagIt reminds me of distributed revision control systems. A bag looks like a snapshot of an RCS, or an RCS repository with only one revision. The article should contain some references to similar systems; maybe RCS can be mentioned (among others).
I removed the following sentence because it added no information: "Once a bag is received, verified and placed in storage, the manifest can be used again in the future to verify that the integrity of the bag remains intact."
Edsu removed the sentence "In practice “bagit.txt” only contains characters that are also part of plain ASCII, and the most common character encoding for tag files is UTF-8." But the IETF draft says:
"The "bagit.txt" file should consist of exactly two lines,
BagIt-Version: M.N
Tag-File-Character-Encoding: UTF-8"
Unless the "M.N" part or the "UTF-8" part includes non-ASCII characters (which I strongly doubt), bagit.txt contains only plain ASCII characters, apart from the optional BOM header, which I forgot to mention. In any case, the existence of BOM headers should be mentioned in the specification.
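A sketch of the check this argument implies: strip an optional UTF-8 BOM, require the rest to decode as ASCII, and require exactly the two lines quoted from the draft. The function name check_bagit_txt and the regular expressions are assumptions for illustration (including that the two lines appear in the quoted order), not validation rules taken from the draft.

```python
import codecs
import re

# Assumed shape of the two lines, in the order the draft lists them.
BAGIT_LINES = (
    re.compile(r"BagIt-Version: \d+\.\d+"),
    re.compile(r"Tag-File-Character-Encoding: \S+"),
)


def check_bagit_txt(raw: bytes) -> dict:
    """Hypothetical check: bagit.txt is ASCII apart from an optional UTF-8 BOM."""
    has_bom = raw.startswith(codecs.BOM_UTF8)
    body = raw[len(codecs.BOM_UTF8):] if has_bom else raw
    # Raises UnicodeDecodeError (a ValueError subclass) on any non-ASCII byte.
    lines = body.decode("ascii").splitlines()
    if len(lines) != 2:
        raise ValueError(f"expected exactly two lines, got {len(lines)}")
    for pattern, line in zip(BAGIT_LINES, lines):
        if not pattern.fullmatch(line):
            raise ValueError(f"unexpected line: {line!r}")
    return {"bom": has_bom, "version": lines[0].split(": ", 1)[1]}
```

Under these assumptions the BOM is the only byte sequence that can push bagit.txt outside plain ASCII, which is why it deserves an explicit mention in the specification.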
I put the sentence about bagit.txt character encoding back, indicating that it must be UTF-8. Distinguishing between ASCII and UTF-8 seems out of scope for this article. I initially mistook this sentence as being about the encoding of bag-info.txt rather than bagit.txt, since it also talked about tag files.
Wikipedia requires reliable independent secondary sources. It's great that people are enthused by BagIt, but that doesn't give any kind of pass on this. Long primary-sourced lists of users of a thing are hallmarks of promotional editing and not part of an encyclopaedia. Guy (Help!) 13:01, 9 November 2018 (UTC)