This is the talk page for discussing improvements to the Sparse file article. This is not a forum for general discussion of the article's subject.
This article is rated C-class on Wikipedia's content assessment scale.
If anyone would like to edit this page, more info can be found here. I think the information on that site is clearer (the illustrations probably helped a lot; maybe someone could make an illustration). -- Bernard François 17:43, 11 June 2006 (UTC)
I have always found sparse files to be more trouble than they are worth, personally. If you would like to see a defense of them (possible legitimate applications of them), try this link: http://www.cs.wisc.edu/~thain/library/sparse.pdf Timothy Andux-Jones 15:39, 26 March 2007 (UTC)
I removed the link to that PDF from the article because that's not the same kind of sparse file. —Preceding unsigned comment added by Chekholko ( talk • contribs) 02:18, 28 November 2007 (UTC)
I believe the explanation here is very clear. One read and I knew what it was. By the way, sparse files are used on any Unix and Linux system (for example in lastlog). Paul Cobbaut 20:20, 27 July 2007 (UTC)
Thanks, -- Abdull ( talk) 21:13, 8 February 2008 (UTC)
I had a little prod in Google to try to find a better method for detecting the sparse files. Nothing came up after three queries, I probably didn't have the right expression on my face at that moment. I imagine that it would be fairly easy to write such a program although that has nothing to do with Wikipedia of course. I know you were already thinking it, but hey it could be useful right?
Why could it be useful?
Welllllll to represent this in the article we would have to ditch the "advantages/disadvantages" and instead have "benefits/criticisms" or suchlike. Consider if you will rsync. Rsync and programs like it will flesh the file out to its full size before transferring it, which results in a lot of wasted bandwidth. Just a thought. These links may be of some use, although I doubt that they could be useful as sources per se: http://www.ntfs.com/ntfs-sparse.htm http://kerneltrap.org/mailarchive/openbsd-misc/2007/11/9/398477
I think that the whole idea of sparse files in and of itself smells a lot like filesystem compression. Definitely distinct but also probably related, no? Anyway I hope I am at least slightly helpful. Cheers. 125.236.211.165 ( talk) 07:14, 24 March 2008 (UTC)
cp uses. It also requires a writing application to request holes, and it's typically faster than writing actual zero blocks; plus, filling in holes tends to increase fragmentation. But at the abstraction level between filesystem and file, this is still fundamentally compression. ddawson ( talk) 13:08, 4 April 2009 (UTC)
Being a bit pedantic, but saying sparse files cannot be memory mapped on Windows is incorrect. There is a Microsoft blogger source that discusses the quirks involved [1]. It is far more challenging to find a source discussing sparse memory-mapped executables to see if they are generally supported (i.e. if an application memory mapped a sparse executable and then started executing it). There is no evidence that I can locate suggesting it is impossible. Since it is impossible to know which regions are sparse, the OS may only be aware of the overall size on disk vs the reserved sparse size so, at a minimum, holes could occur in the middle of the executable; however, I am not sure why this would be an issue, since the machine code (at least on x86/x64 PC processors) for repeated 0x00 bytes is (I think) ADD BYTE PTR [eax], al (swap eax with rax for x64). Many other processors use null opcodes as no-ops [2]. Even if that were a concern, streaming the file from disk any other way offers no advantage unless the OS implementation is somehow aware of which regions are sparse. 99.251.145.217 ( talk) 16:53, 11 March 2016 (UTC)
References
The method given for detecting sparse files in Unix is not quite correct. It states, "Sparse files have different apparent and actual file sizes." While true, this doesn't help; a moment's reflection should help one realize this is also true for most non-sparse files, because any time a file doesn't fill a whole number of blocks, the allocated size will be greater by at least the unused number of bytes (and greater yet when indirect blocks are involved). I guess what it should say is that the apparent size of a sparse file is (typically) larger than the allocated size, and that such a condition is a reliable indicator. There are borderline cases involving very small holes and indirect blocks where a sparse file would not be detected as sparse, but I expect those are rare and not important for most uses.
Of course, for FSs without inodes, things will probably be a little different. ddawson ( talk) 15:10, 4 April 2009 (UTC)
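The check proposed above (apparent size larger than allocated size) could be sketched roughly as follows. This is a minimal sketch, not the article's method: the function names are made up for illustration, and it assumes a Unix-like system where `st_blocks` is reported in 512-byte units. A filesystem that compresses data transparently may also report fewer allocated bytes than the apparent size, so the result is only a hint.

```python
import os


def looks_sparse(apparent_size: int, allocated_bytes: int) -> bool:
    """Heuristic from the comment above: a file is probably sparse when its
    apparent size exceeds the space actually allocated to it."""
    return apparent_size > allocated_bytes


def is_probably_sparse(path: str) -> bool:
    """Apply the heuristic to a real path (Unix-like systems only).

    Hypothetical helper name; assumes st_blocks counts 512-byte units,
    as documented for Python's os.stat_result.
    """
    st = os.stat(path)
    return looks_sparse(st.st_size, st.st_blocks * 512)
```

A file that merely fails to fill its last block has an allocated size larger than its apparent size, so it is not flagged, which is the distinction the comment draws. The borderline cases mentioned above (very small holes, indirect blocks) would still slip through.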
I'm pretty sure that cp --sparse=always is Linux- or GNU-coreutils-specific; my FreeBSD 7.x servers don't have this option. Lamontcg ( talk) 18:10, 23 August 2009 (UTC)
Sparse files have a long history in Unix. GNU tar first supported sparse files in 1990 in version 1.09 [1]. Clearly filesystems must have implemented sparse files before then. —Preceding unsigned comment added by Lamontcg ( talk • contribs) 18:17, 23 August 2009 (UTC)
Should we also mention that these are known as file holes? (Understanding the Linux Kernel by Cesati mentions file holes instead.) It took me a while to figure out that file holes cause sparse files. —Preceding unsigned comment added by 99.162.148.199 ( talk) 06:26, 2 June 2010 (UTC)
"Backups will hang trying to allocate 1.2 Terabytes of space to backup your last log, taking days to track down the actual issue. It seems a poor trade off to use these unsupported files for lastlog when it used to work fine without them." 121.44.109.61 ( talk) 12:30, 19 August 2010 (UTC)
According to http://gergap.wordpress.com/2013/08/10/rsync-and-sparse-files/ (among others), one way to rsync with sparse file preservation is to use two passes:
1. Create new sparse files: rsync --sparse --ignore-existing
2. Update files, preserving or adding sparseness: rsync --inplace
132.239.154.77 ( talk) 19:31, 7 November 2014 (UTC) BobC
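For anyone scripting the two passes listed above, a minimal sketch might look like this. The helper names are hypothetical, and the `-a` (archive) flag is an assumption added for illustration; the comment itself only names `--sparse`, `--ignore-existing` and `--inplace`. Older rsync manual pages describe `--sparse` as conflicting with `--inplace`, which is presumably why the two are kept in separate passes.

```python
import subprocess


def two_pass_rsync_commands(src: str, dst: str) -> list[list[str]]:
    """Build the two rsync invocations described above.

    The -a flag is an assumption for illustration; only --sparse,
    --ignore-existing and --inplace come from the comment.
    """
    return [
        # Pass 1: create files that do not exist yet, writing holes.
        ["rsync", "-a", "--sparse", "--ignore-existing", src, dst],
        # Pass 2: update files that already exist, in place.
        ["rsync", "-a", "--inplace", src, dst],
    ]


def run_two_pass_rsync(src: str, dst: str) -> None:
    """Run both passes in order, stopping if either one fails."""
    for argv in two_pass_rsync_commands(src, dst):
        subprocess.run(argv, check=True)
```

Building the argument lists separately from running them makes it easy to print the commands first and check them by eye before touching any data.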
While the section "Pipelining" may be useful to some, it should probably be disclaimed as being highly non-portable, as the /proc filesystem cannot be counted on across platforms (in fact Linux is the only one I'm aware of where the example might work), compared to the plethora of systems with sparse file support. The example "cat sparsefile | cp --sparse=always /dev/fd/0 newsparsefile" might be more generically applicable.
66.68.16.215 ( talk) 02:47, 17 November 2014 (UTC) DG
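A way to rebuild holes from a pipe without depending on /proc or on GNU cp can be sketched as below. This is only an illustration of the general idea (skip over all-zero blocks instead of writing them), not a description of how cp implements `--sparse=always`. The function name and the 4096-byte block size are assumptions, and the destination is assumed to be a freshly opened, empty, seekable file.

```python
import io
from typing import BinaryIO


def copy_making_holes(src: BinaryIO, dst: BinaryIO, block_size: int = 4096) -> int:
    """Copy src to dst, seeking over all-zero blocks instead of writing them.

    src only needs read(), so it may be a pipe; dst must be seekable and
    is assumed to start empty at offset 0. Returns the bytes copied.
    """
    zero_block = bytes(block_size)
    total = 0
    while True:
        block = src.read(block_size)
        if not block:
            break
        if block == zero_block[: len(block)]:
            # Skip forward; on a filesystem with sparse file support the
            # skipped range can remain an unallocated hole.
            dst.seek(len(block), io.SEEK_CUR)
        else:
            dst.write(block)
        total += len(block)
    # A trailing hole only becomes part of a real file once its length
    # is set explicitly.
    dst.truncate(total)
    return total
```

In a pipeline, `src` would be `sys.stdin.buffer` and `dst` a file opened with mode `"wb"`. Whether the skipped ranges actually stay unallocated depends on the filesystem.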
The "Detection" section talks of a -k option to ls and claims that it shows the apparent size in blocks. In all versions I know, notably BSD and Linux, this sets the block size to 1 kB. I've removed it. If it goes back, it should be with an indication of where it might work.
The -h option works in Linux, but not in BSD. In this case it's not clear what use it is anyway ("Human"-readable output chooses its humans).
Finally, the --block-size option to du is also non-standard. It works for Linux, not for BSD. Somebody should fix the description. Groogle ( talk) 04:24, 7 July 2015 (UTC)
Your statement doesn't correspond to any published sources; to refresh your memory, it was established in the ls topic that Sun and FreeBSD copied from GNU in this case. TEDickey ( talk) 00:43, 8 July 2015 (UTC)
If you want to provide a WP:RS, that might be interesting. So far, you have not, relying instead on personal attacks to make the bulk of your comment. TEDickey ( talk) 00:28, 9 July 2015 (UTC)
You have provided no reliable source. If you had, you would provide a URL to a published, verifiable document. Further, in each of your responses you contradict previous statements of yours. TEDickey ( talk) 00:06, 10 July 2015 (UTC)
There are multiple references to 'disk' which should instead be 'storage medium' or similar, a terminology accuracy flaw. Data can be stored on a variety of types of media, with disk being only one category. So I suggest replacing disk with storage media or medium as appropriate throughout the article.
I recognize that misuse of the word 'disk' is pervasive, but that doesn't legitimize incorrect terminology, especially in Wikipedia.
Comments please. If no objections arise I'll revise the article accordingly within roughly two weeks if no other editor has done so by then. Cheers! -- H Bruce Campbell ( talk) 22:46, 27 January 2022 (UTC)
This edit deletes parts of two sentences. - Privat2011 ( talk) 08:32, 23 October 2022 (UTC)