This article is rated C-class on Wikipedia's content assessment scale.
There was also briefly some content at bunzip2: bzip2 and bunzip2 are free open-source compression utilities.
Many consider them "third-generation" compression utilities, surpassing both first-generation tools (like arc and LHA) and second-generation tools (such as the popular PKZIP and gzip formats) in compression ability; they "pay" for this extra compression with an increased computational cost. Nonetheless, as Moore's Law makes computer time less and less important, compression methods like bzip2 have become more popular.
Of particular note is the fact that, unlike PKZIP, bzip2 is released under a very permissive license, which encourages its use in both open- and closed-source software.
bit-sequences derived from the decimal representation of pi.
/*-- A 6-byte block header, the value chosen arbitrarily as 0x314159265359 :-). A 32 bit value does not really give a strong enough guarantee that the value will not appear by chance in the compressed datastream. Worst-case probability of this event, for a 900k block, is about 2.0e-3 for 32 bits, 1.0e-5 for 40 bits and 4.0e-8 for 48 bits. For a compressed file of size 100Gb -- about 100000 blocks -- only a 48-bit marker will do. NB: normal compression/ decompression do *not* rely on these statistical properties. They are only important when trying to recover blocks from damaged files. --*/
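The orders of magnitude in the quoted comment can be roughly reproduced with a union bound. This is only a sketch: it assumes the compressed data behaves like uniformly random bits, that a "900k" block is 900,000 bytes, and that a marker may start at any bit offset. The function name `accidental_match_bound` is made up for illustration. The results land in the same ballpark as the comment's figures, which are somewhat more conservative.

```python
# Upper bound on the chance that an n-bit marker shows up by accident
# somewhere inside one block of random-looking compressed data.
BLOCK_BYTES = 900_000  # assumption: "900k" taken as 900,000 bytes


def accidental_match_bound(marker_bits: int, block_bytes: int = BLOCK_BYTES) -> float:
    bit_positions = block_bytes * 8            # the marker may start at any bit offset
    return bit_positions / 2 ** marker_bits    # union bound over all offsets


for bits in (32, 40, 48):
    print(f"{bits}-bit marker: about {accidental_match_bound(bits):.1e}")
```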
"In GNU, bzip2 can be used combined or independently of tar"
In GNU, what the heck? No-one says "in GNU" and it should be "In Unix", anyway. Opinions? Jsalomaa 21:55, 29 August 2005 (UTC)
Just curious -- does anyone know why bzip is called bzip? -- babbage 20:40, 5 March 2006 (UTC)
Why is there information about tar that doesn't relate to bzip? Goffrie 20:37, 3 June 2006 (UTC)
There are two "Run-length encoding" sections in the article. They need to be merged. -- Father Goose 03:03, 23 April 2007 (UTC)
RUNA and RUNB. Based on the differing ways in which the two operate (even if the heading is the same), I believe it would be inappropriate to merge them. Note also that the two uses of this technique are several stages apart; to merge them would produce an inaccurate reflection of the compression stack. Sladen 09:56, 23 April 2007 (UTC)
What are the technical limitations of bzip2? What is the maximum file size, the longest possible contained filename, the maximum number of contained files, etc.? -- Loh 12:54, 24 April 2007 (UTC)
I removed the external link to the Apache BZip2 implementation because it doesn't seem to be standard. It expects the data to start at byte 3, without the "BZ" magic. -- Zom-B ( talk) 18:54, 11 June 2009 (UTC)
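For anyone wanting to check whether a given stream carries the standard magic, here is a minimal sketch using Python's standard `bz2` module. The helper name `looks_like_bzip2` is hypothetical. It only inspects the first four bytes ("BZh" plus a level digit) and says nothing about how the Apache implementation behaves.

```python
import bz2

STREAM_MAGIC = b"BZh"                            # "BZ" signature plus 'h' (Huffman)
BLOCK_MAGIC = bytes.fromhex("314159265359")      # block header value quoted above


def looks_like_bzip2(data: bytes) -> bool:
    """Return True if data starts with a standard bzip2 stream header."""
    return (
        len(data) >= 4
        and data[:3] == STREAM_MAGIC
        and data[3:4] in b"123456789"            # block size digit, '1' to '9'
    )


sample = bz2.compress(b"hello, bzip2")
print(looks_like_bzip2(sample))        # standard stream
print(looks_like_bzip2(sample[2:]))    # same data with the "BZ" bytes stripped
print(sample[4:10] == BLOCK_MAGIC)     # first block header follows the 4-byte header
```

A stream with the first two bytes stripped, as described above, fails this check even though the rest of the data is unchanged.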
I cleaned up the "compression efficiency" section because the old version made inaccurate claims (a specific speed figure compared to X) and the benchmark reference is not a very good one. Furthermore, the technical description of the magic number is irrelevant to compression, and the magic number is already described in the technical section about the format, which is where it belongs. Samir000 ( talk) 20:09, 14 February 2010 (UTC)
Since the output of the second RLE is used by Huffman, it seems to me that it must already remove the zero symbol (which is instead replaced by RUNA/RUNB). Thus the text would be as follows:
RUNA and RUNB, which represent the run-length as a binary number greater than zero (0). Because the MTF encoding transforms a run into a single non-zero element followed by a streak of zeros, RLE is only used to compress runs of zeros. Apart from this limitation, this RLE process is more flexible than the RLE of step 1, as it is able to encode arbitrarily long integers (in practice, this is usually limited by the block size, so that this step does not encode a run of more than 900000 bytes). The run-length is encoded in this fashion: assign place values of 1 to the first bit, 2 to the second, 4 to the third, etc. in the RUNA/RUNB sequence, multiply each place value in a RUNB spot by 2, and add all the resulting place values (for RUNA and RUNB values alike) together. Thus, the sequence RUNB, RUNA results in the value (1*2 + 2) = 4, so the sequence 0,0,0,0,1 would be represented as RUNB,RUNA,1. As a more complicated example:
RUNA  RUNB  RUNA  RUNA  RUNB   (ABAAB)
   1     2     4     8    16   (place values)
   1     4     4     8    32   = 49 (RUNB places doubled, then summed)
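The place-value rule above can be sketched in a few lines. This is an illustration of the scheme as described in this thread, not code taken from bzip2 itself. The function names and the use of "A"/"B" strings for RUNA/RUNB are made up for the example.

```python
def decode_run(symbols: str) -> int:
    """Turn a RUNA/RUNB sequence (written as 'A'/'B') into a run length."""
    total, place = 0, 1
    for s in symbols:
        if s == "A":
            total += place          # RUNA counts its place value once
        elif s == "B":
            total += 2 * place      # RUNB counts its place value twice
        else:
            raise ValueError(f"unexpected symbol {s!r}")
        place *= 2                  # place values go 1, 2, 4, 8, ...
    return total


def encode_run(length: int) -> str:
    """Inverse of decode_run for run lengths of at least 1."""
    if length < 1:
        raise ValueError("run length must be at least 1")
    out = []
    while length > 0:
        if length % 2 == 1:         # odd remainder: emit RUNA (worth 1 here)
            out.append("A")
            length = (length - 1) // 2
        else:                       # even remainder: emit RUNB (worth 2 here)
            out.append("B")
            length = (length - 2) // 2
    return "".join(out)


print(decode_run("BA"), decode_run("ABAAB"), encode_run(49))
```

Because every position contributes either one or two times its place value, each run length of 1 or more has exactly one RUNA/RUNB spelling, which is why no separate "zero" digit is needed.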
0: RUNA
1: RUNB
2 to n+1: byte values 1 to n (if n is not zero)
n+2: end of stream, finish processing (could be as low as 2).
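To make the proposed numbering concrete, here is a sketch that maps a list of MTF output values to symbol numbers under exactly the table above. It follows the proposal in this comment, which has not been confirmed against the bzip2 source. `to_symbols` and `run_symbols` are hypothetical names, and `n` is the largest non-zero value in use.

```python
RUNA, RUNB = 0, 1


def run_symbols(length: int) -> list[int]:
    """RUNA/RUNB symbols for a run of `length` zeros (empty list if length is 0)."""
    out = []
    while length > 0:
        if length % 2 == 1:
            out.append(RUNA)
            length = (length - 1) // 2
        else:
            out.append(RUNB)
            length = (length - 2) // 2
    return out


def to_symbols(mtf_values: list[int], n: int) -> list[int]:
    """Map MTF values (0..n) to symbols: zero runs -> RUNA/RUNB, v -> v+1, end -> n+2."""
    out, zeros = [], 0
    for v in mtf_values:
        if v == 0:
            zeros += 1              # extend the current run of zeros
            continue
        out += run_symbols(zeros)   # flush any pending run before a non-zero value
        zeros = 0
        out.append(v + 1)           # values 1..n become symbols 2..n+1
    out += run_symbols(zeros)       # flush a trailing run, if any
    out.append(n + 2)               # end-of-stream symbol
    return out


print(to_symbols([0, 0, 0, 0, 1], n=1))
```

With `n = 0` the only symbol emitted for an empty input is 2, and with `n = 255` the end-of-stream symbol is 257, matching the ranges discussed here.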
Another change is in the final line: since the 0 value is not represented, the byte values there are 1-255, and the end of stream is represented by a value between 2 and 257 (not 258). Can anyone confirm and/or make the changes? Balabiot ( talk) 07:45, 1 March 2010 (UTC)
The article says "LZMA is generally more space-efficient than bzip2 at the expense of slower compression speed, while having much faster decompression." This is based on these benchmarks: http://compressionratings.com/comp.cgi?7-zip+9.12b++bzip2+1.0.5++gzip+1.3.3+-5 But in these benchmarks only the highest compression level (-9) of LZMA was tested. These benchmarks ( http://tukaani.org/lzma/benchmarks.html) show that LZMA with different settings (-1 or -2) can be comparable to or more efficient than bzip2 in both compression speed and space-efficiency. 213.155.215.214 ( talk) 19:32, 26 February 2012 (UTC)
The file format section states for the "contents" part that it is "max. 7372800 bit". This would translate to 900 * 1024 * 8. However, bzlib.c calculates the size of the buffer as 100000 * blockSize100k. It therefore looks like this might be an error in the Wiki article, i.e., the decimal kilo and the binary kibi prefixes were confused with each other. Therefore, I think it should read "max. 7200000 bit". However, I'm not 100% sure. Could anyone check this, please? Maxiantor ( talk) 15:19, 19 November 2019 (UTC)
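The two candidate figures differ only in how "900k" is read. A quick check of the arithmetic, assuming the largest block corresponds to blockSize100k = 9 and that each unit is 100,000 bytes under the decimal reading (variable names are illustrative):

```python
block_size_100k = 9                          # largest setting, i.e. "900k"

binary_reading = 900 * 1024 * 8              # "k" read as 1024 bytes
decimal_reading = block_size_100k * 100_000 * 8  # "k" read as 1000 bytes

print(binary_reading, decimal_reading, binary_reading - decimal_reading)
```

This only confirms that 7372800 and 7200000 are the binary and decimal readings respectively; which one the format actually uses still has to be checked against the source.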