In preparation for releasing version 0.5 (which has grown to a whopping 979 MB), I decided to find the best compression format for our toolkit. I normally use 7-Zip and WinRAR, but I wanted to find out which of those two was better, and how they compared to other compression and archiving formats.
After searching Google for different compression and archiving formats, here is what I’ll be testing: 7-Zip, ACE, bzip2, gzip, lpaq8, RAR, UHA and ZIP. The bzip2, gzip and lpaq8 formats only compress a single file, so the toolkit first has to be bundled into a tar archive. To get a full comparison, I also ran every other format on that same tar archive.
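The two-step approach (bundle everything into one tar file, then run each single-file compressor over that same file) can be sketched with Python’s standard library. This is a minimal illustration under stated assumptions, not the PeaZip workflow used for the actual tests: the gzip, bz2 and lzma modules are stand-ins for the tools in the table, the `.xz` output uses LZMA2 rather than the `.7z` container, lpaq8 has no standard-library equivalent, and `build_tar`, `compress_tar` and the `toolkit` directory are hypothetical names.

```python
import bz2
import gzip
import lzma
import shutil
import tarfile
from pathlib import Path

# Standard-library stand-ins (assumption): lpaq8 has no Python equivalent,
# and .xz uses LZMA2 rather than the .7z container.
OPENERS = {
    ".gz": lambda path: gzip.open(path, "wb", compresslevel=9),
    ".bz2": lambda path: bz2.open(path, "wb", compresslevel=9),
    ".xz": lambda path: lzma.open(path, "wb", preset=6),
}

def build_tar(src_dir, tar_path):
    """Step 1: bundle a directory tree into one uncompressed tar file."""
    with tarfile.open(tar_path, "w") as tar:
        tar.add(str(src_dir), arcname=Path(src_dir).name)
    return Path(tar_path)

def compress_tar(tar_path):
    """Step 2: run each single-file compressor over the same tar file."""
    sizes = {}
    for suffix, opener in OPENERS.items():
        out_path = Path(str(tar_path) + suffix)
        with open(tar_path, "rb") as src, opener(out_path) as dst:
            # Stream in chunks so a 1 GB tar never has to sit in memory.
            shutil.copyfileobj(src, dst)
        sizes[out_path.name] = out_path.stat().st_size
    return sizes
```

Usage would look like `compress_tar(build_tar("toolkit", "toolkit.tar"))`, which returns the size of each `toolkit.tar.*` file in bytes.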
Settings Used for Each Compression Format
| Compression | 7z | ACE | bzip2 | gzip | lpaq8 | RAR | UHA | ZIP |
|---|---|---|---|---|---|---|---|---|
| Program | 7-Zip | WinAce | PeaZip | PeaZip | PeaZip | WinRAR | WinUHA | WinRAR |
| Version | 4.65 | 2.69 | 2.1 | 2.1 | 2.1 | 3.80 | 2.0 RC1 | 2.1 |
| In Toolkit? | yes | no | yes | yes | yes | no | no | no |
| Compression Level | Ultra | Maximum | Ultra | Ultra | 9 | Best | - | Best |
| Passes | - | - | 7 | 10 | - | - | - | - |
| Compression Method | LZMA | - | - | - | - | - | ALZ-3 | - |
| Dictionary Size | 64 MB | 1,024 KB | 900 KB (block size) | 32 KB | - | 4,096 KB | 4,096 KB | - |
| Word Size | 64 | - | - | 128 | - | - | - | - |
| Other Options | - | ACE 2.0 compression enabled | - | - | - | 64-bit executable (Itanium) compression disabled | - | - |
File Details
The files being compressed total 1,026,692,529 bytes. The toolkit is made up of close to 200 programs, some command-line and some Win32 executables, along with the DLLs and text documentation that accompany each program. Many of the portable applications are already packed with UPX to save space, which makes them harder to compress further. The .tar archive created with PeaZip Portable 2.1 is 1,041,943,040 bytes in total (101.49% of the files’ actual size).
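Most of that extra 1.49% is most likely tar’s own bookkeeping: every entry (file or directory) gets a 512-byte header, each file’s data is padded to a multiple of 512 bytes, and the archive ends with two zero-filled blocks. The sketch below estimates that overhead for plain ustar entries as written by Python’s `tarfile` module, which also pads the finished archive to a 10,240-byte record. It is an approximation under assumptions: PeaZip’s tar writer may behave differently, and long paths can need extra header blocks depending on the tar variant, so treat the estimate as a lower bound. `estimate_tar_size` and `actual_tar_size` are hypothetical helper names.

```python
import io
import tarfile

BLOCK = 512           # tar block size in bytes
RECORD = 20 * BLOCK   # Python's tarfile pads the finished archive to 10,240 bytes

def round_up(n, multiple):
    return -(-n // multiple) * multiple

def estimate_tar_size(entry_sizes):
    """Estimate a plain ustar archive size from per-entry data sizes.

    Directories count as entries of size 0. Long-name and PAX extension
    headers are ignored, so real archives can only be larger.
    """
    total = sum(BLOCK + round_up(size, BLOCK) for size in entry_sizes)
    total += 2 * BLOCK  # end-of-archive marker: two zero-filled blocks
    return round_up(total, RECORD)

def actual_tar_size(entry_sizes):
    """Build an in-memory ustar archive with the same entry sizes, for comparison."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for i, size in enumerate(entry_sizes):
            info = tarfile.TarInfo(name=f"file{i}.bin")
            info.size = size
            tar.addfile(info, io.BytesIO(b"\0" * size))
    return len(buf.getvalue())

sizes = [0, 1, 511, 512, 513, 10_000]
print(estimate_tar_size(sizes), actual_tar_size(sizes))  # → 20480 20480
```

Feeding in the size of every file and folder in the toolkit would give a rough lower bound to compare against the 15,250,511-byte difference between the tar archive and the raw files.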
Testing Methods
The testing method was simple: run each compression program above on the tar archive, and also on the files and folders directly where possible; read the resulting file size from Windows Explorer; and calculate the compression ratio. Since I’m concerned only with the overall result, I always calculated the ratio against the total size of the original files and folders (compressed size ÷ 1,026,692,529 bytes), so lower is better. Because the tar archive is slightly larger than the files themselves, the .tar.* figures do not reflect the compression ratio of the format alone; they reflect the end-to-end result of archiving plus compression, measured against the files on disk.
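The ratio is simply the compressed size divided by the total size of the original files, expressed as a percentage. A minimal Python sketch using a few sizes from the results table; `ratio_percent` is a hypothetical helper name:

```python
ORIGINAL_SIZE = 1_026_692_529  # total size of the toolkit's files and folders, in bytes

def ratio_percent(compressed_size, original_size=ORIGINAL_SIZE):
    """Compressed size as a percentage of the original files (lower is better)."""
    return 100 * compressed_size / original_size

results = {
    ".tar": 1_041_943_040,  # uncompressed tar, for reference
    ".tar.lpaq8": 425_447_693,
    ".7z": 428_686_406,
    ".zip": 563_946_617,
}
for name, size in results.items():
    print(f"{name:<11} {ratio_percent(size):6.2f}%")
# → 101.49%, 41.44%, 41.75% and 54.93% respectively
```

Dividing by the raw file size rather than the tar size is what lets the .tar.* rows be compared directly with the rows compressed straight from the folders.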
I wasn’t overly concerned with compression time, since most dual-core machines should handle this easily, so I ran the jobs in the background, three or four at a time.
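Running several compressors side by side can be sketched with a small thread pool. This is an illustration only: Python’s gzip, bz2 and lzma modules stand in for the GUI tools actually used, `compress_all` is a hypothetical helper, and `max_workers=4` mirrors the three or four jobs run at once.

```python
import bz2
import gzip
import lzma
from concurrent.futures import ThreadPoolExecutor

# Standard-library stand-ins for the real compression tools (assumption).
COMPRESSORS = {
    "gzip": lambda data: gzip.compress(data, compresslevel=9),
    "bzip2": lambda data: bz2.compress(data, compresslevel=9),
    "xz": lambda data: lzma.compress(data),  # default preset 6 keeps memory use modest
}

def compress_all(data, max_workers=4):
    """Run every compressor on the same input concurrently; return output sizes."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(fn, data) for name, fn in COMPRESSORS.items()}
        return {name: len(future.result()) for name, future in futures.items()}

sample = b"portable toolkit readme\n" * 20_000
print(compress_all(sample))
```

Threads keep the sketch short; for heavy jobs, separate processes (or simply separate command-line runs, as in these tests) are the more reliable way to keep every core busy.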
Note: I ran into problems using UHA compression on the files and folders directly. It gave errors on a specific file included with the Icon Sushi program in the toolkit. Because of that, there are no results for UHA compression of the files directly, only of the tar archive.
| Compression Format | Size (bytes) | Compression Ratio (% of original size) |
|---|---|---|
| .tar.lpaq8 | 425,447,693 | 41.44% |
| .7z | 428,686,406 | 41.75% |
| .tar.7z | 443,436,613 | 43.19% |
| .tar.uha | 468,876,301 | 45.67% |
| .tar.ace | 478,156,078 | 46.57% |
| .ace | 479,530,867 | 46.71% |
| .tar.rar | 479,790,907 | 46.73% |
| .rar | 511,837,957 | 49.85% |
| .tar.bz2 | 528,823,641 | 51.51% |
| .tar.gz | 542,612,424 | 52.85% |
| .tar.zip | 554,995,557 | 54.06% |
| .zip | 563,946,617 | 54.93% |
Conclusions
I was surprised to see ACE and UHA ahead of RAR, considering how much more popular RAR is. RAR’s ability to span archives across multiple volumes automatically, and the wide availability of tools that support it, probably explain its popularity over these slightly more efficient formats.
Just as surprising to me were the placings of bzip2 and gzip relative to RAR and ZIP. Given the popularity of .tar.gz and .tar.bz2, I had assumed they would be as efficient as RAR and significantly more efficient than the old ZIP format. This is not the case: both trail RAR and only slightly beat ZIP. Their open licensing, rather than their efficiency, is most likely responsible for their popularity over the proprietary RAR format and the older ZIP format, especially among open source projects.
While lpaq8 is the best compression format tested, the difference between it and 7-Zip is negligible. The time and system resources lpaq8 requires (1.6 GB of RAM and an hour of compression time), together with the relative obscurity of the format, mean that we will be sticking with 7-Zip as the format of choice for our toolkit.
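To put “negligible” in numbers, the gap between the two best results can be worked out from the results table. A quick check in Python (the variable names are illustrative):

```python
ORIGINAL_SIZE = 1_026_692_529  # uncompressed toolkit, in bytes
LPAQ8_SIZE = 425_447_693       # .tar.lpaq8
SEVEN_ZIP_SIZE = 428_686_406   # .7z, compressed directly from the files

gap_bytes = SEVEN_ZIP_SIZE - LPAQ8_SIZE
gap_mib = gap_bytes / 2**20
gap_points = 100 * gap_bytes / ORIGINAL_SIZE  # difference in the ratio column

print(f"{gap_bytes:,} bytes = {gap_mib:.1f} MiB = {gap_points:.2f} percentage points")
# → 3,238,713 bytes = 3.1 MiB = 0.32 percentage points
```

Roughly 3 MiB on a download of about 409 MiB is unlikely to matter next to lpaq8’s memory and time cost.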
