Compression Formats Compared


In preparation for the release of version 0.5 (which has grown to a whopping 979 MB), I decided to find the best compression format for our toolkit. I normally use 7-Zip and WinRAR, but I wanted to find out which of the two was better and how they compared to other compression and archiving formats.

After looking around on Google for different compression and archiving formats, I settled on testing the following: 7-zip, ace, bzip2, gzip, lpaq8, rar, uha and zip. The bzip2, gzip, and lpaq8 compression formats only work on a single file, so the files need to be bundled into an archive (tar, in this case) first. To get a full comparison, I also ran every other format on that same tar archive.
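The archive-first step can be sketched in Python with only the standard library. This is an illustration, not how the test archives were produced (those came from PeaZip); the function name `tar_then_compress` and the directory names are hypothetical, and lpaq8 is left out because the standard library has no module for it.

```python
import bz2
import gzip
import shutil
import tarfile
from pathlib import Path


def tar_then_compress(src_dir, out_dir):
    """Bundle src_dir into one .tar, then compress that single file two ways."""
    src_dir, out_dir = Path(src_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Step 1: archive only (no compression), so many files become one file.
    tar_path = out_dir / (src_dir.name + ".tar")
    with tarfile.open(tar_path, "w") as tar:
        tar.add(src_dir, arcname=src_dir.name)

    # Step 2: run each single-file compressor over the tar archive.
    sizes = {tar_path.name: tar_path.stat().st_size}
    for suffix, opener in ((".gz", gzip.open), (".bz2", bz2.open)):
        target = Path(str(tar_path) + suffix)
        with open(tar_path, "rb") as raw, opener(target, "wb", compresslevel=9) as packed:
            shutil.copyfileobj(raw, packed)
        sizes[target.name] = target.stat().st_size
    return sizes
```

Calling `tar_then_compress("toolkit", "out")` would leave `toolkit.tar`, `toolkit.tar.gz` and `toolkit.tar.bz2` in `out` and return their sizes in bytes. The output directory should sit outside the source directory so the archive does not try to include itself.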

Settings Used for each Compression Format

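| Compression | 7zip | ace | bzip2 | gzip | lpaq8 | rar | uha | zip |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Program | 7-Zip | WinAce | PeaZip | PeaZip | PeaZip | WinRAR | WinUHA | WinRAR |
| Version | 4.65 | 2.69 | 2.1 | 2.1 | 2.1 | 3.80 | 2.0 RC1 | 2.1 |
| In Toolkit? | yes | no | yes | yes | yes | no | no | no |
| Compression Level | Ultra | Maximum | Ultra | Ultra | 9 | Best | - | Best |
| Passes | - | - | 7 | 10 | - | - | - | - |
| Compression Method | LZMA | - | - | - | - | - | ALZ-3 | - |
| Dictionary Size | 64MB | 1024KB | 900KB | 32KB | - | 4096KB | 4096KB | - |
| Word Size | 64 | - | - | 128 | - | - | - | - |
| Other Options | - | Ace 2.0 compression enabled | - | - | - | Disabled 64-bit executable (Itanium) Compression | - | - |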

File Details

The files being compressed have a total size of 1,026,692,529 bytes. The toolkit consists of close to 200 programs, some command line and some Win32 executables, as well as the DLLs and text documentation that accompany each program. Many of the portable applications are already packed with UPX to save space, which makes them harder to compress further. The .tar archive, created using PeaZip Portable 2.1, has a total size of 1,041,943,040 bytes (101.49% of the files' actual size).

Testing Methods

The testing methodology was pretty simple: run the compression programs above on the tar archive and, where possible, on the files and folders directly. Then grab the resulting file size (using Windows Explorer) and calculate the compression ratio. Since I'm concerned only with the overall result, I used the total size of the files and folders to calculate the compression ratio each time. Because the tar archive is slightly larger than the files themselves, the ratio for the tar-based results does not reflect the compression ratio of the format alone, but the final ratio of files on disk + archiving + compression.
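To make the distinction concrete, here is a small sketch of the calculation using the sizes reported in this post for the .tar.7z result. The function name `compression_ratio` is my own label for illustration; "ratio" here means compressed size as a percentage of the original, so lower is better.

```python
def compression_ratio(compressed_bytes, original_bytes):
    """Return compressed size as a percentage of the original size."""
    if original_bytes <= 0:
        raise ValueError("original_bytes must be positive")
    return 100.0 * compressed_bytes / original_bytes


FILES_ON_DISK = 1_026_692_529   # total size of the toolkit files
TAR_ARCHIVE = 1_041_943_040     # the same files bundled into a .tar
TAR_7Z = 443_436_613            # the .tar.7z result

# Ratio as used in the results: measured against the files on disk.
overall = compression_ratio(TAR_7Z, FILES_ON_DISK)

# Ratio of the compressor alone: measured against its actual input, the tar.
format_only = compression_ratio(TAR_7Z, TAR_ARCHIVE)

print(f"overall: {overall:.2f}%  format only: {format_only:.2f}%")
```

The two figures differ by well under one percentage point, which is the tar overhead being folded into the overall number.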

I'm not overly concerned with the time each format takes, as most dual-core machines should handle this quite easily, so I ran the tests in the background, three or four at a time.

Note: I ran into problems using UHA compression on the files and folders directly. It gave errors on a specific file included with the Icon Sushi program in the toolkit. Because of that, there are no results for UHA compression of the files directly, only for the tar archive.

| Compression Format | Size (in bytes) | Compression ratio (% of original size) |
| --- | --- | --- |
| .tar.lpaq8 | 425,447,693 | 41.44% |
| .7z | 428,686,406 | 41.75% |
| .tar.7z | 443,436,613 | 43.19% |
| .tar.uha | 468,876,301 | 45.67% |
| .tar.ace | 478,156,078 | 46.57% |
| .ace | 479,530,867 | 46.71% |
| .tar.rar | 479,790,907 | 46.73% |
| .rar | 511,837,957 | 49.85% |
| .tar.bz2 | 528,823,641 | 51.51% |
| .tar.gz | 542,612,424 | 52.85% |
| .tar.zip | 554,995,557 | 54.06% |
| .zip | 563,946,617 | 54.93% |

Conclusions

I was surprised to see the ACE and UHA formats ahead of RAR, considering how popular RAR is. The ability to automatically split RAR archives across multiple volumes and the availability of tools that support the format probably account for its greater popularity compared to these slightly more efficient formats.

Just as surprising to me were the placings of bzip2 and gzip relative to the RAR and regular ZIP formats. I had assumed, given the popularity of .tar.gz and .tar.bz2, that they would be as efficient as RAR and significantly more efficient than the old ZIP format. This is not the case; their open licensing is most likely responsible for their popularity over the proprietary RAR and ZIP formats, especially among open source projects.

While the best compression format tested is lpaq8, the difference between it and 7-zip is negligible. The time and system requirements of lpaq8 (1.6 GB of RAM used and an hour of time), as well as the relative obscurity of the format, mean that we will be sticking with 7-zip as the format of choice for our toolkit.
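For a sense of how small that gap is, the arithmetic can be worked through from the sizes measured above. The variable names are my own; the byte counts are the ones reported in this post.

```python
ORIGINAL = 1_026_692_529   # total size of the files on disk
LPAQ8 = 425_447_693        # .tar.lpaq8 result
SEVEN_ZIP = 428_686_406    # .7z result

# How much smaller the lpaq8 archive is than the 7-zip one.
saved_bytes = SEVEN_ZIP - LPAQ8
saved_mib = saved_bytes / 2**20
saved_points = 100 * saved_bytes / ORIGINAL

print(f"{saved_bytes:,} bytes ({saved_mib:.1f} MiB, {saved_points:.2f} percentage points)")
```

That works out to roughly 3 MiB saved on a download of over 400 MB.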
