A Simplified Explanation of ZIP File Compression

Quick Overview

ZIP files shrink data by removing repeated or redundant information
Compression works best on files with patterns or repetition
ZIP uses lossless compression, so the original file can be restored exactly
Random data barely shrinks, while highly structured data shrinks a lot
Example: a file containing only "1"s shrank from about 5 MB to under 1 KB

ZIP files are a popular way to make files smaller so they’re easier to save and share. This article explains how ZIP files work, the basics of compression, and why some files shrink more than others.

What Is Compression?

Compression is about making files smaller by getting rid of extra or repeated information. If a file has the same thing repeated over and over, compression can replace it with a shorter version.

Think of a song where the chorus repeats many times. Instead of writing out the whole chorus every time, you could just say, “Repeat the chorus four times”. This saves space without losing the meaning.

Example of Simple Compression

Here’s an example of how compression works.

Take the sentence:

"The boy and the dog took the ball to the park."

Now, imagine we replace every "the" (capitalized or not) with a symbol like "#". The sentence becomes:

"# boy and # dog took # ball to # park."

The sentence drops from 46 characters to 38, and anyone who knows that "#" stands for "the" can still read it.
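The substitution above can be tried in a few lines of Python. This is only a toy sketch of the idea, not how ZIP itself stores text, and the variable names are made up for illustration.

```python
import re

sentence = "The boy and the dog took the ball to the park."

# Replace every standalone "the" (any capitalization) with "#".
shortened = re.sub(r"\bthe\b", "#", sentence, flags=re.IGNORECASE)

# Reverse the substitution. This toy scheme restores "the" in lowercase only,
# so the capital "T" is lost; a truly lossless scheme would have to record it.
restored = shortened.replace("#", "the")

print(shortened)                      # # boy and # dog took # ball to # park.
print(len(sentence), len(shortened))  # 46 38
```

The saving is small here because the sentence is short. The same trick pays off far more on long texts where common words appear thousands of times.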

How Do Compression Algorithms Work?

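Compression algorithms look for patterns or repeated pieces of data. If a file has a long stretch of the same thing, like "11111111", the algorithm can store a note saying, “Write ‘1’ eight times” instead of saving each “1” separately. This simple scheme is known as run-length encoding. The DEFLATE method that most ZIP tools use is more general: it replaces repeated sequences with short references to earlier copies and gives frequently used symbols shorter codes. Either way, the file gets smaller without losing any details.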
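The counting idea above can be sketched in Python. This illustrates the principle only and is not the algorithm ZIP tools actually run. The function names `rle_encode` and `rle_decode` are hypothetical.

```python
from itertools import groupby

def rle_encode(text):
    """Collapse each run of identical characters into a (char, count) pair."""
    return [(char, len(list(run))) for char, run in groupby(text)]

def rle_decode(pairs):
    """Rebuild the original text by repeating each character count times."""
    return "".join(char * count for char, count in pairs)

encoded = rle_encode("11111111")
print(encoded)              # [('1', 8)]
print(rle_decode(encoded))  # 11111111
```

Decoding gives back exactly the input, which is what makes the scheme lossless.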

When Compression Works Best

Compression works better on files with lots of patterns or repeated parts. For example, a file full of random numbers won’t shrink much because there’s no pattern to simplify. On the other hand, a file with repeated numbers or text will compress easily.
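The difference is easy to see with Python’s standard `zlib` module, which implements DEFLATE, the method most ZIP archives use. The sketch below assumes 100,000 bytes per sample, an arbitrary size chosen for illustration, and the exact output sizes will vary.

```python
import os
import zlib

size = 100_000
repetitive = b"1" * size        # one byte value repeated over and over
random_data = os.urandom(size)  # bytes with no pattern to exploit

for label, data in [("repetitive", repetitive), ("random", random_data)]:
    compressed = zlib.compress(data)
    print(f"{label}: {len(data)} -> {len(compressed)} bytes")
```

The repetitive sample should collapse to a tiny fraction of its size, while the random sample should stay at roughly its original size.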

What Are the Limits of Compression?

Compression doesn’t always make files smaller. If a file has no patterns to simplify, the compressed version might even end up bigger because of the extra data added by the algorithm itself.
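A very small file makes this overhead visible. The sketch below uses Python’s standard `zipfile` module to pack three bytes into an in-memory archive. The file name `tiny.txt` is made up for the example.

```python
import io
import zipfile

data = b"abc"  # three bytes: nothing for the compressor to exploit

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("tiny.txt", data)  # hypothetical file name

archive_size = len(buffer.getvalue())
print(f"original: {len(data)} bytes, archive: {archive_size} bytes")
```

The archive comes out larger than the input because a ZIP file also stores headers, the file name, and a directory of its contents.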

Lossless vs. Lossy Compression

ZIP files use lossless compression, which means nothing gets lost in the process. The original file can be fully restored.

Lossy compression, by contrast, reduces file size by permanently discarding some detail, as JPEG does with images. That is not how ZIP files work.
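The lossless property can be checked directly: compress some data, extract it again, and compare it with the original. This is a minimal sketch using Python’s standard `zipfile` module. The sample text and the name `story.txt` are assumptions for the example.

```python
import io
import zipfile

original = ("The boy and the dog took the ball to the park.\n" * 1000).encode("utf-8")

# Write the data into an in-memory ZIP archive.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("story.txt", original)  # hypothetical file name

# Read it back out and compare byte for byte.
with zipfile.ZipFile(buffer) as archive:
    restored = archive.read("story.txt")

print(restored == original)  # True
print(len(original), "->", len(buffer.getvalue()), "bytes")
```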

Real-World Examples

In one practical example, a Reddit user created two CSV files: one containing 2²¹ (2,097,152) "1"s and another containing the same number of random digits. Both files were 5,242,883 bytes.

When compressed using a tool like 7-Zip, the file with only "1"s shrank to 960 bytes, while the file with random digits shrank to 1,039,422 bytes.

This shows that files with repeating patterns shrink much more than random ones. The random file still shrank to roughly a fifth of its size because a text file of digits uses only a few distinct characters, so each one can be stored in fewer bits than the full byte it occupies on disk.

Even data that looks random can sometimes be compressed if it has some structure, like alternating digits or symbols. But if every part of the data is truly random, compression won’t help much.
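A rough version of this experiment can be run with Python’s `zlib` module. This is a sketch under assumptions, not a reproduction. The exact layout of the original CSV files is not known, so the code assumes one digit per line, and it uses DEFLATE rather than whatever settings the Reddit user chose. The sizes it prints will therefore differ from the figures above, but the pattern should be the same.

```python
import random
import zlib

count = 2**21           # 2,097,152 values, as in the experiment
rng = random.Random(0)  # fixed seed so the run is repeatable

def as_lines(digits):
    """Assumed layout: one digit per line (the original layout is unknown)."""
    return "".join(d + "\n" for d in digits).encode("ascii")

samples = {
    "all ones": as_lines("1" * count),
    "alternating 0/1": as_lines("01" * (count // 2)),
    "random digits": as_lines(rng.choices("0123456789", k=count)),
}

sizes = {}
for label, data in samples.items():
    sizes[label] = (len(data), len(zlib.compress(data)))
    print(f"{label}: {sizes[label][0]} -> {sizes[label][1]} bytes")
```

The alternating sample is included to show that data with any regular structure compresses almost as well as a single repeated digit.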

Conclusion

ZIP compression is a smart way to save space by spotting and simplifying patterns in data. It works well for many files, but not all, depending on how much repetition or structure the data has. Knowing how it works can help you understand why some files shrink a lot while others barely change.

Source: Reddit