Effective compression depends on finding patterns that make data smaller without losing information. When an algorithm or model can accurately guess the next piece of data in a sequence, it shows that it is good at spotting those patterns. This links the idea of making good guesses, which is what large language models like GPT-4 do very well, to achieving good compression.
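To make the link between guessing and compressing concrete, here is a minimal Python sketch. It assumes a toy predictor (an adaptive byte-frequency counter, not anything from the DeepMind paper) and adds up the cost, in bits, that an ideal entropy coder would pay for each byte: the negative base-2 logarithm of the probability the predictor assigned to it. The function name and the sample inputs are made up for illustration.

```python
import math
import random
from collections import Counter


def ideal_code_length_bits(data: bytes) -> float:
    """Bits an ideal entropy coder would need when driven by a simple
    adaptive byte-frequency predictor (a toy stand-in for a real model)."""
    counts = Counter()
    seen = 0
    total_bits = 0.0
    for byte in data:
        # Laplace-smoothed guess for this byte, made before it is revealed.
        p = (counts[byte] + 1) / (seen + 256)
        # A confident correct guess is cheap; a surprising byte is expensive.
        total_bits += -math.log2(p)
        counts[byte] += 1
        seen += 1
    return total_bits


# Hypothetical inputs: one highly predictable, one pseudo-random.
predictable = b"abab" * 250
rng = random.Random(0)
unpredictable = bytes(rng.randrange(256) for _ in range(1000))

predictable_bits = ideal_code_length_bits(predictable)
unpredictable_bits = ideal_code_length_bits(unpredictable)
print(f"predictable:   {predictable_bits / 8:.0f} bytes")
print(f"unpredictable: {unpredictable_bits / 8:.0f} bytes")
```

The better the predictor's guesses, the fewer bits the data costs, which is why a stronger predictive model can, in principle, act as a stronger compressor.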

In a research paper titled “Language Modeling Is Compression,” researchers detail their discovery that the DeepMind large language model (LLM) called Chinchilla 70B can perform lossless compression on images from the ImageNet image database, reducing them to 43.4 percent of their original size. That beats the PNG algorithm, which compressed the same data to 58.5 percent. For audio, Chinchilla compressed samples from the LibriSpeech audio data set to just 16.4 percent of their original size, outdoing FLAC compression at 30.3 percent.

The lower the numbers, the more compression has taken place. Lossless compression is when no data are lost during the process of compression. It is in stark contrast to JPEG’s lossy compression, which eliminates some data to make the file smaller.
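As a rough illustration of how such percentages are calculated, and of what "lossless" means in practice, the following Python sketch uses the standard library's zlib module (a DEFLATE compressor, not Chinchilla, PNG, or FLAC). The sample text and the helper name are made up for the example.

```python
import zlib


def compressed_size_percent(original: bytes, compressed: bytes) -> float:
    """Compressed size as a percentage of the original size (lower is better)."""
    return 100 * len(compressed) / len(original)


# Hypothetical sample data: repetitive text compresses very well.
original = b"the quick brown fox jumps over the lazy dog. " * 200
compressed = zlib.compress(original, 9)
restored = zlib.decompress(compressed)

# Lossless: decompressing gives back exactly the bytes we started with.
assert restored == original

percent = compressed_size_percent(original, compressed)
print(f"compressed to {percent:.1f}% of the original size")
```

A lossy format such as JPEG would fail the equality check above, because the decoded data only approximates the original.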

The study’s findings suggest that although Chinchilla 70B was trained mainly on text, it is surprisingly effective at compressing other types of data as well, often outperforming algorithms created specifically for those tasks. This suggests that machine learning models could be used for more than text prediction and writing, including compressing many kinds of data.

A chart of compression test results provided by DeepMind researchers in their paper. The chart illustrates the efficiency of various data compression techniques on different data sets, all initially 1GB in size. It employs a lower-is-better ratio, comparing the compressed size to the original size.


Over the past two decades, some computer scientists have proposed that the ability to compress data effectively is akin to a form of general intelligence. The idea is rooted in the notion that understanding the world often involves identifying patterns and making sense of complexity, which is similar to what good data compression does. By reducing a large set of data to a smaller, more manageable form while retaining its essential features, proponents say, a compression algorithm demonstrates a form of understanding or representation of that data.

The Hutter Prize is an example that brings this idea of compression as a form of intelligence into focus. Named after Marcus Hutter, an AI researcher and one of the authors of the DeepMind paper, the prize is awarded to whoever can most efficiently compress a fixed set of English text. The underlying assumption is that compressing text very efficiently requires understanding the syntactic, semantic, and grammatical patterns in language.

So theoretically, if a machine can compress this data extremely well, it might indicate a form of general intelligence—or at least a step in that direction. While not everyone agrees that the Hutter Prize is a sign of general intelligence, this competition shows the overlap between data compression challenges and the goal to create more intelligent systems.

The DeepMind researchers also assert that the link between prediction and compression runs in both directions. They claim that if you have a good compression algorithm, such as gzip, you can flip it around and use what it learned during compression to generate new, original data.

In one section of the paper (Section 3.4), the researchers carried out an experiment to generate new data across different formats (text, image, and audio) by getting gzip and Chinchilla to predict what comes next in a sequence of data after conditioning on a sample. Understandably, gzip didn’t do very well, producing output that was completely nonsensical, to a human mind at least. The results show that while gzip can be forced to generate data, that data is not very useful. Chinchilla, which is designed with language processing in mind, performed far better in the generative test.
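One way to picture how a compressor can be turned into a generator is sketched below in Python. This is an assumption-laden toy, not the paper's exact procedure: it uses zlib (the DEFLATE algorithm that gzip is built on) and greedily picks, at each step, the next byte that makes the compressed output shortest. The function names, the prompt, and the restricted alphabet are all hypothetical.

```python
import zlib


def next_byte_by_compression(context: bytes, alphabet: bytes) -> int:
    """Pick the candidate byte that compresses best when appended to the
    context. Ties go to the earliest candidate in `alphabet`."""
    return min(
        alphabet,
        key=lambda b: len(zlib.compress(context + bytes([b]), 9)),
    )


def generate(context: bytes, n: int, alphabet: bytes) -> bytes:
    """Greedily extend `context` by `n` bytes and return only the new bytes."""
    out = bytearray(context)
    for _ in range(n):
        out.append(next_byte_by_compression(bytes(out), alphabet))
    return bytes(out[len(context):])


# Hypothetical prompt with an obvious repeating pattern.
prompt = b"abcabcabcabcabcabcabcabc"
continuation = generate(prompt, 12, alphabet=b"abc")
print(continuation)
```

Because zlib reports sizes in whole bytes, many candidates tie, so the compressor's "preferences" are very coarse. That coarseness is one plausible reason output generated this way tends to look like noise compared with a model that assigns fine-grained probabilities to each next token.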

An example from the DeepMind paper comparing the generative properties of gzip and Chinchilla on a sample text. gzip's output is unreadable.


Although the DeepMind paper on AI language model compression has not been peer-reviewed, it offers an interesting glimpse into possible new applications for large language models. The relationship between compression and intelligence remains a matter of ongoing research and debate, so more papers on the topic will likely appear soon.