Discover new compression innovations Brotli and Zstandard

Brotli and Zstandard are two recent lossless compression algorithms. Discover more about them and how The Guardian is using them in production.

Mariot Chauvin

Published on Thursday, 1 December 2016


Mathematician Claude E. Shannon, inventor of information theory (Photo by Alfred Eisenstaedt/Time & Life Pictures/Getty Images)

In 1948, Claude Shannon published an extraordinary article that defined, for the first time, a mathematical model of information. It determined the maximum rate at which information can be transferred over a channel, now called the Shannon limit, as well as the limits of lossless data compression.
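To make the compression limit concrete, here is a minimal Python sketch that estimates the entropy of a byte string. It assumes a very simple model in which every byte is drawn independently from a fixed distribution. The function names are illustrative and are not taken from any library.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Entropy in bits per byte, treating each byte as an independent symbol."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    # H = -sum(p * log2(p)) over the observed byte frequencies
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def entropy_bound_bytes(data: bytes) -> float:
    """Smallest average size, in bytes, under the independent-byte model."""
    return shannon_entropy(data) * len(data) / 8


if __name__ == "__main__":
    for sample in (b"aaaaaaaa", b"abababab", bytes(range(256))):
        print(len(sample), shannon_entropy(sample), entropy_bound_bytes(sample))
```

This bound only holds for coders that treat bytes independently. Real compressors also exploit repetition and context, so on structured data such as text they can do better than this simple estimate.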

Since then, engineers have been trying to approach these limits while dealing with two other practical factors: the speed to compress and the speed to decompress data.

This article presents two fairly recent algorithms and how you can already benefit from using them.

Zstandard

Zstandard is both a new compression algorithm and a reference implementation, designed to perform extremely well on modern hardware. It is a general-purpose compression algorithm suited to a variety of data types.

While algorithms usually trade off compression ratio, compression speed, or decompression speed against one another, Zstandard is designed to be good at all three!
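The trade-off is easy to observe with zlib, which ships with Python. The sketch below compresses the same input at several zlib levels and reports the ratio and timings. It uses zlib only because Zstandard bindings are not assumed to be installed. The sample text and the `measure` helper are made up for illustration, and the timings will vary from machine to machine.

```python
import time
import zlib

# Hypothetical sample input: repetitive text, which compresses well.
SAMPLE = b"The Guardian publishes articles through a publication pipeline. " * 2000


def measure(data: bytes, level: int) -> dict:
    """Compress and decompress once, returning ratio and wall-clock timings."""
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    compress_seconds = time.perf_counter() - start

    start = time.perf_counter()
    restored = zlib.decompress(compressed)
    decompress_seconds = time.perf_counter() - start

    if restored != data:
        raise ValueError("round trip failed")
    return {
        "level": level,
        "ratio": len(data) / len(compressed),
        "compress_seconds": compress_seconds,
        "decompress_seconds": decompress_seconds,
    }


if __name__ == "__main__":
    # 1 = fastest, 9 = best compression, 6 = zlib's usual default
    for level in (1, 6, 9):
        print(measure(SAMPLE, level))
```

Higher levels generally spend more time compressing in exchange for a smaller output, which is the balance every compressor has to strike.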

Compared to zlib (wrapper and de facto standard implementation of the deflate algorithm), which tries to balance compression ratio and speed:

Zstandard achieves this performance thanks to several design decisions:

At the Guardian we are now using Zstandard instead of zlib (through the Java JNI binding) to compress articles in our most critical component, the publication pipeline!

Brotli

Brotli is a general-purpose lossless compression algorithm that has recently been standardised as an HTTP compression encoding. Brotli was developed by Google, and has the following characteristics:

Brotli trades off compression speed in favour of decompression speed and a slightly improved compression ratio.

Compared to gzip (a thin wrapper around zlib; if you are confused, this is expected), it decompresses about 20% faster at the same compression ratio.

Although brotli uses a less efficient entropy encoder than Zstandard, it is already implemented and available in Google Chrome, Mozilla Firefox and Opera, and support is in development in Microsoft Edge.

Support of brotli by web browsers on 28-11-2016 Illustration: caniuse.com

Support has also started to be added to web clients and servers:

At the Guardian we are using the Play framework, which provides a built-in gzip filter but not yet a brotli one, so I decided to write one.

Google’s brotli repository doesn’t yet provide a reference Java implementation; however, you can use jbrotli, a JNI binding.
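Whatever the framework, a compression filter first has to decide, for each request, which encoding the client accepts. The Python sketch below illustrates that negotiation step only. It is not the code of the Play filter mentioned above, and the function names are hypothetical. It reads the Accept-Encoding request header and prefers `br`, the content-coding token registered for brotli, over `gzip`.

```python
def parse_accept_encoding(header: str) -> dict:
    """Parse an Accept-Encoding value into {coding: q-value}."""
    weights = {}
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        coding, _, params = item.partition(";")
        q = 1.0  # a coding without an explicit q-value counts as 1.0
        for param in params.split(";"):
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        weights[coding.strip().lower()] = q
    return weights


def choose_encoding(header: str, server_preference=("br", "gzip")):
    """Return the accepted coding with the highest q-value, or None.

    Ties are broken by the order of server_preference. None means the
    response should be sent uncompressed.
    """
    weights = parse_accept_encoding(header)
    best, best_q = None, 0.0
    for coding in server_preference:
        q = weights.get(coding, weights.get("*", 0.0))
        if q > best_q:
            best, best_q = coding, q
    return best


if __name__ == "__main__":
    print(choose_encoding("gzip, deflate, br"))
    print(choose_encoding("gzip, deflate"))
```

A real filter would then compress the body, set the Content-Encoding response header to the chosen value, and usually add Vary: Accept-Encoding so that caches keep the variants apart.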

CDNs have recently improved their support as well:

At the Guardian we have been successfully using the Play framework brotli filter on an internal tool, and we plan to apply it soon to our main frontend.
