Progressive image decoding is an excellent way to speed up page loads and hence improve the web-browsing experience. This post explains why and describes recent developments in that approach.
Some people say that since internet speeds keep getting faster, we don’t really need better image compression. They believe that JPEG is good enough and that progressive decoding in particular belongs to the past: it mattered for web surfing in the early 1990s over slow dial-up modems, which are no longer in use.
I think those people are wrong. Yes, the internet is faster. However, not everyone has high-speed internet, and those who do, at home or at work, can’t access it at all times, for example while they’re traveling. Also, the faster internet has led to heavier websites: the web has become far more visual, with ever more and larger images and videos. Images represent a large amount of data: every pixel consists of at least three numbers (R, G, and B), each requiring at least 8 bits. So, without compression, 1 megapixel equals 3 megabytes. Given that the median webpage contains 2.1 MP worth of images, sending them uncompressed over a 3-Mbps 3G connection would take nearly 17 seconds, even ignoring network overhead. A long wait!
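As a quick check of the arithmetic above, here is a small Python sketch. It assumes decimal units (1 MP = 10^6 pixels, 1 MB = 10^6 bytes, 1 Mbps = 10^6 bits per second) and an idealized link with no latency or protocol overhead; the function names are illustrative.

```python
def uncompressed_bytes(megapixels: float, channels: int = 3, bits_per_sample: int = 8) -> float:
    """Raw size in bytes: every pixel stores `channels` samples of `bits_per_sample` bits."""
    return megapixels * 1_000_000 * channels * bits_per_sample / 8


def transfer_seconds(num_bytes: float, megabits_per_second: float) -> float:
    """Idealized transfer time, ignoring latency and protocol overhead."""
    return num_bytes * 8 / (megabits_per_second * 1_000_000)


page_bytes = uncompressed_bytes(2.1)                # median page: 2.1 MP of images
print(f"{page_bytes / 1e6:.1f} MB")                 # prints 6.3 MB
print(f"{transfer_seconds(page_bytes, 3):.1f} s")   # prints 16.8 s on a 3-Mbps link
```

Passing `bits_per_sample=10` shows how higher bit depths grow the raw size further: a 10-bit, 1-megapixel RGB image takes 3.75 MB uncompressed.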
Plus, we want higher-resolution images, as well as images with a wide color gamut and a high dynamic range, which need more than 8 bits per sample and are therefore even larger. Bottom line: image compression remains a must-do.
In essence, the goal of lossless image compression is simple to state: find a more concise representation that, in the end, decodes to exactly the original pixel values. Finding that representation is the hard part. For typical photographs, lossless compression achieves a compression ratio of only 2:1, or maybe 3:1, which at best brings 1 megapixel down from 3 megabytes to 1. Not bad, but not good enough.
Remarkably, lossy compression can easily deliver ratios of 20:1 with no visible artifacts. Ideally, the decoded pixel values differ from the original ones only numerically: unless you zoom in a lot, the images look the same. Yet lossy compression brings a 1-megapixel image down to a much more manageable size of 150 KB.
Remember, you compress online images to improve the browsing experience. Data caps aside, file sizes matter because they determine how long users must wait to see your images. The smaller the files, the faster the images appear and the more pleasing the user experience.
Hence the promise of progressive decoding, which enables browsers to display image content before the files have finished loading.
What’s progressive decoding? Clever image codecs organize the compressed bits so that even a partially loaded image, say 10 percent, can be decoded into a lower-quality (or lower-resolution) preview. The 30-year-old JPEG format supports this, but the feature is optional and underused; only more sophisticated encoders, such as mozjpeg, enable it by default.
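Whether a given JPEG is progressive is recorded in its start-of-frame (SOF) marker: baseline files use SOF0, progressive ones SOF2 (or, rarely, the other progressive variants SOF6, SOF10, and SOF14). The following Python sketch walks the marker segments until it finds the SOF. The function name is hypothetical, and the parser handles only what this check needs, not the full JPEG syntax.

```python
PROGRESSIVE_SOF = {0xC2, 0xC6, 0xCA, 0xCE}  # SOF2, SOF6, SOF10, SOF14


def is_progressive_jpeg(data: bytes) -> bool:
    """Return True if the first start-of-frame marker is a progressive one."""
    if data[:2] != b"\xff\xd8":                       # every JPEG starts with SOI
        raise ValueError("not a JPEG file")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:                            # fill byte: skip it
            pos += 1
            continue
        if marker in (0xD8, 0x01) or 0xD0 <= marker <= 0xD7:
            pos += 2                                  # standalone markers have no length field
            continue
        # SOF0..SOF15 live in C0..CF, except DHT (C4), JPG (C8), and DAC (CC)
        if 0xC0 <= marker <= 0xCF and marker not in (0xC4, 0xC8, 0xCC):
            return marker in PROGRESSIVE_SOF
        if marker in (0xDA, 0xD9):                    # start of scan or end of image: no SOF seen
            break
        length = int.from_bytes(data[pos + 2:pos + 4], "big")  # includes the 2 length bytes
        pos += 2 + length
    raise ValueError("no start-of-frame marker found")
```

For a file on disk, pass `pathlib.Path(path).read_bytes()` to the function.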
Progressive images behave much like interlaced ones: a low-resolution version appears on the page almost immediately, and its resolution increases as the rest of the data arrives. Although that first version is blurry, visitors can see the entire image right away.
Progressive decoding can improve the browsing experience by another order of magnitude: lossy compression reduces a 3-MB uncompressed image to 150 KB, and progressive decoding then displays a preview of it after downloading a mere 15 KB. To see the fine details, you must wait until the transfer is complete. However, if you’re just scrolling through the webpage, chances are that the preview gives you a good idea of the image. For the median webpage, lossy compression shortens the 17-second image-loading time to under one second, and progressive decoding makes the first previews appear almost instantly.
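The numbers in this paragraph can be reproduced with a few lines of Python. This sketch assumes the 20:1 lossy ratio from above, a first preview after 10 percent of the compressed file (15 KB out of 150 KB), and an idealized 3-Mbps link with no latency; it prints roughly 17 s, 0.84 s, and under 0.1 s for the median page.

```python
UNCOMPRESSED_BYTES = 6_300_000  # median page: 2.1 MP x 3 bytes per pixel
LINK_MBPS = 3                   # the 3G connection from the text
LOSSY_RATIO = 20                # "20:1 with no visible artifacts"
PREVIEW_FRACTION = 0.10         # assumed: first preview after ~10% of the file


def seconds(num_bytes: float, mbps: float = LINK_MBPS) -> float:
    """Idealized transfer time: no latency, no protocol overhead."""
    return num_bytes * 8 / (mbps * 1_000_000)


lossy_bytes = UNCOMPRESSED_BYTES / LOSSY_RATIO
preview_bytes = lossy_bytes * PREVIEW_FRACTION

for label, n in [("uncompressed", UNCOMPRESSED_BYTES),
                 ("lossy, full image", lossy_bytes),
                 ("progressive preview", preview_bytes)]:
    print(f"{label:>20}: {n / 1000:7.1f} KB in {seconds(n):5.2f} s")
```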
For video codecs, progressive decoding of a single frame is a waste of time. That’s because videos contain many frames, displayed in rapid succession, and you must buffer enough of the compressed video data before it makes sense to start playback.
Yet many new image formats are derived from video codecs: lossy WebP is basically a single-frame VP8 video; HEIC is a single-frame HEVC video; and AVIF is a single-frame AV1 video. Because of their video origins, they don’t support progressive decoding. Too bad: even though those formats can compress better, you must wait until all or most of the image data has loaded before you can see anything.
As a result, even though AVIF’s superior compression could, for example, turn a 150-KB JPEG into a 75-KB AVIF, the first preview might paradoxically take about four times longer to display. That’s because a reasonably promising preview of the progressive JPEG is available once 20 KB has loaded, whereas for the AVIF you must wait for all 75 KB to arrive and be decoded. Besides, the more complex AVIF format takes longer to decode than JPEG.
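To make the comparison concrete, this Python sketch counts the kilobytes that must arrive before anything can be drawn, using the example sizes above. It deliberately ignores decoding time, which, as noted, favors JPEG even more; the names are illustrative.

```python
from typing import Optional

JPEG_KB = 150          # progressive JPEG (example from the text)
JPEG_PREVIEW_KB = 20   # enough for a reasonable JPEG preview (from the text)
AVIF_KB = 75           # the same image as AVIF (example from the text)


def kb_before_first_paint(file_kb: float, preview_kb: Optional[float]) -> float:
    """A non-progressive file (preview_kb=None) shows nothing until it has fully arrived."""
    return file_kb if preview_kb is None else preview_kb


jpeg_first = kb_before_first_paint(JPEG_KB, JPEG_PREVIEW_KB)
avif_first = kb_before_first_paint(AVIF_KB, None)
print(f"first paint: JPEG after {jpeg_first} KB, AVIF after {avif_first} KB "
      f"({avif_first / jpeg_first:.2f}x later)")
print(f"final image: AVIF is {JPEG_KB / AVIF_KB:.1f}x smaller")
```

The two printed ratios capture the trade-off: AVIF wins on the final image, while the progressive JPEG wins on the first paint.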
For formats that lack progressive decoding, you can emulate it with low-quality image placeholders (LQIPs): initially load a small, blurry image, followed by a small black-and-white image, and then transition to the full-color image. This way, users get a preview before the full image has loaded, which improves the browsing experience.
The spectrum is wide, ranging from mere placeholders (really low-quality previews, e.g., a simple gradient of the two predominant colors or a very blurry version of the image based on only a dozen pixels) to low-quality previews that give users a real idea of the image, such as “quality 30” previews for the actual “quality 80” images. In the case of AVIF and JPEG XL, you can embed LQIPs in the file itself, saving the step of replacing the image externally.
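The “dozen pixels” placeholder can be as simple as a box-filtered thumbnail. This Python sketch averages an RGB image, given as rows of `(R, G, B)` tuples (an assumed in-memory layout), down to a hypothetical 4×3 grid: 12 pixels, or 36 bytes of raw data, which a page would then scale up with a blur. It illustrates the idea only and is not the method of any particular LQIP tool.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]  # rows of (R, G, B) tuples


def tiny_placeholder(img: Image, out_w: int = 4, out_h: int = 3) -> Image:
    """Average the image down to out_w x out_h pixels with a box filter."""
    h, w = len(img), len(img[0])
    if h < out_h or w < out_w:
        raise ValueError("image is smaller than the placeholder")
    out: Image = []
    for oy in range(out_h):
        y0, y1 = oy * h // out_h, (oy + 1) * h // out_h   # source rows for this output row
        row: List[Pixel] = []
        for ox in range(out_w):
            x0, x1 = ox * w // out_w, (ox + 1) * w // out_w  # source columns for this pixel
            block = [img[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            n = len(block)
            # mean of each channel over the block, rounded back to an integer
            r, g, b = (round(sum(p[c] for p in block) / n) for c in range(3))
            row.append((r, g, b))
        out.append(row)
    return out
```

Because each output pixel is a plain average, sharp edges survive only as color shifts between neighboring placeholder pixels, which is exactly the “very blurry” look described above.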
The downside of separate previews or placeholders is that the total transfer size inevitably goes up. Part of the benefit of the preview is lost because the final image takes longer to arrive, and the bytes spent on the separate, redundant preview or placeholder are ultimately wasted. The smaller the LQIPs, the lower their overhead, but also the less useful they are as previews.
In contrast, progressive decoding does not waste bytes on separate previews: the first bytes of the actual high-quality image are the preview image. Talk about a welcome feature!
The state of the art of progressive images, which are as old as JPEG, has remained largely the same for 20 or 30 years. Excitingly, that’s starting to change.
First, the green martians I blogged about before, which can appear when the first luma and chroma information is not available at the same time, are no longer an issue, because browsers now wait until both chroma channels are available before showing a preview.
Another recent improvement concerns the upsampling used to show the first preview of a progressive JPEG, which is an image at 1:8 resolution. Basically, one pixel is available for every 8×8 block: its average color, also called the direct current (DC) coefficient. The simplest possible upsampling just fills each 8×8 block with its DC value, which yields a very blocky preview.
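To illustrate the difference, this Python sketch works on a single grayscale channel (an assumption; real JPEG previews are decoded per YCbCr channel) and stores each 8×8 block’s plain average, which is proportional to the actual DC coefficient. `upsample_blocky` is the simple fill described above; `upsample_bilinear` is one smoother alternative, chosen here for illustration and not necessarily what any browser does.

```python
from typing import List

Grid = List[List[float]]  # one grayscale channel, rows of samples


def dc_image(img: Grid, block: int = 8) -> Grid:
    """Average of each block x block tile: the 1:8 image the first JPEG scan allows."""
    rows, cols = len(img) // block, len(img[0]) // block
    return [[sum(img[by * block + y][bx * block + x]
                 for y in range(block) for x in range(block)) / block ** 2
             for bx in range(cols)]
            for by in range(rows)]


def upsample_blocky(dc: Grid, block: int = 8) -> Grid:
    """Simplest preview: fill every block x block tile with its DC value."""
    return [[dc[y // block][x // block] for x in range(len(dc[0]) * block)]
            for y in range(len(dc) * block)]


def upsample_bilinear(dc: Grid, block: int = 8) -> Grid:
    """Smoother preview: treat each DC value as its block's center and interpolate."""
    h, w = len(dc), len(dc[0])
    out: Grid = []
    for y in range(h * block):
        # position in DC-grid coordinates (block centers at integers), clamped at edges
        fy = min(max((y + 0.5) / block - 0.5, 0.0), h - 1.0)
        y0 = int(fy)
        y1, ty = min(y0 + 1, h - 1), fy - y0
        row = []
        for x in range(w * block):
            fx = min(max((x + 0.5) / block - 0.5, 0.0), w - 1.0)
            x0 = int(fx)
            x1, tx = min(x0 + 1, w - 1), fx - x0
            top = dc[y0][x0] * (1 - tx) + dc[y0][x1] * tx
            bottom = dc[y1][x0] * (1 - tx) + dc[y1][x1] * tx
            row.append(top * (1 - ty) + bottom * ty)
        out.append(row)
    return out
```

For a 16×16 image whose left half is 0 and right half is 80, the blocky preview jumps from 0 to 80 between columns 7 and 8, while the bilinear one ramps through 35 and 45 there.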
Those techniques are for progressive JPEGs. More enhancements are forthcoming for JPEG XL. For example, JPEG XL can progressively encode the DC itself, so that the first preview appears sooner. Normally, it takes 10 to 15 percent of the total file size to get the DC, which is the first full-image preview of a progressive JPEG. With progressive DC, a feature of JPEG XL, you can show a first LQIP when only one percent of the total file size has arrived.
JPEG XL offers two more options for advanced progressive encoding:
- Middle-out scans: In JPEG, scans always proceed from top to bottom. JPEG XL encodes images in groups of 256×256 pixels, which you can reorder. So, you can start every scan with the groups in the middle, which presumably contain the most enticing part of the image.
- Saliency progression: Progressive scans of JPEGs must provide the same amount of new detail for every part of the image. Not so in the case of JPEG XL. That means you can progressively encode images based on saliency, such as by sending the faces or foreground objects in an image in more detail first, and the background later.
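One possible middle-out order can be computed by sorting the 256×256 groups by the distance of their centers from the image center. The Python sketch below does only that: it does not write a JPEG XL bitstream, and the function name and the distance-based ordering are illustrative assumptions rather than what any particular encoder does.

```python
import math
from typing import List, Tuple

GROUP = 256  # JPEG XL group size in pixels, per the text


def middle_out_order(width: int, height: int, group: int = GROUP) -> List[Tuple[int, int]]:
    """Return (col, row) group indices sorted by distance from the image center."""
    cols, rows = math.ceil(width / group), math.ceil(height / group)
    cx, cy = width / 2, height / 2

    def dist2(g: Tuple[int, int]) -> float:
        col, row = g
        # center of the group, clipped to the image for partial edge groups
        gx = (col * group + min((col + 1) * group, width)) / 2
        gy = (row * group + min((row + 1) * group, height)) / 2
        return (gx - cx) ** 2 + (gy - cy) ** 2

    scanline = [(c, r) for r in range(rows) for c in range(cols)]
    return sorted(scanline, key=dist2)  # stable sort: ties keep top-to-bottom order
```

A saliency-driven encoder could, in principle, swap `dist2` for a score from a saliency model, though in JPEG XL saliency progression also changes how much detail each region gets per pass, not only the order.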
Largest Contentful Paint (LCP) is a new user-experience metric Google will adopt to determine the ranking of search results. Even though discussion is still ongoing, a consensus has been reached to consider progressive rendering as an LCP factor.
In general, enhanced progressive rendering leads to faster perceived web performance and a better user experience. Once LCP captures those refinements, they could also lead to higher Google-search rankings and stronger SEO.
Unlike WebP, HEIC, and AVIF, JPEG and JPEG XL were designed for progressive decoding. The progressive capabilities of JPEG XL are superior to JPEG’s, however. Recall that reasonably appealing LQIPs become available after only one percent of the image data has arrived, with no need for separate, redundant LQIPs or preview images.
In summary, JPEG XL is a boon for the browsing experience, reducing bandwidth and displaying images faster and with higher fidelity. I’ll keep you posted on the format’s development.
My next article will discuss what it takes to create a codec to replace JPEG and why previous attempts failed. Stay tuned.