Cloudinary Blog

Improve the Web Experience With Progressive Image Decoding

Progressive Image Decoding Delivers an Enhanced Web Experience

Progressive image decoding is an excellent way to accelerate page loads and so improve the web-browsing experience. This post explains why and describes recent developments in that approach.

The Importance of Image Compression

Some people say that because internet speeds keep increasing, we don’t really need better image compression. They believe that JPEG is good enough and that progressive decoding in particular belongs to the past: important for web surfing in the 1990s over slow dial-up modems, which are no longer in use in the modern world.

I think those people are wrong. Yes, the internet is faster. However, not everyone has high-speed internet, and even those who have it at home or at work can’t count on it at all times, for example while traveling. Separately, the faster internet has led to heavier websites: the web has become far more visual, with ever more and larger images and videos. Images represent a large amount of data: every pixel consists of at least three numbers (R, G, and B), each number requiring at least 8 bits. So, without compression, 1 megapixel equals 3 megabytes. Given that the median webpage contains 2.1 MP worth of images (about 6.3 MB uncompressed), sending them uncompressed over a 3-Mbps 3G connection would take at least 17 seconds. That’s a long wait!
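As a quick sanity check, the arithmetic behind that 17-second figure can be sketched in a few lines of Python. The constants are the ones quoted above; the function names are only for illustration, and the calculation assumes the link delivers its full nominal 3 Mbps with no protocol overhead.

```python
# Back-of-the-envelope check of the figures quoted in the text.
BYTES_PER_PIXEL = 3           # one byte each for R, G, and B at 8 bits per channel
MEDIAN_PAGE_MEGAPIXELS = 2.1  # median webpage, as quoted in the text
LINK_MBPS = 3                 # nominal 3G throughput, as quoted in the text


def uncompressed_megabytes(megapixels: float) -> float:
    """Raw size of the pixel data, in megabytes."""
    return megapixels * BYTES_PER_PIXEL


def transfer_seconds(megabytes: float, link_mbps: float) -> float:
    """Ideal transfer time: 8 bits per byte, no overhead or latency."""
    return megabytes * 8 / link_mbps


size_mb = uncompressed_megabytes(MEDIAN_PAGE_MEGAPIXELS)
seconds = transfer_seconds(size_mb, LINK_MBPS)
print(f"{size_mb:.1f} MB uncompressed, about {seconds:.1f} s at {LINK_MBPS} Mbps")
```

Real connections add latency and overhead, so the actual wait would be longer than this ideal figure.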

Plus, we want higher-resolution images, along with a wide color gamut and a high dynamic range, neither of which is achievable with 8-bit encoding. Bottom line: image compression remains a must.

In principle, lossless image compression is simple: the encoder stores a more concise representation of the data, and decoding it yields exactly the original pixel values that went in. The hard part is finding that concise representation. For typical photographs, lossless compression gives you a compression ratio of only 2:1, or maybe 3:1, which at best translates to 1 megapixel in 1 megabyte instead of 3. Not bad, but not good enough.

Remarkably, lossy compression can easily deliver ratios of 20:1 with no visible artifacts. In the ideal scenario, those artifacts are only numerical differences between the original and the decoded pixel values. Visually, unless you zoom in a lot, the images look the same, yet lossy compression brings a 1-megapixel image down to a much more manageable size of 150 KB.
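To put those ratios side by side, here is a small sketch that converts each one into a size per megapixel. It assumes 3 bytes per pixel and 1 MB = 1000 KB, matching the round numbers used in this post; the function name is illustrative.

```python
RAW_KB_PER_MEGAPIXEL = 3000  # 3 bytes per pixel, with 1 MB taken as 1000 KB


def compressed_kb(megapixels: float, ratio: float) -> float:
    """Compressed size in KB for a given compression ratio (ratio:1)."""
    return megapixels * RAW_KB_PER_MEGAPIXEL / ratio


for label, ratio in [("lossless 2:1", 2), ("lossless 3:1", 3), ("lossy 20:1", 20)]:
    print(f"{label}: {compressed_kb(1, ratio):.0f} KB per megapixel")
```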

Remember, you compress online images to improve the browsing experience. Data caps aside, file sizes matter because they determine how long users must wait to see your images. The smaller the files, the faster the images appear and the more pleasing the user experience.

Hence the promise of progressive decoding, which enables browsers to display image content before the files have finished loading.

Progressive Decoding

What’s progressive decoding? Clever image codecs organize the compressed bits in such a way that even a partially loaded image (say, the first 10 percent of the file) can be decoded, resulting in a lower-quality or lower-resolution preview. The 30-year-old JPEG codec can do that, but the feature is optional and underused: only fancy JPEG encoders, like mozjpeg, enable it by default.
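If you are curious whether a given JPEG was encoded progressively, the file itself can tell you: the start-of-frame (SOF) marker in the header records the encoding mode. The following is a minimal sketch that walks the marker segments using only the Python standard library. The function name `is_progressive_jpeg` is my own, and the demo runs on hand-built headers rather than real photos, so treat it as an illustration and not a hardened parser.

```python
import struct

# Start-of-frame markers. 0xC4 (DHT), 0xC8 (JPG) and 0xCC (DAC) fall in the
# same numeric range but do not start a frame.
SOF_MARKERS = set(range(0xC0, 0xD0)) - {0xC4, 0xC8, 0xCC}
PROGRESSIVE_SOF = {0xC2, 0xC6, 0xCA, 0xCE}
# Markers that carry no length field.
STANDALONE = {0x01, 0xD8, 0xD9} | set(range(0xD0, 0xD8))


def is_progressive_jpeg(data: bytes) -> bool:
    """Return True if the first SOF marker announces a progressive frame."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    pos = 2
    while pos + 1 < len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:          # fill byte before the real marker
            pos += 1
            continue
        if marker in SOF_MARKERS:
            return marker in PROGRESSIVE_SOF
        if marker in STANDALONE:
            pos += 2
            continue
        if marker == 0xDA or pos + 4 > len(data):
            break                   # scan data or truncated header: give up
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        pos += 2 + length           # the length field counts its own 2 bytes
    raise ValueError("no start-of-frame marker found")


# Demo on synthetic headers: SOI, a 16-byte APP0 segment, then an SOF marker.
app0 = b"\xff\xe0" + struct.pack(">H", 16) + b"JFIF\x00" + bytes(9)
for name, sof in [("baseline", b"\xff\xc0"), ("progressive", b"\xff\xc2")]:
    header = b"\xff\xd8" + app0 + sof + struct.pack(">H", 11) + bytes(9)
    print(name, "->", is_progressive_jpeg(header))
```

For a real file, you would pass in the bytes read from disk, for example from a hypothetical `photo.jpg` opened in binary mode.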

Progressive decoding can improve the browsing experience by another order of magnitude: lossy compression reduces a 3-MB uncompressed image to 150 KB, and progressive decoding then displays a first version of the image after a mere 15 KB has been downloaded. To see the fine details, you must wait until the transfer is complete. However, if you’re just scrolling through the webpage, chances are that you’ll get an idea of the image from the preview. For the median webpage, lossy compression shortens the 17-second image-loading time to about one second, and progressive decoding can make the images appear almost immediately.
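Here is the same back-of-the-envelope math in Python, under the assumptions used so far: a 3-Mbps link, 2.1 MP of images on the median page, 150 KB per megapixel after lossy compression, and a first preview after 10 percent of the data. The names are illustrative, and latency and decode time are ignored.

```python
LINK_MBPS = 3                  # nominal 3G link, as quoted in the text
MEDIAN_PAGE_MEGAPIXELS = 2.1   # median webpage, as quoted in the text
LOSSY_KB_PER_MEGAPIXEL = 150   # 20:1 lossy compression
PREVIEW_FRACTION = 0.10        # share of the file needed for a first preview


def seconds_to_receive(kilobytes: float, link_mbps: float = LINK_MBPS) -> float:
    """Ideal transfer time; 1 KB = 8 kilobits, 1 Mbps = 1000 kilobits/s."""
    return kilobytes * 8 / (link_mbps * 1000)


page_kb = MEDIAN_PAGE_MEGAPIXELS * LOSSY_KB_PER_MEGAPIXEL
full_seconds = seconds_to_receive(page_kb)
preview_seconds = seconds_to_receive(page_kb * PREVIEW_FRACTION)

print(f"All images complete: {full_seconds:.2f} s")
print(f"First previews:      {preview_seconds:.3f} s")
```

Under these idealized assumptions, the complete images arrive in a little under a second, and the first previews in under a tenth of a second.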

Image Versus Video

For video codecs, progressive decoding of a single frame is a waste of time. That’s because videos contain many frames, displayed in rapid succession, and you must buffer enough of the compressed video data before it makes sense to start playback.

Nonetheless, many new image codecs are derived from video codecs: WebP is basically a single-frame VP8 WebM video; HEIC is a single-frame HEVC video; and AVIF is a single-frame AV1 video. Because of their video origins, however, they don’t support progressive decoding. Too bad—even though those formats can reach higher compression densities, you must wait until all or most of the image data has loaded before you can see anything.

As a result, even though AVIF’s superior compression could, for example, turn a 150-KB JPEG into a 75-KB AVIF, the first preview might paradoxically take almost four times longer to display. With the progressive JPEG, a reasonably good preview becomes available once 20 KB has loaded. With the AVIF, you must wait for all 75 KB to arrive and be decoded. Besides, the more complicated AVIF format takes longer to decode than JPEG.
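The comparison comes down to how many bytes must arrive before anything can be shown. A tiny sketch with the example sizes from this section (variable names are mine, and decode time is left out):

```python
JPEG_TOTAL_KB = 150        # progressive JPEG from the example
JPEG_FIRST_PREVIEW_KB = 20 # bytes needed before a usable preview appears
AVIF_TOTAL_KB = 75         # no progressive preview: the whole file is needed

preview_ratio = AVIF_TOTAL_KB / JPEG_FIRST_PREVIEW_KB
final_ratio = AVIF_TOTAL_KB / JPEG_TOTAL_KB

print(f"Bytes before first pixels, AVIF vs. JPEG: {preview_ratio:.2f}x")
print(f"Bytes for the final image, AVIF vs. JPEG: {final_ratio:.2f}x")
```

So the AVIF wins on total bytes by a factor of two but, in this example, needs 3.75 times as much data before it can show anything.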

Previews and Placeholders

To use nonprogressive codecs like WebP and AVIF but still offer a somewhat progressive browsing experience, you can use low-quality image placeholders (LQIPs): first serve a low-quality version of each image, then swap in the actual image with, for example, JavaScript.

The spectrum is wide, ranging from mere placeholders (really, really low-quality previews, e.g., a simple gradient of two predominant colors or a very blurry version of the image based on a dozen pixels only) to low-quality previews that can clue users in on the images, such as “quality 30” images as previews for the actual “quality 80” ones. In the case of AVIF and JPEG XL, you can embed LQIPs, saving the step of replacing the image externally.

The downside of separate previews or placeholders is that the total transfer size inevitably goes up. The preview improves the first impression, but the final image takes longer to arrive, and all the bytes spent on the separate, redundant preview or placeholder are ultimately wasted. The smaller the LQIP, the lower its overhead, but also the less useful it is as a preview.
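To make that trade-off concrete, here is a small sketch that computes the extra transfer cost of a separate LQIP. The 75-KB final image matches the AVIF example earlier in this post; the LQIP sizes are hypothetical values picked for illustration, not measurements.

```python
def lqip_overhead(final_kb: float, lqip_kb: float) -> tuple[float, float]:
    """Return (total KB transferred, overhead as a percentage of the final image)."""
    total_kb = final_kb + lqip_kb
    overhead_percent = 100 * lqip_kb / final_kb
    return total_kb, overhead_percent


FINAL_KB = 75  # example AVIF size used earlier in this post

# Hypothetical placeholder sizes, from a tiny blur to a "quality 30" preview.
for lqip_kb in (0.5, 2, 10):
    total_kb, percent = lqip_overhead(FINAL_KB, lqip_kb)
    print(f"LQIP of {lqip_kb} KB: {total_kb} KB in total, {percent:.1f}% overhead")
```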

In contrast, progressive decoding does not waste bytes on separate previews: the first bytes of the actual high-quality image are the preview image. Talk about a welcome feature!

Improved Progressiveness

The state of the art of progressive images, which are as old as JPEG, has remained largely the same for 20 or 30 years. Excitingly, that’s starting to change.

First, the green martians I blogged about before—which can happen if the first luma and chroma information is not simultaneously available—are no longer an issue because browsers now wait until both chroma channels are available before showing a preview.

First program scan

Another recent improvement is in the upsampling used to show the first preview of a progressive JPEG, which is an image at 1:8 resolution. Basically, one value is available for every 8x8 block: its average color, carried by the direct current (DC) coefficient. The simplest possible upsampling just fills each 8x8 block with its DC value, which yields a very blocky preview, as here:

Upsampling technique
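The blocky fill described above can be sketched in a few lines of pure Python. This is a simplified illustration: it works on a single channel, treats the DC values as ready-to-use averages (a real decoder would first dequantize them), and the function name is my own.

```python
BLOCK = 8  # JPEG blocks are 8x8 pixels


def upsample_blocky(dc, block=BLOCK):
    """Fill every block x block tile with the single DC value of that tile."""
    pixels = []
    for dc_row in dc:
        # Repeat each value horizontally, then repeat the row vertically.
        pixel_row = [value for value in dc_row for _ in range(block)]
        pixels.extend(list(pixel_row) for _ in range(block))
    return pixels


# A 2x2 grid of block averages becomes a 16x16 preview.
dc = [[10, 200],
      [90, 40]]
preview = upsample_blocky(dc)
print(len(preview[0]), "x", len(preview), "pixels")
print(preview[0][6:10])  # hard edge at the block boundary
```

The last line shows where the blockiness comes from: the values jump abruptly from one block’s average to the next.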

Now reaching browsers is an improved upsampling technique, which creates a smoother, more appealing preview with fewer artifacts:

Improved upsampling technique
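To give a feel for why smarter upsampling helps, here is a sketch that uses plain bilinear interpolation between block averages. To be clear, this is a stand-in I chose for illustration; it is not necessarily the technique that browsers or JPEG decoders actually ship. It works on a single channel, and the function name is my own.

```python
def upsample_bilinear(dc, block=8):
    """Blend neighbouring block averages instead of filling flat tiles.

    Each DC value is treated as a sample at the centre of its block.
    """
    rows, cols = len(dc), len(dc[0])
    pixels = []
    for y in range(rows * block):
        # Pixel centre in DC-grid coordinates, clamped to the grid edges.
        fy = min(max((y + 0.5) / block - 0.5, 0.0), rows - 1)
        y0 = int(fy)
        y1 = min(y0 + 1, rows - 1)
        wy = fy - y0
        row = []
        for x in range(cols * block):
            fx = min(max((x + 0.5) / block - 0.5, 0.0), cols - 1)
            x0 = int(fx)
            x1 = min(x0 + 1, cols - 1)
            wx = fx - x0
            top = dc[y0][x0] * (1 - wx) + dc[y0][x1] * wx
            bottom = dc[y1][x0] * (1 - wx) + dc[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        pixels.append(row)
    return pixels


# Two neighbouring blocks with averages 0 and 80.
smooth = upsample_bilinear([[0, 80]])
print([round(v) for v in smooth[0]])
```

Instead of a hard step at the block boundary, the values ramp gradually from one block average to the next.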

Those techniques are for progressive JPEGs. More enhancements are forthcoming for JPEG XL. An example is that you can progressively encode the DC itself in JPEG XL to more speedily generate the first preview. Normally, it takes 10 to 15 percent of the total file size to get the DC, which is the first full-image preview for a progressive JPEG. With progressive DC, a feature of JPEG XL, you can create a first LQIP when only one percent of the total file size has arrived.

JPEG XL offers two more options for advanced progressive encoding:

  • Middle-out scans: In JPEG, scans are always top to bottom. In JPEG XL, for which encoding occurs in groups of 256x256 pixels, you can reorder the groups. So, you can start each and every scan with the groups in the middle, which presumably contain the most enticing part of the image.
  • Saliency progression: Progressive scans of JPEGs must provide the same amount of new detail for every part of the image. Not so in the case of JPEG XL. That means you can progressively encode images based on saliency, such as by sending the faces or foreground objects in an image in more detail first, and the background later.
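As a rough illustration of the reordering idea, the sketch below sorts 256x256 groups by their distance from a focus point, which defaults to the middle of the image. This is not the JPEG XL encoder’s actual algorithm or API; the function name and the `focus` parameter are hypothetical. Passing the location of a face as the focus gives a very crude flavor of saliency-driven ordering.

```python
import math

GROUP = 256  # group size in pixels, as described above


def middle_out_order(width, height, focus=None, group=GROUP):
    """Return (col, row) group indices, nearest to the focus point first."""
    cols = math.ceil(width / group)
    rows = math.ceil(height / group)
    fx, fy = focus if focus is not None else (width / 2, height / 2)

    def sort_key(cell):
        col, row = cell
        # Squared distance from the group's centre to the focus point;
        # ties fall back to ordinary top-to-bottom, left-to-right order.
        cx = (col + 0.5) * group
        cy = (row + 0.5) * group
        return ((cx - fx) ** 2 + (cy - fy) ** 2, row, col)

    cells = [(col, row) for row in range(rows) for col in range(cols)]
    return sorted(cells, key=sort_key)


# A 768x768 image has 3x3 groups; the centre group is sent first.
order = middle_out_order(768, 768)
print(order)
```

With a plain top-to-bottom order, the middle group of this example would arrive fifth out of nine; here it comes first, and the corners come last.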

Largest Contentful Paint

Largest Contentful Paint (LCP) is a new user-experience metric Google will adopt to determine the ranking of search results. Even though discussion is still ongoing, a consensus has been reached to consider progressive rendering as an LCP factor.

In general, enhanced progressive rendering leads to perceived faster web performance and improved user experience. LCP will better capture those refinements, leading to higher Google-search rankings and stronger SEO.

The Expediency of JPEG XL

Unlike WebP, HEIC, and AVIF, JPEG and JPEG XL were designed for progressive decoding. The progressive capabilities of JPEG XL are superior to JPEG’s, however. Recall that reasonably appealing LQIPs become available with only a one-percent transfer of image data—and no need for separate and redundant LQIPs or preview images.

In summary, JPEG XL is a boon for the browsing experience, reducing bandwidth and displaying images faster and with higher fidelity. I’ll keep you posted on the format’s development.

My next article will discuss what it takes to create a codec to replace JPEG and why previous attempts failed. Stay tuned.
