A year ago, I talked about JPEG XL at ImageCon 2019. It’s time for an update.
JPEG XL is a next-generation image codec currently being standardized by the JPEG Committee. Based on Google’s PIK codec and Cloudinary’s Free Universal Image Format (FUIF) codec, JPEG XL has created a sum that’s greater than the parts by leveraging the best elements of Google PIK and FUIF:
- From PIK, JPEG XL has reaped the focus on strong psychovisual modeling and preservation of detail and texture, as well as decode speed, in particular enabling parallelization and efficient cropped decoding of huge (gigapixel or more) images.
- From FUIF, JPEG XL has brought in the cornerstone on being responsive by design and universal.
- From PIK and FUIF, JPEG XL has learned to put a tremendous emphasis on being legacy friendly, delivering a smooth transition from existing file formats (notably JPEG, but also PNG, GIF, and TIFF) to JPEG XL.
The key features of JPEG XL are described in a recent white paper published by the JPEG Committee.
This section highlights the important features that distinguish JPEG XL from other state-of-the-art image codecs like HEIC and AVIF.
Although they can never make guarantees (patent trolls can always suddenly wake up), the contributors who created JPEG XL have agreed to license the reference implementation under the Apache 2.0 license. That means that, besides being Free and Open Source Software (FOSS), JPEG XL also comes with a royalty-free patent grant.
That’s not at all the case for the High Efficiency Image File Format (HEIC), which is based on the HEIF container, on which Nokia claims patents; and the High Efficiency Video Coding (HEVC) codec, which is a complete patent mess. For the AV1 Image File Format (AVIF), the patent situation looks better since it’s based on AV1, and being royalty free was a major goal of the Alliance for Open Media, which created AV1. It is not clear, though, to what extent AV1 is actually royalty free. Moreover, AVIF is based on the HEIF container so Nokia patents might apply as well.
You can transcode existing JPEG files effectively and reversibly to JPEG XL without any additional loss. Not so for previous attempts at creating “next-generation” image formats, such as JPEG 2000, JPEG XR, WebP, and now HEIC and AVIF. Transcoding to one of those other formats requires decoding the JPEG image to pixels and then re-encoding those pixels with the other format—an irreversible process that leads to generation loss.
To me, legacy friendliness is an important feature that facilitates a smooth transition from JPEG to a successor format without requiring a transition period in which two versions of every image—the old JPEG file and the new “successor-format” file—must be stored to satisfy the long tail of users who haven’t upgraded yet. Such a requirement completely defeats the purpose of improved image compression.
Especially for web delivery, it would be desirable to avoid having to store and serve multiple variants of the same image according to the viewer’s viewport width. Equally desirable is an option to progressively decode images, showing a low-quality image placeholder when only a few hundred bytes have arrived and adding more detail as the rest of the data shows up. JPEG XL ably supports both nice-to-haves.
As a rule, image formats that are based on video codecs do not support those two worthwhile features because the concept doesn’t make much sense for a single video frame. WebP (based on VP8), HEIC (based on HEVC), and AVIF (based on AV1) only offer sequential decoding, i.e., the image loads at full detail from top to bottom, and you must wait until it has almost been completely transferred before getting an inkling of the image content.
JPEG XL is particularly effective with image compression at perceptual qualities that range from visually nearly lossless (in a side-by-side comparison), over visually completely lossless (in a flicker test, which is stricter than a side-by-side evaluation), to mathematically lossless. A lot of effort has been mounted to preserve subtle texture and other fine image details. Even though you can take advantage of the codec at lower bitrates, where the degradation becomes obvious, it really shines at the relatively higher bitrates.
In contrast, image formats based on video codecs tend to excel at the very low bitrates: they can produce a nice image in just a few bytes. The image looks good at first but, on closer inspection, often seems weirdly “plasticy,” e.g., skin complexion becomes very smooth as if the compression applied a huge amount of foundation cream, or “distilled” like an oil painting. That’s acceptable for video codecs: you need a low bitrate to keep the file size or bandwidth reasonably low and, since frames are typically only shown for less than 40 milliseconds, the audience usually doesn’t notice such artifacts. For still images, however, a higher quality is often desired.
The JPEG XL reference encoder (cjpegxl) produces, by default, a well-compressed image that is indistinguishable from (or, in some cases, identical to) the original. In contrast, other image formats typically have an encoder with which you can select a quality setting, where quality is not really defined perceptually. Consequently, one image might look fine as a quality-60 JPEG while another might still contain annoying artifacts as a quality-90 JPEG.
Original PNG image (2.6 MB)
JPEG XL (default settings, 53 KB): indistinguishable from the original
WebP (53 KB): some mild but noticeable color banding along with blurry text
JPEG (53 KB): strong color banding, halos around the text, small text hard to read
Zooming in a bit, you can see how JPEG XL preserves the text even better than a five-times- larger quality-95 JPEG, which still emits some subtle discrete cosine transform (DCT) noise around the letters. At a similar compression rate, HEIC, WebP, and JPEG look significantly worse than JPEG XL for this image.
|JPEG XL (53 KB)|
|JPEG q95 (253 KB)|
|HEIC (55 KB)|
|WebP (53 KB)|
|JPEG (53 KB)|
Internally, JPEG XL leverages a novel, perceptually-motivated color space called XYB. Most other codecs still use the YCbCr color space, usually with chroma subsampling. YCbCr, which is rooted in analog color television, is a relatively crude and somewhat dated attempt at modeling human color perception. Part of YCbCr’s problem is lack of precision, especially in the dark colors and in the blues and reds. That’s why dark video scenes are often a terrible blocky mess.
Thanks to its more accurate color handling, JPEG XL is better at avoiding color-banding issues—even in those difficult darks.
|Original PNG (1.3 MB)|
(brightened for clarity)
(4 KB, brightened for clarity)
(4 KB, brightened for clarity)
(4 KB, brightened for clarity)
(5 KB, brightened for clarity)
JPEG XL handles numerous image types, including regular photographs; illustrations; cartoons; computer-generated images; logos; user-interface elements; screenshots; maps; medical imagery; images for printing, e.g., Cyan Magenta Yellow Black (CMYK) with additional spot colors; scientific images; satellite images; game graphics; huge images (gigapixel or even terapixel); tiny icons; images with alpha transparency, selection masks or depth information; layered images; and so on.
Apropos of workflows, you can leverage JPEG XL not only as a web-delivery format, but also as a local storage and an exchange format for authoring workflows, for which fast and effective lossless compression and high bit depth are important. In terms of functionality and compression, JPEG XL fully supersedes JPEG, PNG, GIF, WebP, and TIFF.
In contrast, video codec-based formats tend to have limitations that do not matter for video but that might impact still images in terms of dimensions, bit depth, number of channels, and types of image content.
|Format||Maximum Image Dimensions
(in a Single Code Stream)
|Maximum Bit Depth,
Maximum Number of Channels
|JPEG||4,294 megapixels (65,535 x 65,535)||8-bit, three channels (or four for CMYK)||PNG||Theoretically 4 exapixels
(but no way to efficiently decode crops)
|16-bit, four channels (RGBA)||WebP||268 megapixels
(16,383 x 16,383)
|8-bit, four channels (RGBA)||HEIC||35 megapixels 1
(8,192 x 4,320)
|16-bit, three channels
(alpha or depth as separate image)
|AVIF||9 megapixels 1
(3,840 x 2,160)
|12-bit, three channels
(alpha or depth as separate image)
(1,073,741,823 x 1,073,741,824)
|24-bit (integer) or 32-bit (float),
up to 4,100 channels
- HEIC and AVIF can handle larger images but not directly in a single code stream. You must decompose the image into a grid of independently encoded tiles, which could cause discontinuities at the grid boundaries. Illustration: grid boundary discontinuities in a HEIC-compressed image. HEIC and AVIF can handle larger images but not directly in a single code stream. You must decompose the image into a grid of independently encoded tiles, which could cause discontinuities at the grid boundaries. Illustration: grid boundary discontinuities in a HEIC-compressed image.
You can encode or decode modern video codecs like AV1 and HEVC in software, but the computational cost is high, especially for well-optimized encoding. Dedicated hardware is desirable or even required to efficiently implement such codecs. In contrast, you can easily encode or decode JPEG XL in software on current hardware. The speed results in the table below are based on four CPU cores.
Codec Encoding Speed (MP/s) Decoding Speed (MP/s) JPEG (libjpeg-turbo) 49 108 HEVC (HM) 0.014 5.3 HEVC (x265) 3.7 14 JPEG XL 50 132
You might conclude from the above that AVIF and HEIC are pointless. That’s not true; they have three important strengths.
Both AVIF and HEIC can reach very low bitrates yet still produce presentable images. For all that, obviously, a lot of the image information has vanished, the compression artifacts are much less bothersome than those of JPEG.
For applications for which bandwidth, storage reduction, or both of those factors are the main concern, i.e., they are more important than image fidelity, AVIF and HEIC might come in handy. On the other hand, if bandwidth is the major issue, then you might also desire progressive or responsive decoding, which AVIF and HEIC do not support.
Even though you can create animation in JPEG XL, it offers no advanced video-codec features, such as motion estimation. JPEG XL compresses better than GIF, APNG, and animated WebP but cannot compete with actual video codecs for production of “natural” video. Even for a three-second looping video or cinemagraph, where most of the image is static, actual video codecs like AV1 and HEVC can compress much better than still-image codecs.
HEIC already works well far and wide in the Apple ecosystem. No matter that HEIC doesn’t yet function in the Safari browser as a web image format, it does already support HEVC as a video codec.
AV1 shines as a video codec in the Google Chrome and Firefox ecosystems, and AVIF could follow suit. With the influential Alliance for Open Media as its sponsor, AV1 counts among its many proponents giant enterprises. Furthermore, hardware devices for AV1 are already available.
HEIC and AVIF are now on tap. JPEG XL is still in the final stages of standardization, however, and does not yet work in browsers.
The JPEG Committee is a working group of the International Standards Organization (ISO) and the International Electrotechnical Commission (IEC). The standardization process takes time, involving multiple stages of balloting, in which the draft specification is scrutinized by various national-standards bodies, including the American National Standards Institute (ANSI) in the U.S., Deutsches Institut für Normung e.V. (DIN) in Germany, the Japanese Industrial Standards Committee (JISC) in Japan, etc.
The main stages of the standardization process are New Project (NP), Working Draft (WD), Committee Draft (CD), Draft International Standard (DIS), Final DIS (FDIS), and International Standard (IS).
The JPEG XL standard will consist of four parts:
- Part 1 (the main part), which describes the codestream (the actual image codec), is currently in the DIS stage.
- Part 2, which describes the file format (the container that wraps the codestream and additional metadata or extensions), has just proceeded to the CD stage.
- Part 3, which describes the procedure for testing conformance of JPEG XL decoders, is in the WD stage.
- Part 4, which is the reference implementation, is also in the WD stage.
If everything goes as planned, an International Standard for part 1 will be available at the beginning of 2021; for the other parts, at the end of 2021.
In practice, once the process reaches the FDIS stage, the spec is “frozen” and you can use JPEG XL for real. Nonetheless, it’ll still take time and effort to garner software support for JPEG XL in applications and on platforms. Dethroning the old JPEG will not be a trivial task, as evidenced by several failed attempts in the past. We are hopeful that we’ll succeed this time around and that we’ve created a worthy successor to a 30-year-old image format that’s as old as the World Wide Web, that’s older than Google, that’s twice as old as Facebook and Twitter, and that’s three times as old as WhatsApp, Instagram, and Cloudinary.
Here’s hoping that, once it becomes a standard, JPEG XL will last for 30 years, too!
For a deep dive into the topics on the current visual web, see the Cloudinary 2020 State of the Visual Media Report, or download the full report below.