The Cloudinary Image Dataset ’22 (CID22) is a large image quality assessment (IQA) dataset created in 2022, consisting of 22k annotated images based on 250 pristine images, compressed using (Moz)JPEG, WebP, AVIF, JPEG XL, JPEG 2000, and HEIC.

Quality range

Compared to other IQA datasets such as KADID-10k or TID2013, CID22 is relatively focused: the only distortions are image compression artifacts, and the quality ranges from medium to (near) visually lossless, e.g. mozjpeg q30 to q95. Previous datasets tended to focus on much lower qualities:

Histogram of SSIMULACRA 2 scores across various IQA datasets

This is the range relevant for web delivery of images with various trade-offs between fidelity and bandwidth. It is also the quality range the new JPEG AIC-3 standard will focus on. CID22 is part of Cloudinary's response to the AIC-3 Call for Contributions on Subjective IQA.

License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Downloads

The full CID22 dataset consists of two parts:

The annotations for the validation set (mean bias-corrected opinion scores) are freely available to the research community. The full set of images is also available.

Paper

The CID22 dataset is presented in this paper, including a detailed description and discussion of the test methodology that was used. An extended version of this paper was submitted as a contribution to the JPEG AIC-3 Call for Contributions on Subjective Image Quality Assessment.

If you use the CID22 dataset in your research, you can cite it as follows:

@article{CID22,
  title={{CID22}: Large-Scale Subjective Quality Assessment for High Fidelity Image Compression},
  author={Sneyers, Jon and Ben Baruch, Elad and Vaxman, Yaron},
  journal={IEEE MultiMedia},
  pubstate={Submitted},
  year={2023},
  doi={10.36227/techrxiv.22659061}
}

Codec comparison

The following plot shows bitrate/distortion curves aggregated over the entire CID22 dataset:

Per-image plots are available for every image in the validation set. Aggregated plots per image category, based on the full CID22 dataset, are also available (see the codec performance plots).

Objective metrics

Evaluating objective metrics against the CID22 data gives the following Kendall and Spearman rank-order correlation coefficients (KRCC and SRCC) and Pearson correlation coefficients (PCC). The sign only indicates the direction of the metric: it is negative for metrics of the “smaller is better” type (the number indicates amount of difference) and positive for metrics of the “bigger is better” type (the number indicates quality). Higher absolute values are better.

Metric                                       KRCC      SRCC      PCC
(SSIMULACRA 2)                               0.6934    0.882     0.8601
Butteraugli 2-norm                          -0.6575   -0.8455   -0.8089
Butteraugli 3-norm                          -0.6547   -0.8387   -0.7903
DSSIM 3.2                                   -0.6428   -0.8399   -0.7813
VMAF                                         0.6176    0.8163    0.7799
FSIM                                         0.6089    0.8005    0.7676
PSNR-HVS                                     0.6076    0.8100    0.7559
Butteraugli max-norm                        -0.5843   -0.7738   -0.7074
SSIM                                         0.5628    0.7577    0.7005
MS-SSIM                                      0.5596    0.7551    0.7035
LPIPS                                       -0.5417   -0.7316   -0.6932
SSIMULACRA 1                                -0.5255   -0.7175   -0.6940
PSNR-Y                                       0.4452    0.6246    0.5901
PSNR (ImageMagick compare -metric psnr)      0.3472    0.5002    0.4817
CIEDE2000                                    0.3154    0.4584    0.4096

Butteraugli, SSIMULACRA 1 and 2 are also part of libjxl. For SSIM, MS-SSIM, PSNR-Y, PSNR-HVS and CIEDE2000, the libvmaf implementation was used.
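
To make the three coefficients and the sign convention concrete, here is a minimal pure-Python sketch of how KRCC, SRCC and PCC can be computed between subjective scores and metric scores. The score lists are made-up illustration values, not CID22 data, the function names are hypothetical, and this is not the evaluation script behind the numbers reported above.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient (PCC)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank-order correlation (SRCC): Pearson on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def kendall(x, y):
    """Kendall rank-order correlation (KRCC), tau-b variant, O(n^2)."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both lists: counts toward neither term
            elif dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied = concordant + discordant
    return (concordant - discordant) / math.sqrt(
        (untied + tied_x_only) * (untied + tied_y_only))

# Hypothetical values for five distorted images (NOT taken from CID22).
mos = [30, 45, 55, 70, 85]                 # subjective scores, higher is better
quality_metric = [40, 42, 60, 75, 90]      # "bigger is better" metric
distance_metric = [5.0, 3.5, 3.8, 1.2, 0.4]  # "smaller is better" metric

for name, scores in [("quality", quality_metric), ("distance", distance_metric)]:
    print(name, kendall(mos, scores), spearman(mos, scores), pearson(mos, scores))
```

The distance-type metric yields negative coefficients and the quality-type metric positive ones, which is why only the absolute values are compared across metrics. For real evaluations, SciPy's `scipy.stats.kendalltau`, `scipy.stats.spearmanr` and `scipy.stats.pearsonr` compute the same quantities more efficiently.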