CID22 - Cloudinary Image Dataset '22

The Cloudinary Image Dataset ’22 (CID22) is a large image quality assessment (IQA) dataset created in 2022, consisting of 22k annotated images based on 250 pristine images, compressed using (Moz)JPEG, WebP, AVIF, JPEG XL, JPEG 2000, and HEIC.

Quality range

Compared to other IQA databases like KADID-10k or TID2013, CID22 is relatively focused: the only distortions are image compression artifacts, and the quality range runs from medium quality to (near) visually lossless, e.g. mozjpeg q30 to q95. Previous datasets tended to focus on much lower qualities:

Histogram of SSIMULACRA 2 scores across various IQA datasets

This is the range relevant for web delivery of images with various trade-offs between fidelity and bandwidth. It is also the quality range the new JPEG AIC-3 standard will focus on. CID22 is part of Cloudinary's response to the AIC-3 Call for Contributions on Subjective IQA.

License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Downloads

The full CID22 dataset consists of two parts:

a training set based on 201 pristine images, and
a validation set based on 49 pristine images.

The annotations for the validation set (mean bias-corrected opinion scores) are freely available to the research community. The full set of images is also available.

CID22 dataset (7.2 GB): 250 reference images, 21903 distorted images
CID22 validation set (1.4 GB): 49 reference images, 4292 distorted images
CID22 validation set without distorted images (17 MB): 49 reference images
CID22 validation set MCOS scores (103 KB): just the CSV file (also included in any of the above)

Paper

The CID22 dataset is presented in this paper, including a detailed description and discussion of the test methodology that was used. An extended version of this paper was submitted as a contribution to the JPEG AIC-3 Call for Contributions on Subjective Image Quality Assessment.

If you use the CID22 dataset in your research, you can cite it as follows:


@article{CID22,
  title={{CID22}: Large-Scale Subjective Quality Assessment for High Fidelity Image Compression},
  author={Sneyers, Jon and Ben Baruch, Elad and Vaxman, Yaron},
  journal={IEEE MultiMedia},
  pubstate={Submitted},
  year={2023},
  doi={10.36227/techrxiv.22659061}
}

Codec comparison

The following plot shows bitrate/distortion curves aggregated over the entire CID22 dataset:

Per-image plots are available for every image in the validation set; there are also aggregated plots available per image category, based on the full CID22 dataset: codec performance plots.
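As a rough illustration of what one point on a bitrate/distortion curve represents, the sketch below converts a compressed file size into bits per pixel and pairs it with a quality score. The file sizes, scores, and the 512x512 image dimensions are hypothetical values chosen for the example, and the sketch does not reproduce how the aggregated plots were actually computed.

```python
def bits_per_pixel(file_size_bytes: int, width: int, height: int) -> float:
    """Bitrate of a compressed image, normalized by its pixel count."""
    return 8 * file_size_bytes / (width * height)


# Hypothetical (file size in bytes, quality score) pairs for one codec
# applied to one 512x512 image at three encoder settings.
encodes = [(36_000, 80.3), (12_000, 62.0), (20_000, 71.5)]

# Sort by bitrate so the points can be drawn as a curve from left to right.
curve = sorted((bits_per_pixel(size, 512, 512), score) for size, score in encodes)

for bpp, score in curve:
    print(f"{bpp:.3f} bpp -> score {score}")
```

Normalizing by pixel count makes images of different dimensions comparable on the same horizontal axis.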

Objective metrics

Using the CID22 data to evaluate objective metrics, we get the following Kendall and Spearman rank-order correlation coefficients (KRCC and SRCC) and Pearson correlation coefficients (PCC). The sign only indicates the type of metric: a negative coefficient means the metric is of the “smaller is better” type (the number indicates the amount of difference), and a positive coefficient means it is of the “bigger is better” type (the number indicates quality). Higher absolute values are better.

Metric	KRCC	SRCC	PCC
(SSIMULACRA 2)	0.6934	0.8820	0.8601
Butteraugli 2-norm	-0.6575	-0.8455	-0.8089
Butteraugli 3-norm	-0.6547	-0.8387	-0.7903
DSSIM 3.2	-0.6428	-0.8399	-0.7813
VMAF	0.6176	0.8163	0.7799
FSIM	0.6089	0.8005	0.7676
PSNR-HVS	0.6076	0.8100	0.7559
Butteraugli max-norm	-0.5843	-0.7738	-0.7074
SSIM	0.5628	0.7577	0.7005
MS-SSIM	0.5596	0.7551	0.7035
LPIPS	-0.5417	-0.7316	-0.6932
SSIMULACRA 1	-0.5255	-0.7175	-0.6940
PSNR-Y	0.4452	0.6246	0.5901
PSNR (ImageMagick `compare -metric psnr`)	0.3472	0.5002	0.4817
CIEDE2000	0.3154	0.4584	0.4096

Butteraugli, SSIMULACRA 1 and 2 are also part of libjxl. For SSIM, MS-SSIM, PSNR-Y, PSNR-HVS and CIEDE2000, the libvmaf implementation was used.
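Correlation coefficients like those in the table above can be computed along the following lines. This is a minimal pure-Python sketch, not the evaluation code used for the table: the opinion scores and metric values are made-up toy data, and Kendall's coefficient is computed in its tau-b form with a simple quadratic-time loop. A distance-type metric is used so that the negative sign shows up.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient (PCC)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank-order correlation (SRCC): Pearson on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """Kendall rank-order correlation (KRCC), tau-b variant, O(n^2)."""
    concordant = discordant = tied_x = tied_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both variables: not counted
        elif dx == 0:
            tied_x += 1
        elif dy == 0:
            tied_y += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    n0 = concordant + discordant
    return (concordant - discordant) / math.sqrt((n0 + tied_x) * (n0 + tied_y))


# Toy data (hypothetical): subjective scores, higher is better, and a
# "smaller is better" distance metric evaluated on the same five images.
mcos = [30.0, 45.0, 55.0, 70.0, 85.0]
distance = [5.0, 3.9, 4.1, 2.0, 0.8]

print(f"KRCC {kendall_tau_b(mcos, distance):.4f}")
print(f"SRCC {spearman(mcos, distance):.4f}")
print(f"PCC  {pearson(mcos, distance):.4f}")
```

All three coefficients come out negative here because the toy metric measures distance, so it decreases as the subjective score increases.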
