2023 State of Visual Media Report

Welcome to our fifth annual State of Visual Media report, an in-depth look into how the world's most engaging brands are harnessing the power of visual content to connect with their customers.

Since our inaugural report, we've been through a global health crisis, unprecedented weather events, and seismic shifts in demographics and social attitudes — all of which significantly impact consumer and market behavior. The way brands in our study manage and deliver image and video content offers a snapshot into how they're succeeding despite these myriad challenges.

Whether it's a leading retail brand managing millions of image and video assets with ease, a real estate platform's savvy use of next-gen technology to deliver immersive 3D virtual tours, or the travel industry's ability to pivot from a period of bust to boom — one consistent thread remains.

Images and video are no longer simply playing a leading role in the story. They are the story.

Section 01

Beyond the Hype: Unlocking the Power of Generative AI

It's been a year since the launch of ChatGPT, the AI-powered language model developed by OpenAI. Leading to one of the most seismic shifts tech has seen in recent memory, ChatGPT is the viral hit that surpassed one million users, from developers to high school students, in just five days. Cloudinary has been employing the power of AI, machine learning, and deep learning for more than a decade, extending and expanding what's possible in image and video creation and delivery. Our decade-plus experience in AI has taught us that hype in tech is one thing; practical application is another.

When integrating new technologies into a daily routine, people tend to start with the tools and apps that are easy to use and make them more efficient (read: make their work easier). In the realm of images and videos, advanced AI technologies have been applied first to streamline or eliminate tedious, repetitive, but essential tasks like automatically resizing images for different viewing windows. According to our Web Developer Survey, 54% of developers said that AI allows them to automate previously manual tasks. This frees brands to focus their limited resources on the more creative, personal, and ultimately 'human' tasks that take the brand experience from 'good to great.' Where skilled labor is at a premium, this is particularly important. AI automation helps brands manage and deliver images and videos efficiently while boosting morale by freeing people's time to focus on more rewarding, meaningful work. It's no surprise that developers surveyed also cited improved efficiency and productivity (68%) as a top benefit. In fact, our data show that an average of 34 days can be saved with automation today. With generative AI, these numbers will likely go up even more.

Report using AI every day
Say AI's main benefit is enabling productivity and efficiency
Believe AI has the potential to improve the developer experience

AI can also help brands save resources and enable work that was not previously possible. Historically, the advanced creative work needed to build and maintain dazzling online experiences has, for many smaller brands, been out of reach. New generative AI capabilities are making what was once impossible, possible and more accessible for everyone. Generative AI is empowering users to create, edit, and deliver dynamic visual experiences at scale. For example, instead of reshooting an entire campaign, developers and digital marketers can remove unwanted objects and change backgrounds to create beautiful new versions of existing images. Similarly, AI-powered image captioning instantly creates intelligent summaries of images to improve accessibility, asset searchability, and SEO ranking, while increasing productivity and reducing production time. With the latest developments in ChatGPT, it is even possible for users to perform image transformations and optimizations via simple commands like, "please blur this image and crop it to a square."

Having offered solutions with advanced AI for more than a decade, Cloudinary has not only observed, but actively contributed to the development of the practical applications of leading-edge AI capabilities. It is exciting to see how the mainstream adoption of ChatGPT is encouraging people to experiment more with image and video creation and delivery.

When we first launched our ChatGPT feature, we saw tremendous interest from both new and existing customers. The same is true for other new generative AI capabilities, and the willingness to try them is real. According to our Web Developer Survey, 64% of developers use generative AI tools to streamline the development process, and 99% believe these tools have the potential to improve the developer experience overall.

While some people fear that AI tools might replace developer jobs, I actually think the opposite. AI will expand our vision of what we are capable of building and therefore continue to drive innovation.

— James Quick, Developer and Content Creator

Getting started with new technology can be a little overwhelming. Here are three tips to help you get started:

Improve SEO and accessibility

AI-powered image captioning enables the automatic generation of descriptions for images. This can be used to improve SEO rankings and web accessibility by enabling visually impaired users to understand the content of images through screen readers.

Edit images with ease

Removing unwanted image elements and adding new ones can be a tedious, time-consuming manual task. Generative AI does it faster, increasing your team's productivity and ensuring that images stay true to brand guidelines and look great.

Create videos faster

Every campaign benefits from video. But with trends moving so fast, it's not always possible to create compelling videos on time. Advanced AI can be the campaign saver, helping brands automatically generate videos by reusing existing assets, colors, and text.
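
As a rough illustration of the first tip, the sketch below turns a machine-generated caption into the `alt` text of an image tag. It assumes the caption has already been produced by some captioning service; the function name `img_tag` and the 125-character limit are illustrative assumptions, not part of any Cloudinary API.

```python
from html import escape


def img_tag(src: str, caption: str, max_len: int = 125) -> str:
    """Build an <img> tag whose alt text comes from a generated caption.

    `caption` is assumed to come from an AI captioning step that is not
    shown here. `max_len` is an assumed guideline for keeping alt text
    short for screen-reader users, not a hard standard.
    """
    # Collapse newlines and repeated spaces into single spaces.
    alt = " ".join(caption.split())
    if len(alt) > max_len:
        # Trim at a word boundary and mark the truncation.
        alt = alt[:max_len].rsplit(" ", 1)[0] + "..."
    # Escape quotes and angle brackets so the caption cannot break the markup.
    return f'<img src="{escape(src, quote=True)}" alt="{escape(alt, quote=True)}">'


print(img_tag("/images/tent.jpg", 'A red tent beside a lake at "golden hour"'))
# prints: <img src="/images/tent.jpg" alt="A red tent beside a lake at &quot;golden hour&quot;">
```

Escaping the caption matters because generated text can contain quotes or angle brackets that would otherwise corrupt the page markup.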

Case Study

Neiman Marcus Reduces Photoshoot-to-Web by 50%

Because images are automatically optimized using AI, page load times have been faster. It was certainly 3X faster, compared to a few years ago.

— Sri Kalavacharla, Senior Director, Omni Personalization and Engagement Engineering

Section 02

Industry Spotlight: Travel Makes a Comeback

Few industries know a comeback story better than travel and hospitality. Until March 2020, the median percentage of video bandwidth in the travel and hospitality vertical was quite high, peaking at 23%. In 2020 and into the spring of 2021, at the height of the pandemic, video bandwidth dropped to between 1% and 5%. Now, just two years later, travel is back and the industry has made a remarkable recovery.

However, the industry remains highly volatile: extreme weather events, security issues, and other global events are having an impact on bookings and sales. As analyzed by EHL Insights: "The main challenges for the hospitality industry are the lack of predictability and the magnitude of such events, and how fast the industry can react and adapt."

Peak median percentage of video bandwidth in Travel & Hospitality industry
The choice of video formats is critical in delivering immersive experiences
Increase in use of video in the industry, particularly H.265 codec

It's not just external factors putting pressure on the industry. New online platforms and players like Airbnb have entered the scene, creating additional pressures when it comes to capturing eyes and mindshare. According to EHL, "One area where traditional hoteliers are lagging behind their more advanced counterparts is online visibility and brand recognition."

The good news is that it doesn't need to take a massive team of developers and creatives to offer captivating digital experiences. Delivering an irresistible, immersive experience is achievable with tools that help developers and creatives work more efficiently. As Gen Z emerges as a new customer segment, the pressure to deliver highly immersive web experiences will only increase, and the ability to deliver them should not be limited to the brands with the biggest teams and budgets.

Delivering immersive experiences and offering high-quality videos without sacrificing performance is possible with newer lightweight video codecs like H.265. Ahead of many other industries, travel has embraced the H.265 codec, and it ranks fifth in adoption of VP9, a codec critical for compression.
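
One way to picture codec choice is a simple preference list: serve the most efficient codec the viewer's device supports, and fall back to an older one otherwise. The sketch below is a minimal illustration under stated assumptions. The preference order, the H.264 fallback, and the function name `pick_codec` are hypothetical and are not taken from the report's data or from any specific delivery platform.

```python
# Assumed preference order: newer, more efficient codecs first.
# H.264 is included only as an assumed widely supported fallback.
CODEC_PREFERENCE = ["h265", "vp9", "h264"]


def pick_codec(supported, preference=CODEC_PREFERENCE, fallback="h264"):
    """Return the first preferred codec that the client reports supporting."""
    supported_lower = {codec.lower() for codec in supported}
    for codec in preference:
        if codec in supported_lower:
            return codec
    # Nothing matched: fall back to the assumed baseline codec.
    return fallback


print(pick_codec({"H264", "VP9"}))  # prints: vp9
print(pick_codec({"h265", "h264"}))  # prints: h265
```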

Technology has made it easier for travel and hospitality brands to deliver immersive digital experiences that offer a world of creative possibilities to engage customers across all demographics. Some potential next-generation visual experiences for the industry include:

Personalized Video Content

Advances in AI and machine learning have made it easier for travel brands to create personalized video content that adapts to individual travelers' preferences and interests, helping travelers discover experiences that are tailored to their tastes (e.g., beach package vs. ski package).

Interactive Virtual Tours

Virtual tours have become more interactive, allowing travelers to explore destinations in greater detail and interact with digital elements, such as menus or hotel rooms, for a more engaging and personalized experience than traditional video tours. The industry can learn from Real Estate's great success in adopting VP9 to support interactive virtual tours.

Augmented Reality (AR)

AR technology allows travelers to overlay digital information onto real-world environments, providing a more contextual and immersive experience. For example, AR maps can display information about landmarks, restaurants, and local events as travelers navigate a destination.

Case Study

Xanterra Travel Collection Finds Easy Path to Visual Experience Management

A travel experience company offering curated vacation packages for US national parks, cruises, and exclusive properties, Xanterra relies heavily on visual content to engage consumers.

The company prides itself on growing its sustainable travel practices and expanding its business through acquisitions. As more properties were added to its portfolio and more digital content was ingested, Xanterra needed a better way to quickly integrate visual assets inherited through its acquisitions. It also needed a more efficient way to store and manage images and videos for its many websites.

Assets ingested into Cloudinary

We needed a solution that would be flexible enough for our various properties now and into the future. To us, this meant the solution had to be lightweight and easy to use so no matter how much more content we must manage, we'll be ready to serve our stakeholders.

— Andrew Heltzel, Corporate Director of Marketing and CRM

Section 03

Modern Commerce and the Path to Better Customer Engagement

It's official. Most of our shopping (57%), from research to purchase, is now happening online. This was revealed in an independent survey Cloudinary commissioned earlier this year of 2,693 consumers across Australia, Germany, the UK, and the US. The survey also found that while every age group relies on reviews to make purchasing decisions, the younger the consumer, the more likely they are to shop online and rely on visual and social media testimonials to seal the deal. Zooming in closer, the survey shows that Gen Z expects interactive experiences such as user-generated videos that show products in action (52%) or 3D images that provide a realistic view of the product (16%).

Delivering user-generated content (UGC), from reviews to interactive images and videos, for each campaign quickly, reliably, and at scale gives brands a significant competitive advantage, yet many still struggle to make it all work. Often these challenges arise because a brand's legacy e-commerce stack wasn't designed to meet today's demands, or because the brand runs into performance issues. The latter can have a big and immediate negative impact. According to an e-commerce study by PwC, 32% of respondents said they would stop doing business with a brand they loved after just one bad experience.

The need to deliver the fast and flawless experiences today's consumers expect is one reason more e-commerce brands are moving to more flexible, composable infrastructures. These modern architectures make it easy for brands to add and replace technologies on the fly, without huge investments and complex decision-making processes.

Increasingly, the model many Cloudinary customers are adopting, in full or in part, is a composable one, also known as MACH (microservices-based, API-first, cloud-native SaaS, and headless). In 2021, Cloudinary joined the MACH Alliance, a group of independent technology companies that support a composable architecture where every component is pluggable, scalable, replaceable, and can be continuously improved to meet evolving business needs.
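
The "pluggable and replaceable" idea can be sketched in a few lines of code: the storefront depends only on a small interface, so one media provider can be swapped for another without touching the storefront itself. Everything in this sketch is hypothetical, including the class names, the `image_url` method, and the example.com URLs. It illustrates the principle and does not show any vendor's actual API.

```python
from typing import Protocol


class MediaService(Protocol):
    """The only contract the storefront relies on (hypothetical)."""

    def image_url(self, asset_id: str, width: int) -> str: ...


class VendorA:
    def image_url(self, asset_id: str, width: int) -> str:
        return f"https://media-a.example.com/w_{width}/{asset_id}"


class VendorB:
    def image_url(self, asset_id: str, width: int) -> str:
        return f"https://media-b.example.com/{asset_id}?width={width}"


class Storefront:
    def __init__(self, media: MediaService) -> None:
        # The provider is injected, so it can be replaced independently.
        self.media = media

    def product_card(self, sku: str) -> dict:
        return {"sku": sku, "image": self.media.image_url(sku, 400)}


# Swapping providers changes one constructor argument, nothing else.
print(Storefront(VendorA()).product_card("shirt-01"))
print(Storefront(VendorB()).product_card("shirt-01"))
```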

A lack of images showing the product in use puts off Gen Z shoppers (41%) more than any other age group
Gen Z (73%) primarily shop online using their smartphones, and they are the most likely group to use a brand's app to shop (26%)
85% of organizations increased the amount of MACH in their infrastructure in the past 12 months

According to MACH Alliance research, companies that have adopted a MACH architecture cite an increased ability to respond more quickly to market changes, develop and deploy new functionality faster, and reduce costs. They're also more likely to say their infrastructure is keeping up with customer demands and that they're ahead of the competition than those with lower MACH adoption rates.

“By adopting headless technology, the Paul Smith creative team has a new lynchpin holding together everything from photography to marketing. The team has also witnessed the transformation of a laborious task, moving clothing images around, to a seamless image handling process that uses naming match to automatically assign each image an SKU.”
‐ Hannah Bennett, Head of Digital at Paul Smith

With the oldest members of Gen Z now in their twenties and their spending power and influence skyrocketing, e-commerce brands would be wise to future-proof their IT architecture to meet growing customer expectations for highly visual, engaging digital experiences.

The continued movement towards composable commerce offers the most value for brands. 'Unique customer experiences' means more visual content, more tie-ins with social media platforms like TikTok, and more 'pushing the envelope' to gain competitive advantages.

— Christopher C. Holley, Global Director, ISV Partnerships, commercetools Inc.

Auto-generate realistic images of products by enabling users to engage with a 3D image or spin set which simulates a 360-degree viewing experience.

Case Study

River Island: Modern Image Workflows Improve Shopper Experience and Reduce Costs

Quite often you talk to a potential partner about MACH but find they're not truly living MACH. From our first conversations with [Cloudinary], we really felt like from the ground up, the architecture and Cloudinary as a company was living those MACH principles.

— David Edwards, Head of Technology

Section 04

Going Green: How to Deliver Images and Video Sustainably

Sustainability is no longer a 'nice to have' but a top priority for most organizations. More than nine out of ten engineering decision makers (91%) consider sustainability to be moderately or very important when defining their tech stack and future infrastructure, according to MACH Alliance research.

Sustainability, however, is a complex puzzle. A corner piece is cutting greenhouse gas emissions, a goal that nearly every country and many of the world's leading brands have committed to. One place brands can start assessing their carbon footprint is their websites. To make measuring a website's carbon footprint easier, Cloudinary's Director of DevX Engineering, Colby Fayock, built the Image Carbon site. The app enables anyone to calculate the carbon emissions of one or more web pages.

When a website's carbon footprint is too high, the likely culprits are bandwidth-hungry images and videos. The good news is that this problem is easy to fix. Cloudinary's AI-powered tools automatically determine the optimal file size and quality of an image or video, then convert it to a newer, lighter format or codec that best matches the user's device. As an example, a top international sports apparel brand was able to reduce bandwidth consumption by 40%, from 6.8 TB/day to 4.05 TB/day. On an annualized basis, the company saved 618 TB of bandwidth, which is equivalent to saving 1,890 tons of CO2. And while those savings are higher than the average, if we look at just one month of our data, each customer saved an average of 5.8 tons of CO2, equivalent to roughly 5 long-haul flights per passenger or approximately 35-60 flight hours.
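
The figures in this example can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted above. The per-terabyte and per-flight ratios are simply what those numbers imply when divided, not independently sourced emission factors.

```python
# Figures quoted in the example above.
before_tb_per_day = 6.8
after_tb_per_day = 4.05
annual_tb_saved = 618
annual_co2_tons_saved = 1890
monthly_co2_tons_per_customer = 5.8
equivalent_long_haul_flights = 5

# Relative bandwidth reduction per day.
reduction_pct = (before_tb_per_day - after_tb_per_day) / before_tb_per_day * 100
print(f"Bandwidth reduction: {reduction_pct:.1f}%")  # prints: Bandwidth reduction: 40.4%

# CO2 per terabyte implied by the annualized figures.
tons_per_tb = annual_co2_tons_saved / annual_tb_saved
print(f"Implied CO2 per TB: {tons_per_tb:.2f} tons")  # prints: Implied CO2 per TB: 3.06 tons

# CO2 per long-haul flight implied by the per-customer comparison.
tons_per_flight = monthly_co2_tons_per_customer / equivalent_long_haul_flights
print(f"Implied CO2 per flight: {tons_per_flight:.2f} tons")  # prints: Implied CO2 per flight: 1.16 tons
```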

Optimize your visual content and reduce your carbon footprint. The customers in our data sample saved an average 5.8 tons of CO2, roughly the same as 5 long-haul flights per passenger.

That's one piece of the puzzle solved. Here's another piece: the right images can help reduce carbon footprints by minimizing product returns. According to the NRF, consumers were expected to return $816 billion worth of retail goods in 2022. Not only are returns a big drain on retailers' bottom lines, they're devastating to the climate. In Germany alone, an estimated 795,000 tons of CO2 could be attributed to returns in 2021, according to research from the University of Bamberg.

Cloudinary's international survey found that 30% of people returned goods because the products did not look as expected on the website. Fortunately, the survey also highlights one approach to addressing this problem. Effective use of user-generated image and video content, along with innovations such as 3D, 360, and augmented reality applications, can help lessen returns.

To avoid swapping one problem for another, brands should keep an eye on image and video bandwidth to ensure that they don't end up increasing CO2 emissions while taking other steps to reduce them.

Where to begin

Cloudinary's Image Carbon site can be a good starting point to check that your visuals are as carbon-friendly as possible.

How it works

If you want to understand how the Image Carbon calculator works, check out the blog post "Are your website images impacting the environment? Image Carbon can let you know."

Take a closer look

To learn more about the Image Carbon project, you can check out the source code on GitHub: https://github.com/colbyfayock/imagecarbon

Section 05

Modern formats for a modern world

While the use of image codecs has shifted more gradually than generative AI has emerged, the changes are worth noting all the same. From JPEG to WebP and from AVIF to JPEG XL, image formats and their weight are critically important to a brand's visual storytelling. Cloudinary data reveals that while WebP remains the number one image format, with adoption rates around 60% for the last two years, its share has been steadily declining since May 2023, dropping to 49.5% in August 2023.

What might be causing this decline? AVIF. Starting from an adoption rate of less than one percent (0.4% in October 2021), the format has grown steadily over the last two years, reaching 13.3% adoption in August 2023, and now ranks as the third most popular image format.

But it's not just that the use of WebP has declined. Usage of the mother of all image standards, JPEG, has also dropped during the last two years from 27.4% in August 2021 to 19.9% in August 2023. However, it remains the second most used image format.

Adoption of the lightweight image format HEIC and the 'dinosaur' PNG fluctuated between 6% and 8% over the last two years. Both reached 8.7% in August 2023, placing them in a tie for fourth place.

For video, the well-established MPEG-4 is still the most supported codec. However, the second and third places have changed. Transport Stream is now in second place, followed by WebM.

Chart: Trends in format adoption (HEIC and PNG steady between 6% and 8%)

This year's data reinforces the need for brands to optimize for mobile, which has emerged as the preferred way to browse the web in the majority of countries. In addition to countries, we took a closer look at mobile vs. desktop across industries. The top three mobile-first industries were retail and consumer packaged goods (75.3%), food and beverage (73.1%), and financial services (63.5%). Only three industries had mobile traffic below 50%, with manufacturing and automotive at the bottom of the list at 38.4%. When the most desktop-friendly industry has more than a third of its traffic coming from mobile, the bottom line is that brands across all industries have no choice but to optimize for mobile.

Want to learn more about JPEG XL?

Developers and other enthusiasts can contact JPEG XL co-creator and Cloudinary senior image researcher Dr. Jon Sneyers (jon@cloudinary.com), read his most recent blogs, or participate in the conversation by joining the JPEG XL Discord.

High-quality images drive online engagement. Low-quality, pixelated images don't. Unfortunately, high-quality images can slow load times and lower your brand's Google Core Web Vitals score. Compression is key. But how much can you compress an image before its quality is diminished?

Because image compression is part of what we do at Cloudinary, we understand when to use which codec. To prove this, we created the Cloudinary Image Dataset (CID22), a large dataset of human-annotated compressed images. Here is what we found:

WebP provides a significant gain over unoptimized JPEG: a 25 to 35% gain at the low end of the quality spectrum, although the gain diminishes at the high end.

AVIF offers significant additional savings over WebP of about 10 to 15%. However, these savings come at the expense of speed: AVIF encoding at the default speed is an order of magnitude slower than WebP.

JPEG XL offers an additional savings of about 5 to 10% over AVIF. The additional savings are greater in the higher quality range than in the lower quality range.
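
To get a rough sense of how these three findings stack up, the sketch below multiplies the quoted savings ranges together. It rests on a simplifying assumption: that each saving applies on top of the previous one at the same quality level. The findings above note that the gains vary across the quality range, so treat the result as a back-of-the-envelope bound, not a CID22 measurement.

```python
# Savings ranges quoted above, as fractions of file size: (label, low, high).
STEPS = [
    ("WebP vs. unoptimized JPEG", 0.25, 0.35),
    ("AVIF vs. WebP", 0.10, 0.15),
    ("JPEG XL vs. AVIF", 0.05, 0.10),
]


def cumulative_sizes(steps):
    """Return the (smallest, largest) remaining size after each step.

    Sizes are fractions of the unoptimized JPEG size, assuming the
    savings compound multiplicatively (a simplification).
    """
    smallest = largest = 1.0
    results = {}
    for label, low, high in steps:
        smallest *= 1 - high  # best case: the high end of the savings range
        largest *= 1 - low    # worst case: the low end of the savings range
        results[label] = (smallest, largest)
    return results


for label, (smallest, largest) in cumulative_sizes(STEPS).items():
    print(f"{label}: {smallest:.3f} to {largest:.3f} of the unoptimized JPEG size")
```

Under these assumptions, a JPEG XL file would land at roughly half to two-thirds of the unoptimized JPEG size.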

More Information

More details about JPEG XL can be found here. The CID22 data set is available here.

Case Study

World Class Football Club Bayern Munich Wins the Image Management Game with Cloudinary

[Getting] the very latest codecs and image optimization techniques is so cost-efficient and a great ongoing benefit for the club, as we need to send images and deliver them in a very optimized way to our users on our websites and apps.

— Michael Oellerer, FC Bayern Munich
