Images are a critical part of any website or app. And the more engaging the visual experience is to your visitors, the better.
Images are also one of the largest contributors to the weight of a page, potentially slowing down the experience for your customers or, worse, contributing to the carbon emissions coming from the page.
The good news is tools like Cloudinary help with easy-to-setup optimization for images and videos, but it’s important to be aware of the issues in the first place.
Which is why we built imagecarbon.com.
After you test your site, head over to Product Hunt and leave us a review!
Inspired by tools like those from The Green Web Foundation, Website Carbon Calculator, and WebPageTest, we wanted to create a tool focused on determining how images specifically are impacting your website.
We wanted to focus on the performance impact, but also to narrow in on how those images affect the environment, including how much carbon they produce every time the page is requested.
We see images as one of the biggest potential areas of improvement for the content that’s being served on a website or app.
Videos can certainly get up there in size, but images are more common and something just about everybody uses. Not to mention that people embedding videos often do so with third-party tools like YouTube.
Looking at the data, 99.88% of images are not being sent in the optimal format.
Imagine a scenario where overnight, everyone in the world starts optimizing their images. We’re looking at a tremendous amount of carbon being saved!
There are five main components that make Image Carbon work:
- Scraping all of the images from a website
- Uploading and creating optimized versions for comparison
- Collecting the meta details about all of the images including the estimated carbon emissions
- Storing this all in cache to avoid expensive requests
- Displaying this all in a friendly UI
Let’s walk through each of these points.
To make any of this work, we need to be able to find all of the images on a page. There are a variety of options for how we can tackle this.
My first attempt was to use Playwright, which is a fantastic automation tool for testing but can be repurposed for tasks like visiting a website headlessly and collecting information.
The issues we ran into were mostly around time and the reliability of the results in a JavaScript-rendered world. We didn’t want to deal with the complexity of managing a server, and opting for serverless means a lot of services have a timeout of around 10 seconds. Even within that limit, it’s a long time to make someone wait on top of the other work going on.
On top of that, it can be challenging to try to determine when a page has fully been rendered, JavaScript included. There are a lot of variables at play, especially considering a diverse set of websites to inspect. We wanted something that would work well for most use cases.
ScrapingBee can extract images out-of-the-box and, with a simple checkbox, make sure JavaScript is rendered first.
```js
extract_rules: JSON.stringify({
  images: {
    selector: 'img',
    type: 'list',
    output: {
      src: 'img@src',
      loading: 'img@loading',
    }
  }
}),
wait_browser: 'domcontentloaded'
```
Setting this up was easy in their Request Builder without any prerequisite knowledge.
This made the image collection easy with a simple API request to ScrapingBee, who would ensure we were getting the information we needed.
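To make that concrete, here’s a rough sketch of what the request could look like from a Node function. The endpoint and parameter names follow ScrapingBee’s query-string API, but treat the wrapper function and environment variable as illustrative placeholders rather than the exact code Image Carbon uses:

```js
// A sketch of the scraping request, assuming SCRAPINGBEE_API_KEY is set
// in the environment and siteUrl is the page the visitor submitted.
async function scrapeImages(siteUrl) {
  const params = new URLSearchParams({
    api_key: process.env.SCRAPINGBEE_API_KEY,
    url: siteUrl,
    render_js: 'true', // make sure JavaScript-rendered images are included
    wait_browser: 'domcontentloaded',
    extract_rules: JSON.stringify({
      images: {
        selector: 'img',
        type: 'list',
        output: {
          src: 'img@src',
          loading: 'img@loading',
        },
      },
    }),
  });

  const response = await fetch(`https://app.scrapingbee.com/api/v1/?${params}`);
  const { images } = await response.json();

  return images;
}
```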
Once we had the images, we needed a manageable way to both inspect their metadata and deliver them, as the originals and as optimized versions.
Easy decision—Cloudinary handles this quite well.
After the images were scraped, we uploaded them to Cloudinary to make sure we could reliably deliver them from our website (especially considering we don’t want to spam the original website’s image URLs).
Upon upload, we get an easy collection of metadata, including the dimensions and the file size, which is critical for our emissions calculations.
We also get optimized versions with a simple f_avif transformation, which we can then fetch to get the file size.
We used AVIF to create a reliable baseline between the original images and the optimized versions. While f_auto is usually the recommended way to go to ensure browser and device compatibility, it might deliver a different format depending on the visitor’s device, which would skew our results.
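As a rough sketch of that flow with the Cloudinary Node SDK (the folder name, the wrapper function, and the HEAD request for the optimized file size are illustrative assumptions, not necessarily how Image Carbon does it):

```js
import { v2 as cloudinary } from 'cloudinary';

// Credentials come from the CLOUDINARY_URL environment variable
// (or an explicit cloudinary.config() call).

async function uploadAndOptimize(imageSrc) {
  // Upload the scraped image by its remote URL
  const upload = await cloudinary.uploader.upload(imageSrc, {
    folder: 'imagecarbon', // hypothetical folder
  });

  // upload.bytes, upload.width, and upload.height give us the original metadata

  // Build a URL for the same asset delivered as AVIF (f_avif)
  const optimizedUrl = cloudinary.url(upload.public_id, {
    fetch_format: 'avif',
    secure: true,
  });

  // Fetch the optimized version and read its size from the response headers
  const response = await fetch(optimizedUrl, { method: 'HEAD' });
  const optimizedSize = Number(response.headers.get('content-length'));

  return { upload, optimizedSize };
}
```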
With both versions of our images and all of the metadata, we’re ready to pull out the calculators.
Calculating carbon emissions is a very complex topic. Any honest organization that releases a model makes this pretty clear, along with emphasizing that the results are estimates.
We looked to the Green Web Foundation and their use of the Sustainable Web Design model through CO2.js to get the best results.
Again, this is made pretty simple from an implementation perspective with the CO2.js SDK: once installed, we can use the perVisit method, which attempts to account for caching on repeat visits.
We pass in the byte data from both the original image and the optimized image, which returns an estimated amount of carbon emitted from serving that amount of data.
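For context, the emissions object used below comes from CO2.js itself; a minimal setup sketch using the Sustainable Web Design model might look like this:

```js
import { co2 } from '@tgwf/co2';

// Use the Sustainable Web Design model for the estimates
const emissions = new co2({ model: 'swd' });
```

With that in place, the per-image calls look like this: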
```js
const co2Original = emissions.perVisit(
  upload.bytes,
  true|false // Is it a green host?
);

const co2Optimized = emissions.perVisit(
  optimizedSize,
  true|false // Is it a green host?
);
```
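From there, the estimated savings for a single visit is just the difference between the two estimates:

```js
// Estimated CO2 saved on each visit by serving the optimized image instead
const co2SavedPerVisit = co2Original - co2Optimized;
```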
With that, we’re given all of the information we need to start displaying the results!
But first… all of those requests are long and expensive from a user experience perspective.
Running a bunch of serverless or Edge Functions to scrape a website and then calculate the emissions takes a lot of time.
That time is put on the visitors who are eager to see the results. That time is wasted energy (and carbon). And let’s be honest, that time is additional compute that costs money.
To avoid making those expensive requests every time, we can use a database to cache those results.
We’re big fans of Xata, not only for its ability to store data, but its awesome search and other APIs that sit on top. While we’re not using the Search API right now for Image Carbon, it opens up the doors for future possibilities for correlating results across all sites.
Anyway, this is made easy with Xata.
Two tables were set up: Sites and Images.
The Sites table includes a row for each site with its URL and a collected date, so we can set an expiration for refreshing the cache.
The images, which are the more important data for each site, are stored in the Images table: one row for each image, including the metadata for both the original and the optimized version, plus the site URL to correlate it back to the Sites table.
It’s a pretty simple setup that allows us to easily do the following:
- Check if results exist
- If so and not expired, return results
- If not or expired, let the UI know to scrape again
- Once scraped, add the new results in a separate request
This has allowed us to reliably store our information, quickly retrieve it, and refresh it as necessary.
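As a rough sketch of that flow with Xata’s TypeScript SDK (the table and column names like siteUrl and dateCollected, the import path, and the one-day TTL are assumptions for illustration, not the exact schema):

```js
import { getXataClient } from '@/lib/xata'; // generated Xata client (assumed path)

const xata = getXataClient();
const CACHE_TTL = 1000 * 60 * 60 * 24; // e.g., refresh results after a day

async function getCachedResults(siteUrl) {
  const site = await xata.db.Sites.filter({ siteUrl }).getFirst();

  const isExpired =
    !site || Date.now() - new Date(site.dateCollected).getTime() > CACHE_TTL;

  if (isExpired) {
    // Cache miss or stale results: the UI should kick off a fresh scan
    return undefined;
  }

  // Cache hit: return the stored per-image results for this site
  return xata.db.Images.filter({ siteUrl }).getAll();
}
```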
Finally, we needed a way to orchestrate all of this and display it in an application.
Next.js and Vercel are about as reliable as they get and include some out-of-the-box features that make them compelling to use.
Starting with the application, the UI could be rendered with a variety of frameworks, but client-side page routing and server-side rendering give us a way to provide a great experience to our visitors.
Upon submitting a website through the form on the homepage, we immediately push the visitor to a scanning page.
This page is a static page that performs the operations asynchronously on the client. This allowed us to almost instantly push the visitor to this page for immediate feedback, where we can start messaging the progress of the initial scan.
Once the scan is complete, we cache the data in Xata, and once that write is complete (it’s a quick operation), we push the visitor to the actual results page, where we load that cache on the server and render the results.
This approach gives us complete control over the loading experience, a landing page specific to the website scanned (helps with SEO and open graph social images), and using the cache allows that server rendering to happen quickly.
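To make that flow a little more concrete, here’s a simplified sketch of what the scanning page could look like with the Next.js pages router; the route names and helper functions (scanSite, cacheResults, the /api endpoints) are placeholders for illustration, not the actual Image Carbon source:

```jsx
import { useEffect } from 'react';
import { useRouter } from 'next/router';

// Hypothetical helpers that call the app's own API routes
async function scanSite(siteUrl) {
  const res = await fetch(`/api/scrape?url=${encodeURIComponent(siteUrl)}`);
  return res.json();
}

async function cacheResults(siteUrl, results) {
  await fetch('/api/sites/add', {
    method: 'POST',
    body: JSON.stringify({ siteUrl, results }),
  });
}

export default function Scan() {
  const router = useRouter();
  const siteUrl = typeof router.query.url === 'string' ? router.query.url : undefined;

  useEffect(() => {
    if (!siteUrl) return;

    (async function run() {
      // Scrape, optimize, and calculate emissions asynchronously on the client
      const results = await scanSite(siteUrl);

      // Store the results in the cache before showing the results page
      await cacheResults(siteUrl, results);

      // Then push the visitor to the server-rendered results page
      router.push(`/sites/${encodeURIComponent(siteUrl)}`);
    })();
  }, [siteUrl]);

  return <p>Scanning your site…</p>;
}
```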
As far as the scraping and asynchronous requests are concerned, they all live in a mix of serverless and Edge Functions on Vercel, which give us a reliable and flexible way to manage our server logic.
While many factors play into carbon emissions around the world, optimizing images (and videos) is an easy way to have a little control over our individual carbon footprints.
As important as this is, the benefits don’t stop there: quicker-loading pages help deliver a better experience to your visitors.
Optimization should be an important step in any web (or mobile) development workflow, and we can help! Head over to our image and video optimization pages in our docs to learn more or get started with a free account.
To learn more about the Image Carbon project, you can check out the source code over on GitHub: https://github.com/colbyfayock/imagecarbon