Cloudinary Blog

One pixel is worth three thousand words

How various image formats compress one-pixel images

A couple of months ago while taking a break from implementing cool new features like q_auto and g_auto, I was joking in our team chat about how well various image formats “compress” one-pixel images. In response, Orly — who runs the blog — asked me if I’d write a post about single-pixel images. I said: "Sure, why not. But it will be a very short blog post. After all, there’s not much you can say about a single pixel."

Looks like I was wrong. Very wrong.

What can you do with one pixel?

Back in the early days of the web, one-pixel images were widely used as a poor man’s solution to do things we now do with CSS. Spacing, creating lines or rectangles, semi-transparent backgrounds: there’s quite a lot you can do by simply scaling one pixel to arbitrary dimensions. Another use of one-pixel images, still a common practice today, is as a web beacon, for tracking or analytics.

In responsive web design, one-pixel images are often used as temporary placeholders while the page is loading. Since most browsers do not support client-hints, some responsive image solutions wait for the page to fully load in order to determine the actual rendered image sizes, and then replace a one-pixel image with the right breakpoint image using JavaScript.

Broken image example

There is one other use of single-pixel images: they can be used as ‘default’ images. If for whatever reason the actual image that you want to show cannot be found, it might in some cases be better to hide that fact (by showing one transparent pixel) than to return a “404 - Not Found” error, which will usually be rendered by browsers as a “broken image” icon. In both cases, you don’t get to see the intended image, but it might look a bit more professional if you don’t ‘rub it in’ by showing a broken image icon.

OK, it looks like one-pixel images do have some uses. So, what’s the best way to encode a 1x1 image?

Obviously, this is a fringe case for image compression formats. If the “image” only consists of a single pixel, there sure is not a lot of data to compress. In fact, the uncompressed data is just one bit to four bytes – depending on how you interpret the data: black & white (1 bit), grayscale (1 byte), grayscale + alpha (2 bytes), RGB (3 bytes), or RGBA (4 bytes).

But you can’t encode just the data. In any image format, you need to specify how to interpret the data. At the very least, you need to know the width and height of the image, and the number of bits or bytes per pixel.


Typically, to encode the width and height, four bytes are used: two bytes per number (if it were only one byte, the maximum image dimension would be 255x255). Let’s say that we need another byte to encode the color type of the image (e.g. grayscale, RGB or RGBA). In this minimalistic image format, a single-pixel image would take at least 6 bytes (e.g. for a white pixel) and at most 9 bytes (for a semi-transparent, arbitrary color pixel).
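To make the arithmetic concrete, here is a minimal Python sketch of that imaginary format. The color type codes and the function name `encode_minimal` are invented for illustration, since the format does not exist, and the sketch only aims to handle the single-pixel case.

```python
import struct

# Hypothetical color types for the imaginary format: (type code, bytes per pixel).
# The codes are made up; "bw" needs only 1 bit but still occupies a whole byte.
COLOR_TYPES = {
    "bw": (0, 1),
    "gray": (1, 1),
    "gray_alpha": (2, 2),
    "rgb": (3, 3),
    "rgba": (4, 4),
}


def encode_minimal(width, height, color_type, pixels):
    """Pack width and height (2 bytes each), a 1-byte color type, then raw data."""
    code, bytes_per_pixel = COLOR_TYPES[color_type]
    if len(pixels) != width * height * bytes_per_pixel:
        raise ValueError("pixel data does not match the dimensions")
    return struct.pack(">HHB", width, height, code) + pixels


white = encode_minimal(1, 1, "bw", b"\x00")
translucent = encode_minimal(1, 1, "rgba", bytes([255, 128, 0, 127]))
print(len(white), len(translucent))  # 6 9
```

The five header bytes are fixed, so the file size is simply 5 plus the number of data bytes.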

However, actual image formats tend to have a “header” that contains quite a bit more information. First of all, the first few bytes of any image format contain a fixed identifier that is only there to say “Hey! I’m a file in this particular file format!”. This fixed sequence of bytes is also known as the magic number. For example, a GIF file always starts with either GIF87a or GIF89a (depending on which version of the GIF spec is used), a PNG file always starts with an 8-byte sequence that includes PNG, JPEG files have a header that contains the string JFIF or Exif, and so on.
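As a rough illustration, a format sniffer only has to compare the first few bytes against a list of known magic numbers. The sketch below is an assumption-laden example rather than a complete detector: the function name `sniff_format` is made up, and for JPEG it checks the two-byte start-of-image marker (FF D8) that precedes the JFIF or Exif string.

```python
# Leading byte sequences ("magic numbers") for the three formats named above.
MAGIC_NUMBERS = [
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),   # the 8-byte PNG signature
    (b"\xff\xd8", "JPEG"),           # start-of-image marker
]


def sniff_format(data):
    """Guess the image format from the first bytes, or return None."""
    for magic, name in MAGIC_NUMBERS:
        if data.startswith(magic):
            return name
    return None


print(sniff_format(b"GIF89a\x01\x00\x01\x00"))  # GIF
print(sniff_format(b"not an image"))            # None
```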

Headers can contain all sorts of meta-information about an image. Some of it is format-specific information to indicate what kind of subformat is used, and is necessary to decode the pixels correctly. Some of it might not be necessary to decode the pixels, but is still useful to know how to render them – e.g. color profiles, orientation, gamma, or dots per inch. Some of it might be arbitrary metadata, like comments, timestamps, copyright notices, or GPS coordinates. These things might be optional, or they might be obligatory; it depends on the format specification. Of course all of this metadata has some cost in terms of file size. So let’s focus on “minimal” files, where all of the non-obligatory metadata has been stripped. Otherwise we might be wasting precious bytes on silly things.

Besides headers, image formats may have other kinds of “overhead”. They may contain all kinds of markers and checksums, intended to make the format more robust in case of transmission errors or other forms of corruption. Also, sometimes some kind of padding is required, to ensure that the data gets aligned properly.

One-pixel images – the smallest possible images – reveal exactly how much “overhead” there is in an image format. Let’s take a look.

Here is a hexdump of a 67-byte PNG file, representing a 1x1 white pixel:

00000000  89 50 4e 47 0d 0a 1a 0a  00 00 00 0d 49 48 44 52  |.PNG........IHDR|
00000010  00 00 00 01 00 00 00 01  01 00 00 00 00 37 6e f9  |.............7n.|
00000020  24 00 00 00 0a 49 44 41  54 78 01 63 68 00 00 00  |$....IDATx.ch...|
00000030  82 00 81 4c 17 d7 df 00  00 00 00 49 45 4e 44 ae  |...L.......IEND.|
00000040  42 60 82                                          |B`.|

This file consists of the 8-byte PNG magic number, followed by a header chunk (IHDR) which contains 13 bytes, an image data chunk (IDAT) with 10 bytes of “compressed” image data, and an end marker (IEND). Every chunk starts with a 4-byte chunk length and a 4-byte chunk identifier and ends with a 4-byte chunk checksum, and these three chunks are obligatory, so that’s another 36 bytes, for a total file size of 67 bytes.
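The breakdown can be checked with a short Python sketch that walks the chunks of the file from the hexdump. The helper name `walk_chunks` is made up for this example, and the checksums are skipped rather than verified.

```python
import struct
import zlib

# The 67-byte white-pixel PNG from the hexdump above.
PNG_WHITE = bytes.fromhex(
    "89504e470d0a1a0a"             # 8-byte magic number
    "0000000d49484452"             # IHDR: length 13, chunk type
    "00000001000000010100000000"   # 13 bytes of IHDR data
    "376ef924"                     # IHDR checksum
    "0000000a49444154"             # IDAT: length 10, chunk type
    "78016368000000820081"         # 10 bytes of zlib data
    "4c17d7df"                     # IDAT checksum
    "0000000049454e44"             # IEND: length 0, chunk type
    "ae426082"                     # IEND checksum
)


def walk_chunks(png):
    """Yield (type, data) for every chunk after the 8-byte signature."""
    pos = 8
    while pos < len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        chunk_type = png[pos + 4:pos + 8].decode("ascii")
        data = png[pos + 8:pos + 8 + length]
        pos += 12 + length  # length field + type + data + checksum
        yield chunk_type, data


chunks = list(walk_chunks(PNG_WHITE))
overhead = 8 + 12 * len(chunks)                  # magic number + chunk framing
payload = sum(len(data) for _, data in chunks)   # IHDR and IDAT contents
print([(t, len(d)) for t, d in chunks])  # [('IHDR', 13), ('IDAT', 10), ('IEND', 0)]
print(overhead, payload, overhead + payload)     # 44 23 67

# The IDAT payload inflates to one filter byte plus one byte holding the 1-bit pixel.
scanline = zlib.decompress(chunks[1][1])
print(scanline)  # b'\x00\x80'
```

So 44 of the 67 bytes are pure framing, and the 10 “compressed” bytes carry just 2 bytes of actual scanline data.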

A black pixel is also 67 bytes in PNG; a fully transparent pixel is 68 bytes, and an arbitrary RGBA color will be between 67 and 70 bytes.

JPEG has a longer header. The smallest one-pixel JPEG is 160 bytes (Update: 141 bytes). And it cannot be transparent, because JPEG does not support an alpha channel.

GIF is the most compact (in terms of headers) amongst the three universally supported image formats. A white pixel can be encoded as a valid GIF file in just 35 bytes:

00000000  47 49 46 38 37 61 01 00  01 00 80 01 00 00 00 00  |GIF87a..........|
00000010  ff ff ff 2c 00 00 00 00  01 00 01 00 00 02 02 4c  |...,...........L|
00000020  01 00 3b                                          |..;|

and a fully transparent pixel can be done in 43 bytes:

00000000  47 49 46 38 39 61 01 00  01 00 80 01 00 00 00 00  |GIF89a..........|
00000010  ff ff ff 21 f9 04 01 0a  00 01 00 2c 00 00 00 00  |...!.......,....|
00000020  01 00 01 00 00 02 02 4c  01 00 3b                 |.......L..;|
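
The eight extra bytes in the transparent version are a Graphic Control Extension block, which sits between the color table and the image descriptor. A small Python sketch (the function name `gif_summary` is made up for this example) reads the fixed 13-byte header and the color table of both files and checks for that block:

```python
import struct

# The two GIF files from the hexdumps above, one string per hexdump row.
GIF_WHITE = bytes.fromhex(
    "47494638 37610100 01008001 00000000"
    "ffffff2c 00000000 01000100 0002024c"
    "01003b"
)
GIF_TRANSPARENT = bytes.fromhex(
    "47494638 39610100 01008001 00000000"
    "ffffff21 f904010a 0001002c 00000000"
    "01000100 0002024c 01003b"
)


def gif_summary(data):
    """Read the logical screen descriptor and global color table of a GIF."""
    sig, width, height, flags, bg_index, aspect = struct.unpack("<6sHHBBB", data[:13])
    # Bit 7 of the flags says whether a global color table follows;
    # the low 3 bits give its size as 2 ** (n + 1) entries.
    entries = 2 ** ((flags & 0x07) + 1) if flags & 0x80 else 0
    table_end = 13 + 3 * entries
    colors = [tuple(data[i:i + 3]) for i in range(13, table_end, 3)]
    # A Graphic Control Extension starts with the bytes 21 f9.
    has_gce = data[table_end:table_end + 2] == b"\x21\xf9"
    return sig.decode("ascii"), (width, height), colors, has_gce


print(gif_summary(GIF_WHITE))
# ('GIF87a', (1, 1), [(0, 0, 0), (255, 255, 255)], False)
print(gif_summary(GIF_TRANSPARENT))
# ('GIF89a', (1, 1), [(0, 0, 0), (255, 255, 255)], True)
```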

Note that for all of the above formats, you can come up with even smaller files that will still decode to a one-pixel image in all or most browsers. However, such files are not valid with respect to the format specifications, which means that an image decoder might at any time complain (rightfully) that the file is corrupt, and show the broken image icon we were trying to avoid.

So what’s the best format for a one-pixel image on the web? That depends. If it’s an opaque pixel, then the answer is GIF. If it’s a fully transparent pixel, then the answer is also GIF. But if it’s a semi-transparent pixel, then the answer is PNG, since GIF only supports all-or-nothing transparency.

Not that all of this matters very much. All of these files fit easily in a single network packet, so in practice, there is no real speed difference – and the storage needed for this is negligible anyway. But still, it’s an amusing thing to look at, at least for image format geeks like me.

What about other, more exotic file formats?

If you use WebP for one-pixel images, be sure to use lossless WebP. A single-pixel lossless WebP image is between 34 and 38 bytes. A single-pixel lossy WebP image is between 44 and 104 bytes, depending mostly on whether there’s an alpha channel or not. For example, this is a fully transparent pixel as a 34-byte lossless WebP:

00000000  52 49 46 46 1a 00 00 00  57 45 42 50 56 50 38 4c  |RIFF....WEBPVP8L|
00000010  0d 00 00 00 2f 00 00 00  10 07 10 11 11 88 88 fe  |..../...........|
00000020  07 00                                             |..|

and here is the same pixel as a lossy (default) WebP of 82 bytes:

00000000  52 49 46 46 4a 00 00 00  57 45 42 50 56 50 38 58  |RIFFJ...WEBPVP8X|
00000010  0a 00 00 00 10 00 00 00  00 00 00 00 00 00 41 4c  |..............AL|
00000020  50 48 0b 00 00 00 01 07  10 11 11 88 88 fe 07 00  |PH..............|
00000030  00 00 56 50 38 20 18 00  00 00 30 01 00 9d 01 2a  |..VP8 ....0....*|
00000040  01 00 01 00 02 00 34 25  a4 00 03 70 00 fe fb fd  |......4%...p....|
00000050  50 00                                             |P.|

The main difference between the two is that a lossy WebP with transparency is actually stored internally as two images, thrown together into one container file: one lossy image for the RGB values, and one lossless image for the alpha values.
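
The container structure is easy to see by listing the chunks in both files. The following Python sketch (the helper name `riff_chunks` is made up for this example) walks the RIFF container and prints each chunk identifier with its payload size:

```python
import struct

# The two WebP files from the hexdumps above, one string per hexdump row.
WEBP_LOSSLESS = bytes.fromhex(
    "52494646 1a000000 57454250 5650384c"
    "0d000000 2f000000 10071011 118888fe"
    "0700"
)
WEBP_LOSSY = bytes.fromhex(
    "52494646 4a000000 57454250 56503858"
    "0a000000 10000000 00000000 0000414c"
    "50480b00 00000107 10111188 88fe0700"
    "00005650 38201800 00003001 009d012a"
    "01000100 02003425 a4000370 00fefbfd"
    "5000"
)


def riff_chunks(data):
    """Return (identifier, payload size) for each chunk in a WebP container."""
    if data[:4] != b"RIFF" or data[8:12] != b"WEBP":
        raise ValueError("not a WebP file")
    (riff_size,) = struct.unpack("<I", data[4:8])
    end = 8 + riff_size
    pos = 12
    chunks = []
    while pos + 8 <= end:
        fourcc = data[pos:pos + 4].decode("ascii")
        (size,) = struct.unpack("<I", data[pos + 4:pos + 8])
        chunks.append((fourcc, size))
        pos += 8 + size + (size & 1)  # odd-sized payloads get one padding byte
    return chunks


print(riff_chunks(WEBP_LOSSLESS))  # [('VP8L', 13)]
print(riff_chunks(WEBP_LOSSY))     # [('VP8X', 10), ('ALPH', 11), ('VP8 ', 24)]
```

The lossy file carries an extended header chunk (VP8X), the alpha image (ALPH) and the lossy color image (VP8), each with its own 8 bytes of chunk framing, which is where most of the extra 48 bytes go.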


For Bellard’s BPG format, which also has a lossless and a lossy mode, it’s the other way around. The lossy BPG encoding of a single white pixel is 31 bytes, the smallest we’ve seen so far:

00000000  42 50 47 fb 00 00 01 01  00 03 92 47 40 44 01 c1  |BPG........G@D..|
00000010  71 81 12 00 00 01 26 01  af c0 b6 20 bc b6 fc     |q.....&.... ...|

The lossless BPG for the same white pixel is 59 bytes. However, a fully transparent pixel is 57 or 113 bytes as a lossy or lossless BPG, respectively. Interestingly, for a single white pixel, BPG wins versus WebP (31 byte BPG vs 38 byte WebP), but for a single transparent pixel, WebP wins versus BPG (34 byte WebP vs 57 byte BPG).


And then there’s FLIF. As the main creator of the Free Lossless Image Format, obviously I cannot forget about that one. Here’s a 15 byte FLIF file for one white pixel:

00000000  46 4c 49 46 31 31 00 01  00 01 18 44 c6 19 c3     |FLIF11.....D...|

And here’s a 14 byte file for a black pixel:

00000000  46 4c 49 46 31 31 00 01  00 01 1e 18 b7 ff        |FLIF11........|

The black pixel file is one byte smaller because the number zero happens to compress better than the number 255. The header is pretty simple: the first four bytes are always “FLIF”, the next byte is a human-readable indication of the color and interlacing type. In this case it is “1”, which means we have just one color channel (i.e. it’s a grayscale image). The next byte indicates the color depth: “1” means one byte per channel. And the next four bytes are the image dimensions, in this case 0x0001 by 0x0001. The last four or five bytes are the actual compressed data.
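
As a sketch, the header layout just described can be read in a few lines of Python. This follows only the description in this post, not the full FLIF specification: the function name `read_flif_header` is made up, and it assumes the color byte is a plain digit as in these non-interlaced examples.

```python
# The white and black one-pixel FLIF files from the hexdumps above.
FLIF_WHITE = bytes.fromhex("464c4946 3131 0001 0001 1844c619c3")
FLIF_BLACK = bytes.fromhex("464c4946 3131 0001 0001 1e18b7ff")


def read_flif_header(data):
    """Split a FLIF file into the header fields described above."""
    if data[:4] != b"FLIF":
        raise ValueError("not a FLIF file")
    return {
        "channels": int(chr(data[4])),           # "1" = grayscale, "4" = RGBA
        "bytes_per_channel": int(chr(data[5])),  # "1" = one byte per channel
        "width": int.from_bytes(data[6:8], "big"),
        "height": int.from_bytes(data[8:10], "big"),
        "compressed_bytes": len(data) - 10,      # everything after the header
    }


print(read_flif_header(FLIF_WHITE))
# {'channels': 1, 'bytes_per_channel': 1, 'width': 1, 'height': 1, 'compressed_bytes': 5}
print(read_flif_header(FLIF_BLACK)["compressed_bytes"])  # 4
```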

One fully transparent pixel is also 14 bytes in FLIF:

00000000  46 4c 49 46 34 31 00 01  00 01 4f fd 72 80        |FLIF41....O.r.|

In this case, we have 4 color channels (RGBA) instead of just one. You might expect the data section to be longer in this file (after all, there are four times as many color channels), but that’s not the case: since the alpha value happens to be zero (it’s a fully transparent pixel), the RGB values are considered irrelevant so they don’t end up being encoded at all.

For an arbitrary RGBA color, the FLIF file can be up to 20 bytes.

OK, so FLIF is the clear winner in the “one pixel” category of some weird image encoding competition. If only this were an important thing to compete at :)

Actually, no. FLIF isn’t the winner. Remember the minimalistic (and non-existent) image format I mentioned in the beginning? The one that would encode single-pixel images in 6 to 9 bytes? Well that format doesn’t exist, so I suppose it doesn’t count. But there is an image format that does exist, and which gets quite close to that.

It’s called the Portable Bitmap format (PBM), and it’s an uncompressed image format from the 1980s. Here’s how you could encode a single white pixel as a PBM file in just 8 bytes:

00000000  50 31 0a 31 20 31 0a 30                           |P1.1 1.0|

Actually, forget about the hexdump, this is a human-readable file format. You can open it in a text editor if you want (at least this particular subformat):

P1
1 1
0

The first line (“P1”) indicates that this is a black & white image. Not grayscale; there are only two colors: black (which confusingly gets the number 1) and white (0). The second line indicates the image dimensions. And then it’s just a whitespace-delimited list of numbers, one number per pixel. So in this case just the number 0.

If you need something other than pure white or black, you can use the PGM format to get one pixel in any other shade of gray in just 12 bytes, or the PPM format to get any RGB color in just 14 bytes. This is always smaller than the corresponding FLIF file (or any other compressed format, for that matter).
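
These sizes are easy to reproduce. The Python sketch below builds one-pixel files in all three formats; the function names are made up, and it assumes the binary subformats (P5 and P6) with a maximum value of 255 for PGM and PPM, which is one way to arrive at the 12 and 14 bytes mentioned above.

```python
def pbm_pixel(black):
    """Plain-text PBM: 1 is black, 0 is white."""
    return b"P1\n1 1\n" + (b"1" if black else b"0")


def pgm_pixel(gray):
    """Binary PGM (assumed subformat P5) with one gray byte."""
    return b"P5\n1 1\n255\n" + bytes([gray])


def ppm_pixel(red, green, blue):
    """Binary PPM (assumed subformat P6) with three color bytes."""
    return b"P6\n1 1\n255\n" + bytes([red, green, blue])


print(len(pbm_pixel(False)), len(pgm_pixel(128)), len(ppm_pixel(255, 200, 0)))  # 8 12 14
```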

The traditional PNM family (PBM, PGM and PPM) does not support transparency. There is an extension of PNM though, called Portable Arbitrary Map (PAM), which does support images with transparency. Unfortunately for our current purposes, its syntax is quite a bit more verbose. The smallest valid PAM file that encodes a fully transparent pixel is the following:

P7
WIDTH 1
HEIGHT 1
DEPTH 4
MAXVAL 1
TUPLTYPE RGB_ALPHA
ENDHDR
\0\0\0\0

On the last line there are four zero (NUL) bytes. The above file is 67 bytes. You might be tempted to use grayscale+alpha instead of RGBA, because that would save two bytes in the data section. But that results in a 71 byte file, since you have to change the TUPLTYPE from RGB_ALPHA to GRAYSCALE_ALPHA. Oh and by the way, your image software might not like the use of MAXVAL 1, so you might need to change that to MAXVAL 255 (which takes two more bytes).
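
The byte counts can be double-checked with a small Python sketch. The header layout below is an assumption reconstructed from the fields named in the text (one header field per line, in the usual PAM order), and the function name `pam_pixel` is made up for this example.

```python
def pam_pixel(tupltype, depth, maxval):
    """Build a 1x1 PAM file whose channels are all zero (fully transparent)."""
    header = (
        f"P7\nWIDTH 1\nHEIGHT 1\nDEPTH {depth}\nMAXVAL {maxval}\n"
        f"TUPLTYPE {tupltype}\nENDHDR\n"
    ).encode("ascii")
    return header + bytes(depth)  # one zero byte per channel


print(len(pam_pixel("RGB_ALPHA", 4, 1)))        # 67
print(len(pam_pixel("GRAYSCALE_ALPHA", 2, 1)))  # 71
print(len(pam_pixel("RGB_ALPHA", 4, 255)))      # 69
```

The longer tuple type name costs six bytes while the smaller data section saves only two, which is why the grayscale+alpha variant ends up larger.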

So all in all, for one-pixel images, when there’s no transparency involved, PNM is the smallest (8 to 14 bytes for PNM vs 14 to 18 bytes for FLIF), but when there is transparency, FLIF is smallest (14 to 20 bytes for FLIF vs 67 to 69 bytes for PAM).

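Here is a summary table that gives the (optimal) file sizes for various one-pixel images:

Format           File size in bytes
PNG              67 (white or black), 68 (fully transparent), 67 to 70 (arbitrary RGBA)
JPEG             160 (141 after the update); no transparency
GIF              35 (white), 43 (fully transparent)
Lossy WebP       44 to 104; 82 (fully transparent)
Lossless WebP    34 to 38; 38 (white), 34 (fully transparent)
Lossy BPG        31 (white), 57 (fully transparent)
Lossless BPG     59 (white), 113 (fully transparent)
FLIF             15 (white), 14 (black or fully transparent), up to 20 (arbitrary RGBA)
PNM / PAM        8 (PBM), 12 (PGM), 14 (PPM), 67 to 69 (PAM, fully transparent)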

It might seem a bit surprising that an uncompressed image format actually beats most of the compressed formats at this particular task. But it’s not that surprising if you think about it. One-pixel images are in a sense the worst-case scenario for image compression: they’re all headers and overhead, and very little data. And the very little data there is cannot really be compressed because compression depends on predictability, and how are you supposed to predict one single pixel?

In part two of this blog post I will discuss the other extreme. How well do extremely predictable single-color images perform in various formats? Stay tuned…

Update: Check out part two as well: A one-color image is worth two thousand words
