Add Motion to Your Pictures

Here at Cloudinary we often talk about visual storytelling. It’s the concept of using images and videos to communicate your message – a new spin on the old adage, “a picture is worth a thousand words”. Although images by themselves are powerful, videos are even more effective. A product page featuring a video is more likely to engage customers and lead to more conversions.

As video is expensive to produce, we’ve come up with a quick and easy way to turn your images into videos – the zoompan effect! Also known as the Ken Burns effect, this transformation applies zooming and/or panning to an image, resulting in a video or animated GIF (depending on the format you specify). The technique is often used in documentaries, where still photographs are brought to life by slowly zooming in or out of them, and panning across areas of interest.

Children’s storytelling programs use zooming and panning techniques to reveal the illustrations in a book as the narrator reads. Here’s an example of these techniques – me literally visually storytelling, reading one of my favorite books, The Cat in the Hat:

Delving into the technical details, you’ll see that we created this video simply by adding parameters to the delivery URL (you can do this directly, or using an SDK). The original asset is a JPG image, and we’ve turned it into an MP4 video by specifying mp4 for the extension and using the zoompan effect, which is constructed as follows:

  • du_15 makes the effect last for a duration of 15 seconds.
  • from_() specifies the start information and to_() specifies the end information.
  • In this case, we’re using floats for the x and y parameters, which give the position as a fraction of the image dimensions (0.0 is 0%, 1.0 is 100%). So 0.0 for both x and y means the top left corner of the image, and 1.0 for both x and y means the bottom right corner of the image.
  • zoom takes a range of 1.0 to 8.0, with 8.0 being the highest level of zoom – so here, we’re zooming out from 4.5 to 1.0.

The voiceover is applied as a separate audio layer with public ID docs/storytelling.mp3. Here’s the syntax for that: l_video:docs:storytelling/fl_layer_apply.
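
To make the structure of those parameters easier to see, here is a minimal Python sketch that assembles the effect and chains it with the audio layer. It is an illustration rather than an official SDK call: the helper name `zoompan_from_to` is made up, the layout inside `from_()` and `to_()` (x, y and zoom separated by semicolons) is an assumption, and the start and end positions are example values. Only the 15-second duration, the zoom levels 4.5 and 1.0, and the audio layer syntax come from the description above.

```python
def zoompan_from_to(duration_s, start, end):
    """Assemble a zoompan effect component from start and end settings."""
    def point(p):
        # Assumed inner layout: x, y and zoom joined by semicolons.
        return f"x_{p['x']};y_{p['y']};zoom_{p['zoom']}"

    return f"e_zoompan:du_{duration_s};from_({point(start)});to_({point(end)})"


# Example positions (assumed): start at the top left, end at the bottom right.
effect = zoompan_from_to(
    15,
    {"x": 0.0, "y": 0.0, "zoom": 4.5},
    {"x": 1.0, "y": 1.0, "zoom": 1.0},
)

# The audio layer syntax is quoted from the text above.
audio_layer = "l_video:docs:storytelling/fl_layer_apply"

# Components of a transformation chain are separated by slashes.
transformation = "/".join([effect, audio_layer])
print(transformation)
```

Keeping each component as its own string makes it easy to swap the start and end settings without touching the audio layer.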

We can also pinpoint areas of interest in an image, using pixels for the x and y position, rather than a percentage value.

Take this image of a model wearing different items of clothing. We can run this through our fashion object detection model, and find the bounding box of each item of clothing.

Annotated image showing fashion items

A simple calculation is all it takes to work out the center point of the bounding boxes and hence the x and y coordinates on which to focus our zoom.

Take the front shoe, for example. The details of its bounding box are:

  • Top left x position: 3307.5
  • Top left y position: 6907.5
  • Width: 922.5
  • Height: 1162.5

So, the center of the box is:

  • x = 3307.5 + (922.5/2) = 3768.75
  • y = 6907.5 + (1162.5/2) = 7488.75

We need to use integers for the coordinates, so rounding to the nearest whole pixel we get x=3769, y=7489.
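
If you have many bounding boxes to process, the same calculation is easy to script. Here is a small Python sketch of it; the function name `bbox_center` is just for illustration, and the numbers are the front shoe values listed above.

```python
def bbox_center(left, top, width, height):
    """Return the center of a bounding box as integer pixel coordinates."""
    center_x = left + width / 2
    center_y = top + height / 2
    # round() picks the nearest integer (exact .5 values go to the even
    # neighbor, which does not affect the values used here).
    return round(center_x), round(center_y)


# Bounding box of the front shoe, from the list above.
x, y = bbox_center(left=3307.5, top=6907.5, width=922.5, height=1162.5)
print(x, y)  # 3769 7489
```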

Let’s start with the video zoomed into that shoe, then pan and zoom out to the pants, over a duration of seven seconds. This is the syntax to use:



While zooming works best with high-resolution images, they can be heavy, so it’s good practice to resize the result to an appropriate size for a typical viewing device and apply a quality optimization (c_scale,w_400/q_auto). A smaller video speeds up delivery and uses less bandwidth.
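
As a rough sketch of how the pieces chain together in a delivery URL, the Python below joins the zoompan effect, the resize and the quality optimization with slashes and requests the result as an MP4. Treat most of it as placeholder: the cloud name `demo`, the public ID `docs/model`, the pants coordinates, both zoom levels and the layout inside `from_()` and `to_()` are assumptions made for illustration. Only the shoe center (3769, 7489), the seven-second duration and `c_scale,w_400/q_auto` come from the text.

```python
def delivery_url(cloud_name, public_id, components, extension):
    """Join transformation components into a delivery URL (assumed URL shape)."""
    chain = "/".join(components)
    return (
        f"https://res.cloudinary.com/{cloud_name}/image/upload/"
        f"{chain}/{public_id}.{extension}"
    )


components = [
    # Start on the shoe (from the text); the end point on the pants and
    # both zoom levels are hypothetical values.
    "e_zoompan:du_7;from_(x_3769;y_7489;zoom_5.0);to_(x_3000;y_5000;zoom_2.0)",
    "c_scale,w_400",  # resize the resulting video
    "q_auto",         # automatic quality optimization
]

url = delivery_url("demo", "docs/model", components, "mp4")
print(url)
```

The resize and quality components come after the zoompan effect so that the zoom works on the full-resolution image and only the output is scaled down.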

Here’s the result:

Now we want to move on up to the pants, the jacket, and the hat. We can do this by concatenating different sections of video using the overlay parameter (l_) and the splice flag (fl_splice):

In the future, we hope to be able to include objects in the syntax, so you could say something like this, enabling easy automation:


There’s no better way to demonstrate this next feature than with some music by the classical composer, Chopin (pronounced “show pan”).

Occasionally I want to sit down at the piano to try out a piece of sheet music I’ve found online, but I don’t want to waste paper printing it off. I can’t fit my laptop on the piano, and the music is too small to read on my phone.

Here’s an idea for setting a particular viewport size, at a certain zoom level, and panning across the music over a given length of time. This time we’re starting with a PDF and ending up with an MP4. We take the first page (pg_1) and transform it into an image, which we can then crop to a viewport size of 1200 x 600 pixels (pg_1/c_crop,h_600,w_1200,x_0,y_0) to keep only the first line of the manuscript.

Increasing the y coordinate moves the crop further down the page. By simply modifying this value, we can move the focus to different lines.
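
To illustrate, this Python sketch generates the crop component for the first few lines of music. It assumes the lines are evenly spaced exactly one viewport height (600 pixels) apart, which may not hold for a real score, so the y offsets would need adjusting by eye. The function name `line_crop` is hypothetical.

```python
def line_crop(line_index, viewport_w=1200, viewport_h=600, page=1):
    """Build the page and crop components for one line of the score."""
    # Assumption: each line starts one viewport height below the previous one.
    y = line_index * viewport_h
    return f"pg_{page}/c_crop,h_{viewport_h},w_{viewport_w},x_0,y_{y}"


for index in range(3):
    print(line_crop(index))
# pg_1/c_crop,h_600,w_1200,x_0,y_0
# pg_1/c_crop,h_600,w_1200,x_0,y_600
# pg_1/c_crop,h_600,w_1200,x_0,y_1200
```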

Here’s the first line delivered as a JPG image:

First line of Chopin’s Nocturne sheet music

So far, we’ve been using the from and to notation for the zoompan effect, but this time we’ll specify the mode and maximum zoom instead. The mode can be set to a predefined action – for example, ztc to zoom into the center of the image, or ofl to zoom out starting from the left (see the full list).

Here we’ll use plr to pan from left to right at the same zoom level (e_zoompan:du_16;mode_plr;maxzoom_1.1). Applying our splicing technique described above, we can now pan across each line of music.

The audio layer (l_video:docs:chopin-nocturne,eo_37.5/e_fade:-2000/fl_layer_apply) gives you the option to listen along. Otherwise, you can mute the video to play the music yourself.

Another use case for the zoompan effect is to add subtle movement to your website. For example, slow panning across a banner or background makes it more eye-catching and attractive. You could also apply zoompan as a hover effect for thumbnails.

Here’s an example of subtle movement behind a text overlay:

Let’s return to our storytelling, where an image overlay placed on top of a video can create some cool effects. Watch as Harry Potter’s classmate zooms into space!

And here’s his Dad’s reaction, created by overlaying a video on a video (both videos transformed from images with zoompan):

That last one could work well as a GIF, but be aware that GIFs are far heavier than videos. To reduce their weight, you could use fewer frames per second (fps_10) instead of the default of 25. A lower frame rate makes the action less smooth, but this is less important for a GIF.

Dad in shock as GIF
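
For a feel of how much the frame rate matters, here is a quick back-of-the-envelope calculation in Python. The four-second duration is a made-up example, and the number of frames is only a rough proxy for file size, since the actual weight also depends on dimensions and image content.

```python
def frame_count(duration_s, fps):
    """Number of frames in a clip of the given duration and frame rate."""
    return round(duration_s * fps)


duration_s = 4  # hypothetical clip length in seconds

default_frames = frame_count(duration_s, 25)  # default frame rate
reduced_frames = frame_count(duration_s, 10)  # with fps_10

saving = 1 - reduced_frames / default_frames
print(default_frames, reduced_frames, f"{saving:.0%}")  # 100 40 60%
```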

Cloudinary’s zoompan effect can help you become the director of your own motion pictures. This powerful tool can help you guide your audience to exactly what you want them to see and when, and to dazzle them once they get there. Try out the zoompan effect and start telling your own visual stories.
