Serverless Video Processing for Scalable Media Workflows

Scaling video features shouldn’t mean babysitting servers at 2 AM. Traditional video workflows force us to provision infrastructure, manage queues, and monitor processing jobs around the clock, and that’s before we’ve written a single feature. Serverless video processing handles all of that behind the scenes so we can focus on building our product.

We’ll walk through how serverless architecture handles video transformations, transcoding, and delivery. Cloudinary’s platform makes these workflows easy to implement with APIs that scale on demand.

Key Takeaways:

  • Serverless video processing removes the hassle of infrastructure management and scales automatically
  • Common tasks like transcoding, resizing, and thumbnail generation happen on-demand
  • Upload presets and webhooks automate entire video workflows

What Serverless Video Processing Means in Practice

With serverless video processing, we upload videos directly to a cloud service, and processing kicks off based on the transformations we choose. There’s no provisioning, no queue management, and no worrying about how the system will handle growth from our end.

Events drive everything in serverless workflows. When we upload a video, that single event can trigger multiple processing jobs simultaneously: generating different resolutions, extracting thumbnails, and transcoding to different formats. We choose what we want, and the platform handles the orchestration.

Here’s an example of how serverless video upload works in practice:

// Upload video directly to Cloudinary without server infrastructure
function uploadVideoToCloudinary(videoFile) {
    const formData = new FormData();
    formData.append('file', videoFile);
    formData.append('upload_preset', 'my_video_preset');

    // Upload directly to Cloudinary's API - no backend needed
    // (the cloud name, 'demo' here, is part of the endpoint URL)
    $.ajax({
        url: 'https://api.cloudinary.com/v1_1/demo/video/upload',
        type: 'POST',
        data: formData,
        processData: false,
        contentType: false,
        success: function(response) {
            console.log('Video uploaded:', response.public_id);
            displayProcessedVideo(response);
        },
        error: function(xhr) {
            console.error('Upload failed:', xhr.responseText);
        }
    });
}

This code shows us the core of a serverless video processing workflow:

  • We build a FormData object with our video file and reference an upload preset that we’ve already configured in Cloudinary’s dashboard.
  • The processData: false and contentType: false settings are important because they tell jQuery not to interfere with our multipart form data; without them, file uploads won’t work properly.

The main thing to notice is that there’s no backend server in this flow. We’re uploading straight from the browser to Cloudinary’s API endpoint. Once the upload succeeds, we get back a response with a public_id that we can use to reference and transform the video from that point forward.
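Once we have that public_id, delivery URLs can be assembled entirely on the client. Here’s a minimal sketch of that step; the cloud name 'demo', the public ID, and the transformation string are example values:

```javascript
// Build an on-demand delivery URL from an upload response's public_id.
// The cloud name and transformation string here are illustrative values.
function videoUrl(cloudName, publicId, transformation) {
    const base = `https://res.cloudinary.com/${cloudName}/video/upload`;
    return transformation ? `${base}/${transformation}/${publicId}` : `${base}/${publicId}`;
}

// e.g., a 720p MP4 rendition of the uploaded video
const rendition = videoUrl('demo', 'my_video', 'w_1280,h_720,c_fill,q_auto,f_mp4');
console.log(rendition);
```

No server round-trip is needed to produce these URLs; the processing happens only when someone actually requests one.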

Common Video Processing Tasks in Serverless Workflows

Most media workflows revolve around a handful of common tasks. Transcoding converts videos between formats and codecs so they play on a wide range of devices, from smartphones and tablets to desktop screens. Resizing adjusts video dimensions to fit different screen sizes without degrading quality.

Clipping grabs specific parts of videos for us to preview or highlight, and thumbnail generation captures still frames at specific timestamps to create preview images. All of these tasks can be applied on-demand without pre-processing every possible variation.

Here’s how we create different video transformations serverlessly:

// Apply on-demand video transformations
function createVideoTransformations(publicId) {
    const cloudName = 'demo';
    const baseUrl = `https://res.cloudinary.com/${cloudName}/video/upload`;

    // Each transformation is processed on-demand
    const transformations = {
        // HD version (1080p)
        hd: `${baseUrl}/w_1920,h_1080,c_fill,q_auto,f_mp4/${publicId}`,

        // Mobile-optimized version (360p)
        mobile: `${baseUrl}/w_640,h_360,c_fill,q_auto:low,f_mp4/${publicId}`,

        // Thumbnail at 2 seconds
        thumbnail: `${baseUrl}/so_2.0,w_400,h_225,c_fill,f_jpg/${publicId}`,

        // Auto quality MP4
        optimized: `${baseUrl}/q_auto,f_mp4/${publicId}`
    };

    return transformations;
}

These transformations cover most common video processing needs. The hd version sets our dimensions to 1920×1080 with c_fill to crop intelligently and q_auto to let Cloudinary pick the best quality setting. The mobile version drops to 640×360 and uses q_auto:low to keep file sizes small for slower connections.

The thumbnail transformation is useful for video previews: so_2.0 grabs a still frame at the 2-second mark and outputs it as a JPG. Everything here happens on-demand, so we only process what gets requested instead of pre-generating every possible variation.
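Choosing between these renditions at render time can be as simple as a viewport check. A minimal sketch, assuming a transformations object like the one above; the 640px breakpoint is an illustrative choice, not a platform default:

```javascript
// Pick an on-demand rendition based on viewport width.
// The 640px breakpoint is an example threshold, not a platform default.
function pickRendition(transformations, viewportWidth) {
    return viewportWidth <= 640 ? transformations.mobile : transformations.hd;
}

const urls = { hd: 'https://example.com/hd.mp4', mobile: 'https://example.com/mobile.mp4' };
console.log(pickRendition(urls, 375));   // phone-sized viewport gets the mobile URL
```

Because each URL is generated on demand, the unused rendition costs nothing until some client actually requests it.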

How Serverless Architectures Handle Video at Scale

Serverless architectures scale based on actual demand instead of pre-allocated capacity. Upload one video, and the system allocates resources for that job. Upload a hundred at once, and the infrastructure expands to match; we don’t need to configure anything ourselves.

Processing jobs get distributed across different workers behind the scenes. The platform manages queues, assigns jobs, and handles retries if anything fails. We get to skip the usual DevOps headaches of monitoring queue depths and adjusting worker counts.
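The platform handles retries on its side, but calls we make from our own code can still hit transient network failures. A minimal retry sketch with exponential backoff, where `task` stands in for any promise-returning upload call; the attempt count and delays are illustrative:

```javascript
// Retry an async task with exponential backoff.
// `task` is a placeholder for any promise-returning call (e.g., an upload);
// the default attempt count and base delay are arbitrary example values.
async function withRetry(task, { attempts = 3, baseDelayMs = 500 } = {}) {
    for (let attempt = 1; attempt <= attempts; attempt++) {
        try {
            return await task();
        } catch (err) {
            if (attempt === attempts) throw err;             // out of retries: surface the error
            const delay = baseDelayMs * 2 ** (attempt - 1);  // 500, 1000, 2000, ...
            await new Promise(resolve => setTimeout(resolve, delay));
        }
    }
}
```

Wrapping an upload call in `withRetry` keeps transient failures from surfacing to users while leaving genuine errors visible after the final attempt.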

Here’s how a serverless architecture handles batch video processing:

// Process multiple videos concurrently
// Assumes uploadVideo is a Promise-returning helper (like the fetch-based
// uploadVideo method below) rather than a callback-style jQuery upload
async function processBatch(videoFiles) {
    console.log(`Processing ${videoFiles.length} videos...`);

    // Upload all videos concurrently
    // Cloudinary scales automatically to handle the load
    const uploadPromises = videoFiles.map(file =>
        uploadVideo(file)
    );

    const results = await Promise.all(uploadPromises);
    console.log(`Successfully processed ${results.length} videos`);
    return results;
}

This pattern handles 10 videos as easily as 10,000. We use videoFiles.map to kick off uploads for every file simultaneously, and Promise.all waits for all of them to finish before moving on (keep in mind it rejects as soon as any single upload fails). The infrastructure stretches to meet demand; we just write the logic and let it run.
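Firing every upload at once can also saturate a browser’s connection limit on very large batches. A chunked variant caps how many requests are in flight at a time; `uploadFn` stands in for any promise-returning upload helper, and the chunk size of 5 is an arbitrary example:

```javascript
// Upload in fixed-size chunks to cap the number of in-flight requests.
// `uploadFn` is a placeholder for any promise-returning upload helper;
// the default chunk size of 5 is an arbitrary example value.
async function processInChunks(files, uploadFn, chunkSize = 5) {
    const results = [];
    for (let i = 0; i < files.length; i += chunkSize) {
        const chunk = files.slice(i, i + chunkSize);
        // Each chunk uploads concurrently; chunks run one after another
        results.push(...await Promise.all(chunk.map(uploadFn)));
    }
    return results;
}
```

The platform side scales either way; chunking is purely about being kind to the client’s own network stack.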

Building a Basic Serverless Video Processing Pipeline

A serverless video pipeline has three main parts: upload, processing, and delivery. We accept videos from users and store them securely. Processing applies transformations based on our configuration. Delivery serves processed videos through a global CDN, so viewers get content from the closest geographic location.

Here’s an example of a complete serverless video pipeline:

// Let's build a serverless video processing pipeline
class ServerlessVideoPipeline {
    constructor(cloudName, uploadPreset) {
        this.cloudName = cloudName;
        this.uploadPreset = uploadPreset;
        this.baseUrl = `https://res.cloudinary.com/${cloudName}/video/upload`;
    }

    // Step 1: Upload our video
    async uploadVideo(file) {
        const formData = new FormData();
        formData.append('file', file);
        formData.append('upload_preset', this.uploadPreset);

        const response = await fetch(
            `https://api.cloudinary.com/v1_1/${this.cloudName}/video/upload`,
            { method: 'POST', body: formData }
        );
        return await response.json();
    }

    // Step 2: Create a transformation pipeline
    createPipeline(publicId) {
        return {
            hd: this.buildUrl(publicId, 'w_1920,h_1080,c_fill,q_auto,f_mp4'),
            mobile: this.buildUrl(publicId, 'w_640,h_360,c_fill,q_auto:low,f_mp4'),
            thumbnail: this.buildUrl(publicId, 'so_2.0,w_400,h_225,c_fill,f_jpg')
        };
    }

    // Helper: join the base URL, transformation string, and public ID
    buildUrl(publicId, transformation) {
        return `${this.baseUrl}/${transformation}/${publicId}`;
    }
}

This pipeline class ties together everything we’ve covered so far. The constructor takes our cloudName and uploadPreset as parameters so we can reuse the same pipeline across different projects. The uploadVideo method handles the serverless upload using fetch instead of jQuery; this keeps it framework-agnostic.

The createPipeline method generates our transformation URLs on the fly. Videos go into Cloudinary’s storage, transformation URLs are generated instantly, and processed versions become available through the CDN. We just connect the pieces and let it run.

How Cloudinary Enables Serverless Video Processing

Cloudinary gives us enterprise-grade video processing through simple APIs. We skip the usual setup; no encoding servers, no storage bucket configuration, and no CDN distribution headaches. We define transformations as URL parameters or upload configurations, and Cloudinary processes videos on-demand.

Here’s how Cloudinary’s Upload Widget simplifies serverless video processing:

// Initialize Cloudinary Upload Widget
function initializeCloudinaryWidget() {
    const widget = cloudinary.createUploadWidget({
        cloudName: 'demo',
        uploadPreset: 'video_preset',
        resourceType: 'video',

        // Automatic transformations on upload
        eager: [
            { width: 1920, height: 1080, crop: 'fill', format: 'mp4' },
            { width: 640, height: 360, crop: 'fill', format: 'mp4' }
        ]
    }, (error, result) => {
        if (!error && result && result.event === "success") {
            console.log('Video uploaded:', result.info.public_id);
            // All transformations are ready
        }
    });
    return widget;
}

The Upload Widget gives us a ready-made UI for video uploads without building our own interface. The eager array is where the real power is; those transformations start processing as soon as the upload completes, so all versions are ready by the time we need them. We get an HD and a mobile version generated automatically without writing any server-side video processing code.

Automating Video Workflows With Cloudinary

Upload presets configure processing rules once, and apply them consistently to every upload. We define transformations, quality settings, and delivery options in the Cloudinary dashboard, then reference the preset name during upload. This ensures consistent processing without hardcoding transformation logic in our application.

Webhooks notify our application when processing finishes or when other events occur. We can trigger custom workflows on upload completion, transformation readiness, or other video lifecycle events. This gives us reliable processing without constant polling or manual status checks.

Here’s how automated workflows work with upload presets:

// Upload with automated transformations using upload presets
function uploadWithAutomatedProcessing(videoFile) {
    const formData = new FormData();
    formData.append('file', videoFile);
    formData.append('upload_preset', 'auto_video_processing');

    $.ajax({
        url: 'https://api.cloudinary.com/v1_1/demo/video/upload',
        type: 'POST',
        data: formData,
        processData: false,
        contentType: false,
        success: function(response) {
            // All transformations are already processing
            response.eager.forEach(transformation => {
                console.log('Generated:', transformation.secure_url);
            });
        }
    });
}

Upload presets automate our entire workflow. We configure transformations once in the Cloudinary dashboard, then every upload automatically triggers the same processing pipeline. The response.eager array gives us back URLs for each transformation we defined in the preset; we loop through them and each secure_url points to a ready-to-use processed version. This gives us consistent results without managing transformation logic in our code.

Scaling Video Processing Across Applications

High-volume video workloads need infrastructure that scales without taking a performance hit. Whether we’re processing dozens or thousands of videos simultaneously, Cloudinary distributes jobs across its infrastructure and handles the load balancing for us.

Delivery stays fast thanks to Cloudinary’s global CDN network. Processed videos cache at edge locations worldwide, keeping latency low for viewers everywhere. Scalable processing plus global delivery means we can handle traffic spikes without panicking.

Here’s how to handle high-volume video processing:

// Serverless video processing at scale
// Assumes uploadVideo is a Promise-returning upload helper
async function processWithProgress(videoFiles, onProgress) {
    const total = videoFiles.length;
    let completed = 0;
    const results = [];

    for (const file of videoFiles) {
        const result = await uploadVideo(file);
        results.push(result);
        completed++;

        onProgress({
            completed,
            total,
            percentage: Math.round((completed / total) * 100)
        });
    }
    return results;
}

This approach processes videos sequentially while keeping our application informed of progress. The onProgress callback fires after each upload completes, passing back a simple object with the count and percentage. This is useful for building progress bars or status dashboards in our UI. Cloudinary’s infrastructure handles the heavy lifting on each upload while we track the overall batch status from our side.

Build Scalable Video Pipelines Without Servers

Serverless video processing lets us build complete workflows from start to finish. We upload our files and Cloudinary handles transformations and delivery for us; all without the operational overhead of managing our own infrastructure.

Cloudinary makes all of this accessible through straightforward APIs. Upload presets keep processing consistent, webhooks let us build custom integrations, and the platform scales from one video to thousands without us lifting a finger. We get enterprise-grade capabilities without the enterprise-grade barriers of entry.

If we’re ready to build scalable video workflows, we can sign up with Cloudinary for a free account and get started.

Frequently Asked Questions

What Is Serverless Video Processing?

Serverless video processing handles video transformations, transcoding, and delivery without requiring us to manage server infrastructure. We upload videos and specify transformations through APIs, and the cloud platform handles all processing automatically. This approach eliminates server provisioning, scaling worries, and infrastructure maintenance.

How Does Cloudinary Handle Video Transformations Serverlessly?

Cloudinary processes video transformations on-demand through URL-based parameters or upload configurations. When we request a transformed video URL, Cloudinary generates that version if it doesn’t exist, caches it, and serves it via CDN. Subsequent requests for the same transformation are served instantly from cache without reprocessing.

Can Serverless Video Processing Handle High-Volume Workloads?

Serverless architectures excel at handling variable workloads because they scale automatically when demand increases. Cloudinary’s infrastructure processes one video or thousands simultaneously without manual intervention. The platform distributes processing jobs across its infrastructure and delivers results through a global CDN built to absorb large traffic volumes reliably.

QUICK TIPS
Tali Rosman

In my experience, here are tips that can help you better manage serverless video processing for scalable media workflows:

  1. Design for idempotency from the first upload event
    Serverless pipelines often retry jobs after transient failures, duplicate webhook deliveries, or timeout recoveries. Make sure every processing step can run more than once without generating duplicate derivatives, broken states, or repeated billing.
  2. Separate ingest-time transforms from view-time transforms
    Not every video variant deserves eager generation. Precompute only the assets you know you will need often, then let less common renditions happen on demand so storage and processing costs do not balloon unnecessarily.
  3. Use content-aware routing for heavy jobs
    A short UGC clip and a 90-minute 4K source should not follow the same processing path. Route files by duration, resolution, bitrate, or business priority so expensive workloads do not clog fast-turnaround queues.
  4. Track source quality before transcoding begins
    Many delivery issues blamed on encoding are actually caused by poor mezzanine inputs, variable frame rates, clipped audio, or inconsistent color spaces. Validate source files early so downstream failures are easier to diagnose and prevent.
  5. Generate a mezzanine strategy, not just delivery formats
    Teams often focus only on final MP4 or HLS outputs, but a stable mezzanine asset makes reprocessing far easier when codecs, devices, or requirements change. Keeping one strong intermediate version saves time during future migrations and optimization passes.
  6. Treat thumbnails and preview clips as product surfaces
    In many workflows, the first still frame or short preview determines whether a video gets played at all. Optimize those assets deliberately instead of generating them as an afterthought, especially for catalogs, feeds, and editorial archives.
  7. Watch cold-start effects in adjacent functions, not just media APIs
    The media platform may scale well while your own webhook handlers, auth layers, or database writes become the real bottleneck. Measure the whole event chain so “serverless” does not hide latency introduced by surrounding services.
  8. Attach business metadata at ingest, not after delivery
    Rights windows, language, campaign IDs, and approval states are much more useful when they enter the pipeline with the asset. Early metadata lets you automate routing, retention, and publishing logic before manual exceptions pile up.
  9. Use failure buckets instead of one generic retry loop
    Codec errors, corrupt uploads, timeout failures, and policy rejections should not all be retried the same way. Classifying failure types lets you retry only the recoverable jobs and escalate the ones that need human review.
  10. Model cost by processing pattern, not by file count alone
    Ten thousand short clips may be cheaper than a few long, multi-rendition masters with aggressive preview generation and repeated re-encodes. Cost forecasting gets much more accurate when you measure duration, rendition count, and reprocessing frequency together.
Last updated: Mar 30, 2026