Managing video can feel challenging. Video files are large, uploads time out, and progress feedback is often nonexistent. Most tutorials hand you a library that hides everything, so when something breaks, you have no idea why.
In this guide, we’ll build a complete video upload and gallery app using plain TypeScript (without a framework or backend) and Cloudinary. By the end, you’ll understand exactly how video moves from a user’s device to a hosted stream that plays in any browser, with AI-generated tags, searchable transcripts, rename, and delete.
- Live Demo: core-js-video-upload-cloudinary.vercel.app
- Full Source: github.com/musebe/core-js-video-upload-cloudinary
| Page | What It Does |
|---|---|
| Upload | Drag-and-drop, chunked upload with live progress, pause/resume/cancel |
| Gallery | Video cards, inline playback, AI tags, searchable transcripts, rename, delete |
A normal file upload sends the whole file in one HTTP request. If your connection drops at 95%, you start over. Chunked uploads slice the file into pieces and send each one individually. Cloudinary stitches them back together server-side.
Chunked upload flow:
1. The browser sends chunk 1 (range: 0 MB to 10 MB). Cloudinary responds: { done: false }
2. The browser sends chunk 2 (range: 10 MB to 20 MB). Cloudinary responds: { done: false }
3. The browser sends the final chunk. Cloudinary responds: { done: true, url: "..." }
Result: Cloudinary combines all chunks into one hosted video.
The secret glue is a single header: X-Unique-Upload-Id. Every chunk carries the same UUID so Cloudinary knows they belong to the same file.
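To make the header pair concrete, here is a small sketch that builds both headers for one chunk. The helper name chunkHeaders is hypothetical. It assumes chunk boundaries are expressed the way Blob.slice() uses them (start inclusive, end exclusive), while the HTTP Content-Range format expects the inclusive position of the last byte, hence the "- 1".

```typescript
// Hypothetical helper: the two headers every chunk request carries.
// `start` is inclusive and `endExclusive` is exclusive (as with Blob.slice()),
// while Content-Range expects the inclusive position of the last byte.
interface ChunkHeaders {
  "X-Unique-Upload-Id": string;
  "Content-Range": string;
}

function chunkHeaders(
  uploadId: string,
  start: number,
  endExclusive: number,
  totalBytes: number,
): ChunkHeaders {
  if (start < 0 || endExclusive <= start || endExclusive > totalBytes) {
    throw new RangeError(`Invalid chunk ${start}-${endExclusive} of ${totalBytes} bytes`);
  }
  return {
    "X-Unique-Upload-Id": uploadId, // identical for every chunk of one file
    "Content-Range": `bytes ${start}-${endExclusive - 1}/${totalBytes}`,
  };
}

// A 25 MB file in 10 MB chunks; in the browser the id would come from crypto.randomUUID().
const MB = 1024 * 1024;
console.log(chunkHeaders("example-upload-id", 0, 10 * MB, 25 * MB)["Content-Range"]);
// bytes 0-10485759/26214400
```

Only the Content-Range value changes from chunk to chunk; the upload id stays fixed for the whole file.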
```bash
npm create vite@latest core-js-video-upload-cloudinary -- --template vanilla-ts
cd core-js-video-upload-cloudinary
npm install
```
Your package.json scripts:
```json
{
  "scripts": {
    "dev": "vite",
    "build": "tsc && vite build",
    "type-check": "tsc --noEmit"
  }
}
```
Create .env at the root:
```
VITE_CLOUDINARY_CLOUD_NAME=your-cloud-name
VITE_CLOUDINARY_UPLOAD_PRESET=cld_video_upload
VITE_CLOUDINARY_API_KEY=your-api-key
VITE_CLOUDINARY_API_SECRET=your-api-secret
```
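Vite exposes any variable prefixed VITE_ to the browser via import.meta.env, so no backend is needed. The trade-off is that every VITE_ value, including the API secret, is bundled into the JavaScript the browser downloads; that is acceptable for a personal demo but not for a public app.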
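A missing variable otherwise shows up later as "undefined" inside a Cloudinary URL. A minimal sketch of reading and checking the four values once at startup; readCloudinaryConfig and the CloudinaryConfig shape are hypothetical names, and in the app you would pass import.meta.env to it.

```typescript
// Hypothetical helper: validates the four VITE_ variables once at startup.
// In the app you would call readCloudinaryConfig(import.meta.env).
interface CloudinaryConfig {
  cloudName: string;
  uploadPreset: string;
  apiKey: string;
  apiSecret: string;
}

const ENV_NAMES = {
  cloudName: "VITE_CLOUDINARY_CLOUD_NAME",
  uploadPreset: "VITE_CLOUDINARY_UPLOAD_PRESET",
  apiKey: "VITE_CLOUDINARY_API_KEY",
  apiSecret: "VITE_CLOUDINARY_API_SECRET",
} as const;

function readCloudinaryConfig(env: Record<string, string | undefined>): CloudinaryConfig {
  const missing = Object.values(ENV_NAMES).filter((name) => !env[name]);
  if (missing.length > 0) {
    // Fail fast with a readable message instead of building URLs with "undefined"
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return {
    cloudName: env[ENV_NAMES.cloudName]!,
    uploadPreset: env[ENV_NAMES.uploadPreset]!,
    apiKey: env[ENV_NAMES.apiKey]!,
    apiSecret: env[ENV_NAMES.apiSecret]!,
  };
}
```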
Sign up at cloudinary.com. Your cloud name is on the dashboard. Copy it into .env.
Next, create an unsigned upload preset
An upload preset is a saved set of rules that Cloudinary applies to every upload.
- Go to Settings → Upload → Upload presets → Add upload preset.
- Set Signing Mode to Unsigned. This lets the browser upload without exposing your API secret.
- Set Folder to cld_video_upload.
- Save and copy the preset name to .env.
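Before wiring up chunking, you can check the preset with a single unsigned request. This is a minimal sketch for small test files only, posting to the same endpoint the chunked uploader uses; uploadUnsigned and the injectable fetchImpl parameter (there so it can be tested without a network) are illustrative, not part of the demo.

```typescript
// Minimal sketch: one-shot unsigned upload to check that the preset works.
// Fine for small test files; large videos should go through the chunked uploader.
type FetchLike = (
  url: string,
  init: { method: string; body: FormData },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

async function uploadUnsigned(
  cloudName: string,
  uploadPreset: string,
  file: Blob,
  fetchImpl: FetchLike = fetch,
): Promise<unknown> {
  const form = new FormData();
  form.append("file", file);
  form.append("upload_preset", uploadPreset); // no api_key or signature needed
  const res = await fetchImpl(
    `https://api.cloudinary.com/v1_1/${cloudName}/video/upload`,
    { method: "POST", body: form },
  );
  if (!res.ok) throw new Error(`Upload failed with HTTP ${res.status}`);
  return res.json(); // asset metadata such as public_id and secure_url
}
```

If this call is rejected, check the preset name and that its Signing Mode really is Unsigned before debugging the chunked code.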
Now for the fun part. Enable AI add-ons, both of which require zero lines of processing code from you.
The first add-on is Google Video Tagging
Every video will be automatically labelled, e.g., food, kitchen, cooking for a cooking video, and soccer, athlete, stadium for a sports clip.
- Go to Add-ons → Google AI Video Labeling → Subscribe.
- Open your upload preset → Google Video Tagging → Enable → Save.
The next add-on is Google Video Transcription
Cloudinary listens to your video and generates captions automatically.
- Go to Add-ons → Google AI Video Transcription → Subscribe.
- Open your upload preset → Auto transcription → Enable both SRT and VTT → Save.
This creates five files alongside every uploaded video: .vtt, .en-US.vtt, .srt, and .en-US.srt caption files plus a .transcript JSON file, all ready to serve from Cloudinary’s CDN.
Both add-ons are configured in the preset, not in your code. Cloudinary processes them server-side after the upload lands.
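Because the generated files follow a predictable naming pattern, their URLs can be derived from the video’s public ID. A small sketch, assuming they are stored as raw assets named public ID plus suffix, which matches the .vtt URL the gallery uses later; transcriptAssetUrls is a hypothetical name.

```typescript
// Builds CDN URLs for the files the transcription add-on writes next to a video.
// Assumes raw assets named `<publicId><suffix>` under /raw/upload/.
const TRANSCRIPT_SUFFIXES = [".transcript", ".vtt", ".en-US.vtt", ".srt", ".en-US.srt"] as const;

function transcriptAssetUrls(cloudName: string, publicId: string): Record<string, string> {
  const base = `https://res.cloudinary.com/${cloudName}/raw/upload/`;
  return Object.fromEntries(
    TRANSCRIPT_SUFFIXES.map((suffix) => [suffix, `${base}${publicId}${suffix}`]),
  );
}

// transcriptAssetUrls("demo", "cld_video_upload/clip")[".vtt"]
// -> "https://res.cloudinary.com/demo/raw/upload/cld_video_upload/clip.vtt"
```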
The heart of the app is a ChunkUploader class in src/uploader.ts. Here is the core loop:
```typescript
// Split the file into 10 MB pieces
const chunks = this._buildChunks(file); // file.slice(start, end)
const uploadId = crypto.randomUUID(); // shared across all chunks
for (const chunk of chunks) {
  await this._sendChunk(chunk, uploadId);
}
```
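The loop relies on a chunk builder. A standalone sketch of what something like _buildChunks could look like; the function and the Chunk fields here are assumptions (the real version lives in src/uploader.ts), with end kept exclusive exactly as Blob.slice() takes it.

```typescript
// Hypothetical stand-in for `_buildChunks`: slices a file into fixed-size pieces.
interface Chunk {
  blob: Blob;
  start: number; // first byte, inclusive
  end: number; // one past the last byte, as passed to Blob.slice()
  totalBytes: number;
}

const CHUNK_SIZE = 10 * 1024 * 1024; // 10 MB

function buildChunks(file: Blob, chunkSize: number = CHUNK_SIZE): Chunk[] {
  if (chunkSize <= 0) throw new RangeError("chunkSize must be positive");
  const chunks: Chunk[] = [];
  for (let start = 0; start < file.size; start += chunkSize) {
    const end = Math.min(start + chunkSize, file.size); // last chunk may be shorter
    chunks.push({ blob: file.slice(start, end), start, end, totalBytes: file.size });
  }
  return chunks;
}
```

Blob.slice() does not copy the file into memory, so building all chunk descriptors up front stays cheap even for multi-gigabyte videos.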
Each chunk is sent as a multipart/form-data POST:
```typescript
const form = new FormData();
form.append("file", chunk.blob);
form.append("upload_preset", "cld_video_upload");

const xhr = new XMLHttpRequest();
// open() must come first: setRequestHeader() throws on an unopened request
xhr.open("POST", `https://api.cloudinary.com/v1_1/${cloudName}/video/upload`);
xhr.setRequestHeader("X-Unique-Upload-Id", uploadId);
// Content-Range needs an inclusive last byte; chunk.end is assumed exclusive (Blob.slice)
xhr.setRequestHeader("Content-Range", `bytes ${chunk.start}-${chunk.end - 1}/${chunk.totalBytes}`);
xhr.send(form);
```
Content-Range tells Cloudinary exactly where this piece fits. When the last chunk arrives, Cloudinary assembles the video and returns the full asset metadata: URL, duration, resolution, format, and file size.
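Awaiting each chunk in the loop is easier when the XHR is wrapped in a Promise. A minimal sketch, assuming a hypothetical sendChunk helper and an XhrLike interface (a subset of XMLHttpRequest) so the logic can be exercised without a browser; errors keep the HTTP status so callers can decide whether to retry.

```typescript
// Minimal sketch: one chunk request as a Promise. XhrLike is a hypothetical
// subset of XMLHttpRequest; in the browser the default factory returns a real XHR.
interface XhrLike {
  status: number;
  responseText: string;
  onload: (() => void) | null;
  onerror: (() => void) | null;
  open(method: string, url: string): void;
  setRequestHeader(name: string, value: string): void;
  send(body: FormData): void;
}

interface ChunkResponse {
  done?: boolean;
  secure_url?: string;
  [key: string]: unknown;
}

function sendChunk(
  url: string,
  form: FormData,
  headers: Record<string, string>,
  makeXhr: () => XhrLike = () => new XMLHttpRequest() as unknown as XhrLike,
): Promise<ChunkResponse> {
  return new Promise((resolve, reject) => {
    const xhr = makeXhr();
    xhr.open("POST", url); // must precede setRequestHeader
    for (const [name, value] of Object.entries(headers)) {
      xhr.setRequestHeader(name, value);
    }
    xhr.onload = () => {
      if (xhr.status >= 200 && xhr.status < 300) {
        resolve(JSON.parse(xhr.responseText) as ChunkResponse);
      } else {
        // Attach the status so callers can branch on it (retry, quota fallback, ...)
        reject(Object.assign(new Error(`HTTP ${xhr.status}`), { status: xhr.status }));
      }
    };
    xhr.onerror = () => reject(new Error("Network error"));
    xhr.send(form);
  });
}
```

XHR rather than fetch is kept here because the upload loop in this guide is XHR-based; XHR also exposes upload progress events through xhr.upload.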
Networks are unreliable. Each chunk retries up to three times with exponential back-off:
```typescript
// attempt 0 → wait 500 ms
// attempt 1 → wait 1000 ms
// attempt 2 → wait 2000 ms
const delay = baseMs * Math.pow(2, attempt); // baseMs = 500
```
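Put together, the back-off fits in a small generic wrapper. A minimal sketch; withRetry and its parameters are hypothetical names, and "up to three times" is read as one initial attempt plus three retries, waiting 500, 1000, and 2000 ms between them.

```typescript
// Minimal sketch of retry with exponential back-off (hypothetical helper).
// After failed attempt n it waits baseMs * 2^n before trying again.
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function withRetry<T>(
  task: () => Promise<T>,
  maxRetries = 3,
  baseMs = 500,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await task();
    } catch (err) {
      if (attempt >= maxRetries) throw err; // out of retries: surface the last error
      await sleep(baseMs * Math.pow(2, attempt));
    }
  }
}

// Usage: await withRetry(() => sendOneChunk(chunk)); // sendOneChunk is hypothetical
```

In practice you would probably skip retries for client errors such as HTTP 420, since waiting does not fix an exhausted quota.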
The upload loop checks a flag between chunks. Pause sets the flag; resume resolves a promise that the loop is blocking on. Cancel rejects it. The user always has full control.
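The pause/resume/cancel mechanism described above can be isolated in a tiny gate object. A minimal sketch; the PauseGate class is a hypothetical name, not the demo’s code. The loop awaits gate.wait() before each chunk.

```typescript
// Minimal sketch of a pause/resume/cancel gate (hypothetical class).
class PauseGate {
  private paused = false;
  private cancelled = false;
  private waiters: Array<{ resolve: () => void; reject: (e: Error) => void }> = [];

  pause(): void {
    this.paused = true;
  }

  resume(): void {
    this.paused = false;
    this.waiters.splice(0).forEach((w) => w.resolve()); // unblock the loop
  }

  cancel(): void {
    this.cancelled = true;
    this.waiters.splice(0).forEach((w) => w.reject(new Error("Upload cancelled")));
  }

  // Resolves at once while running, blocks while paused, rejects once cancelled.
  wait(): Promise<void> {
    if (this.cancelled) return Promise.reject(new Error("Upload cancelled"));
    if (!this.paused) return Promise.resolve();
    return new Promise((resolve, reject) => this.waiters.push({ resolve, reject }));
  }
}

// In the upload loop:
// for (const chunk of chunks) {
//   await gate.wait();
//   await this._sendChunk(chunk, uploadId);
// }
```

Because the gate is only checked between chunks, a chunk already in flight finishes before the pause takes effect; stopping it mid-request would additionally need xhr.abort().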
After each chunk, you’ll fire a progress callback:
```typescript
callbacks.onProgress?.({
  currentChunk: i + 1, // "Part 2"
  totalChunks, // "of 5"
  percentage: 60,
  speedBytesPerSec: 2_400_000,
  etaSeconds: 12,
});
```
The UI reads these values and updates the bar, speed display, and ETA in real time. This single callback is what turns a frustrating wait into a confident experience.
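The percentage, speed, and ETA can all be derived from two numbers the uploader already knows: bytes sent and time elapsed. A minimal sketch, with computeProgress as a hypothetical name; it uses the average speed since the upload started, which is simple and smooth but reacts slowly to sudden network changes.

```typescript
// Hypothetical helper: derives the onProgress numbers from bytes sent and elapsed time.
interface ProgressInfo {
  percentage: number;
  speedBytesPerSec: number;
  etaSeconds: number;
}

function computeProgress(bytesSent: number, totalBytes: number, elapsedMs: number): ProgressInfo {
  const percentage = totalBytes > 0 ? Math.round((bytesSent / totalBytes) * 100) : 0;
  // Average speed since the upload started
  const speedBytesPerSec = elapsedMs > 0 ? bytesSent / (elapsedMs / 1000) : 0;
  const remaining = totalBytes - bytesSent;
  // Infinity until there is enough data to estimate
  const etaSeconds = speedBytesPerSec > 0 ? Math.ceil(remaining / speedBytesPerSec) : Infinity;
  return { percentage, speedBytesPerSec, etaSeconds };
}

// 24 MB of 40 MB sent in 10 s -> 60 %, 2,400,000 B/s, about 7 s left
```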
If you’ve exhausted your AI add-on quota, Cloudinary returns HTTP 420. Rather than failing the whole upload, the engine catches it and automatically restarts as a signed request with no preset and no add-ons, so the video still lands safely in Cloudinary. The gallery marks it with a clear “quota reached” notice instead of silently breaking. See src/types.ts for the addonLimitReached flag that flows through the whole system.
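The fallback boils down to "try the preset upload; on 420 only, run the signed upload and remember why". A minimal sketch, assuming the failing upload rejects with an error carrying a numeric status property; uploadWithAddonFallback and its two callbacks are hypothetical, while addonLimitReached mirrors the flag named above.

```typescript
// Minimal sketch of the HTTP 420 fallback. Both callbacks are hypothetical:
// uploadWithPreset runs the normal preset-based upload (add-ons enabled),
// uploadSigned re-runs the upload as a signed request without the preset.
interface FallbackOutcome<T> {
  result: T;
  addonLimitReached: boolean; // drives the gallery's "quota reached" notice
}

function httpStatusOf(err: unknown): number | undefined {
  if (typeof err === "object" && err !== null && "status" in err) {
    const status = (err as { status: unknown }).status;
    return typeof status === "number" ? status : undefined;
  }
  return undefined;
}

async function uploadWithAddonFallback<T>(
  uploadWithPreset: () => Promise<T>,
  uploadSigned: () => Promise<T>,
): Promise<FallbackOutcome<T>> {
  try {
    return { result: await uploadWithPreset(), addonLimitReached: false };
  } catch (err) {
    if (httpStatusOf(err) !== 420) throw err; // other failures are real errors
    return { result: await uploadSigned(), addonLimitReached: true };
  }
}
```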
index.html + src/main.ts handle drag-and-drop, file validation, and progress rendering.
File validation runs before a single byte is sent:
```typescript
// Accept only video MIME types; reject anything over 5 GB
if (!ACCEPTED_VIDEO_TYPES.has(file.type)) {
  return { valid: false, reason: "..." };
}
if (file.size > 5 * 1024 ** 3) {
  return { valid: false, reason: "..." };
}
```
After a successful upload, the result section shows the video playing directly from Cloudinary’s CDN: no self-hosting, no encoding pipeline, just a URL.
Every uploaded video is saved to localStorage with its public ID, URL, tags, and metadata. The gallery reads this on load and builds a card for each video. See gallery.html and src/gallery.ts.
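A minimal sketch of that persistence layer. The storage key, the StoredVideo fields, and the loadVideos/saveVideo names are assumptions rather than the demo’s exact schema; taking a StorageLike parameter instead of touching localStorage directly keeps it testable.

```typescript
// Minimal sketch of the gallery's localStorage persistence (hypothetical schema).
interface StoredVideo {
  publicId: string;
  secureUrl: string;
  tags: string[];
  duration?: number;
  addonLimitReached?: boolean;
}

interface StorageLike {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const STORAGE_KEY = "cld-video-gallery"; // hypothetical key

function loadVideos(storage: StorageLike): StoredVideo[] {
  try {
    const parsed: unknown = JSON.parse(storage.getItem(STORAGE_KEY) ?? "[]");
    return Array.isArray(parsed) ? (parsed as StoredVideo[]) : [];
  } catch {
    return []; // corrupted JSON: start with an empty gallery
  }
}

function saveVideo(storage: StorageLike, video: StoredVideo): void {
  // Replace any entry with the same publicId and put the newest first
  const others = loadVideos(storage).filter((v) => v.publicId !== video.publicId);
  storage.setItem(STORAGE_KEY, JSON.stringify([video, ...others]));
}

// In the browser: saveVideo(localStorage, { publicId, secureUrl, tags: [] });
```

Saving by publicId also covers the later steps: refreshed tags overwrite the existing entry instead of creating a duplicate card.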
Cloudinary transforms any video into a JPEG thumbnail with a single URL change:
```typescript
secureUrl.replace(
  "/video/upload/",
  "/video/upload/w_640,h_360,c_fill,so_0,q_auto,f_jpg/"
);
```
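w_640,h_360 sets the output size, and c_fill fills it, cropping whatever doesn’t fit. so_0 (start offset 0 seconds) grabs the first frame. q_auto picks the optimal quality, and f_jpg converts the frame to JPEG. No image processing code anywhere in your project.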
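The same replacement can be wrapped in a small builder so size and frame are configurable. A sketch with a hypothetical thumbnailUrl helper; so_ takes a time in seconds, so a frame a few seconds in is often more representative than frame zero, which can be black.

```typescript
// Hypothetical helper: parameterised version of the thumbnail URL replacement.
interface ThumbOptions {
  width?: number;
  height?: number;
  offsetSeconds?: number; // which frame to grab (so_ = start offset)
}

function thumbnailUrl(
  secureUrl: string,
  { width = 640, height = 360, offsetSeconds = 0 }: ThumbOptions = {},
): string {
  const transform = `w_${width},h_${height},c_fill,so_${offsetSeconds},q_auto,f_jpg`;
  return secureUrl.replace("/video/upload/", `/video/upload/${transform}/`);
}

// thumbnailUrl(url, { offsetSeconds: 2.5 }) grabs the frame at 2.5 s
```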
When the play button is clicked, the thumbnail hides and the native <video> element takes over. Inject the VTT caption track Cloudinary generated:
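```typescript
const track = document.createElement("track");
track.kind = "subtitles";
track.srclang = "en"; // subtitles tracks should declare their language
track.label = "English";
track.src = `https://res.cloudinary.com/${cloud}/raw/upload/${publicId}.vtt`;
track.default = true;
// The .vtt is cross-origin, so the <video> must fetch tracks with CORS
videoEl.crossOrigin = "anonymous";
videoEl.appendChild(track);
videoEl.play();
```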
The browser’s native CC button now works out of the box.
Tags come back asynchronously (Cloudinary processes them after the upload). A Refresh tags button calls the Admin API to check:
```typescript
const res = await fetch(
  `/api/cloudinary/v1_1/${cloud}/resources/video/upload/${publicId}`,
  {
    headers: {
      Authorization: `Basic ${btoa(`${apiKey}:${apiSecret}`)}`,
    },
  }
);
if (!res.ok) throw new Error(`Admin API returned HTTP ${res.status}`);
const { tags } = await res.json(); // ['sport', 'athlete', 'stadium']
```
When tags arrive they are saved back to localStorage and rendered as coloured pills on the card.
Opening the transcript panel fetches the .transcript JSON from Cloudinary’s CDN. Each word carries a start_time and end_time:
```json
{
  "word": "welcome",
  "start_time": 0.4,
  "end_time": 0.9
}
```
Render each word as a clickable <span>. The video’s timeupdate event fires roughly four times per second, so on each tick you can find the matching word and highlight it:
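```typescript
// words: [{ start, end, el }], where el is the <span> rendered for that word
let current: HTMLElement | undefined;
videoEl.addEventListener("timeupdate", () => {
  const t = videoEl.currentTime;
  const active = words.find((w) => t >= w.start && t < w.end)?.el;
  if (active === current) return; // same word as the last tick
  current?.classList.remove("tw--active"); // clear the previous highlight
  current = active;
  current?.classList.add("tw--active"); // purple highlight
  current?.scrollIntoView({ behavior: "smooth", block: "nearest" });
});
```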
Click any word and the video jumps to that moment. This works because Cloudinary did the speech recognition; your code just reads the result.
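On the data side, two small functions cover most of the work: flattening the transcript into one word list and finding the word playing at a given time. A sketch, assuming the .transcript file is an array of segments, each with a words array shaped like the JSON example above (check your own file first); flattenWords and activeWordIndex are hypothetical names. Binary search replaces the linear find so long transcripts stay cheap on every tick.

```typescript
// Assumes: [{ words: [{ word, start_time, end_time }, ...] }, ...]
interface TranscriptWord {
  word: string;
  start_time: number;
  end_time: number;
}

interface TranscriptSegment {
  words?: TranscriptWord[];
}

function flattenWords(segments: TranscriptSegment[]): TranscriptWord[] {
  return segments.flatMap((s) => s.words ?? []);
}

// Binary search for the word playing at time t (words sorted by start_time).
// Returns -1 in gaps between words or outside the transcript.
function activeWordIndex(words: TranscriptWord[], t: number): number {
  let lo = 0;
  let hi = words.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (t < words[mid].start_time) hi = mid - 1;
    else if (t >= words[mid].end_time) lo = mid + 1;
    else return mid;
  }
  return -1;
}
```

In the timeupdate handler you would highlight the span at activeWordIndex(words, videoEl.currentTime), and each span’s click handler can seek with videoEl.currentTime = word.start_time.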
The gallery has an inline rename form. Clicking Save calls Cloudinary’s rename method (a signed Upload API endpoint) via src/cloudinary-admin.ts to move the asset:
```typescript
// Sign the request (SHA-1 via Web Crypto API, see src/utils.ts).
// Signed params are sorted alphabetically; the secret is appended with no "&".
const signature = await sha1Hex(
  `from_public_id=...&timestamp=...&to_public_id=...${apiSecret}`
);
await fetch(`/api/cloudinary/v1_1/${cloud}/video/rename`, {
  method: "POST",
  body: form, // from_public_id, to_public_id, api_key, signature, timestamp
});
```
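Cloudinary’s signing rule is mechanical: sort the parameters by name, join them as key=value pairs with "&", append the API secret, and hash with SHA-1; api_key, file, cloud_name, and resource_type are left out. A sketch of that rule, assuming Web Crypto is available (browsers and recent Node versions); stringToSign is a hypothetical name, and sha1Hex shows one possible implementation of the helper the guide keeps in src/utils.ts.

```typescript
// Parameters that are sent but not included in the signature
const NOT_SIGNED = new Set(["api_key", "file", "cloud_name", "resource_type"]);

function stringToSign(params: Record<string, string | number>, apiSecret: string): string {
  const payload = Object.keys(params)
    .filter((key) => !NOT_SIGNED.has(key) && params[key] !== "")
    .sort() // alphabetical order by parameter name
    .map((key) => `${key}=${params[key]}`)
    .join("&");
  return payload + apiSecret; // secret appended with no separator
}

async function sha1Hex(text: string): Promise<string> {
  const bytes = new TextEncoder().encode(text);
  const digest = await crypto.subtle.digest("SHA-1", bytes);
  return Array.from(new Uint8Array(digest), (b) => b.toString(16).padStart(2, "0")).join("");
}

// Usage for the rename call:
// const timestamp = Math.floor(Date.now() / 1000);
// const signature = await sha1Hex(
//   stringToSign({ from_public_id: oldId, to_public_id: newId, timestamp }, apiSecret),
// );
```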
After the main video is renamed, we also rename all five supplementary files (the two VTTs, two SRTs, and the transcript JSON) so nothing goes stale:
```typescript
const suffixes = [".transcript", ".vtt", ".en-US.vtt", ".srt", ".en-US.srt"];
await Promise.allSettled(
  suffixes.map((s) => renameRaw(`${oldId}${s}`, `${newId}${s}`))
);
```
Promise.allSettled means a missing file never blocks the rename.
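allSettled also tells you which files failed, which is useful for logging. A small sketch with a hypothetical forEachSupplementary helper shared by rename and delete; it relies on allSettled returning results in input order.

```typescript
// Hypothetical helper: applies one operation to every supplementary file and
// reports which suffixes failed, without letting one failure stop the others.
const SUPPLEMENTARY_SUFFIXES = [".transcript", ".vtt", ".en-US.vtt", ".srt", ".en-US.srt"];

async function forEachSupplementary(op: (suffix: string) => Promise<unknown>): Promise<string[]> {
  const results = await Promise.allSettled(SUPPLEMENTARY_SUFFIXES.map((suffix) => op(suffix)));
  // allSettled keeps input order, so results[i] belongs to SUPPLEMENTARY_SUFFIXES[i]
  return SUPPLEMENTARY_SUFFIXES.filter((_, i) => results[i].status === "rejected");
}

// Rename: await forEachSupplementary((s) => renameRaw(`${oldId}${s}`, `${newId}${s}`));
// Delete: await forEachSupplementary((s) => deleteRaw(`${publicId}${s}`));
```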
The Delete button removes the video and every supplementary file from Cloudinary via src/cloudinary-admin.ts, then clears the localStorage entry:
```typescript
// Delete the video
await fetch(`/api/cloudinary/v1_1/${cloud}/video/destroy`, {
  method: "POST",
  body: form, // public_id, api_key, signature, invalidate: true
});
// Delete the VTT, SRT, and transcript files; ignore "not found" errors
await Promise.allSettled(suffixes.map((s) => deleteRaw(`${publicId}${s}`)));
```
invalidate: true asks Cloudinary to invalidate cached copies on its CDN, so the old URL stops serving the deleted video.
A static Vite build works on Vercel with two additions.
Tell Vite about both pages (vite.config.ts):
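```typescript
import { defineConfig } from "vite";
import { resolve } from "node:path";
export default defineConfig({
  build: {
    rollupOptions: {
      input: {
        main: resolve(__dirname, "index.html"),
        gallery: resolve(__dirname, "gallery.html"),
      },
    },
  },
});
```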
Proxy Admin API calls in production (vercel.json):
```json
{
  "rewrites": [
    {
      "source": "/api/cloudinary/:path*",
      "destination": "https://api.cloudinary.com/:path*"
    }
  ]
}
```
The proxy avoids CORS errors when the browser calls the Cloudinary Admin API directly. In development, Vite’s server.proxy does the same thing.
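For local development, a matching dev-server proxy might look like the sketch below. target, changeOrigin, and rewrite are Vite’s standard server.proxy options; merge this server block into the same defineConfig call as the build options shown earlier.

```typescript
// vite.config.ts (sketch): dev-server proxy mirroring the vercel.json rewrite
import { defineConfig } from "vite";

export default defineConfig({
  server: {
    proxy: {
      "/api/cloudinary": {
        target: "https://api.cloudinary.com",
        changeOrigin: true, // send Host: api.cloudinary.com upstream
        // /api/cloudinary/v1_1/... -> https://api.cloudinary.com/v1_1/...
        rewrite: (path) => path.replace(/^\/api\/cloudinary/, ""),
      },
    },
  },
});
```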
Add your four environment variables in the Vercel project settings, push, and you’re live.
| Task | Your Code | Cloudinary |
|---|---|---|
| Reassemble file chunks | Send with X-Unique-Upload-Id | ✓ |
| Generate thumbnail | Build the URL | ✓ |
| Transcode for all devices | Nothing | ✓ |
| Speech-to-text captions | Attach the <track> | ✓ |
| Word-level timestamps | Read the JSON | ✓ |
| AI scene/object labels | Read the tags | ✓ |
| Global CDN delivery | Use the URL | ✓ |
The pattern is consistent: you write the wiring, and Cloudinary does the processing.
- Add a folder structure in Cloudinary to organize videos by user or category.
- Use Cloudinary’s transformation URL to generate video previews (animated GIF or short MP4 clip).
- Move the API secret to a serverless function so it never touches the browser.
Ready to build? Sign up for a free Cloudinary account to get started.
- Full Source Code: github.com/musebe/core-js-video-upload-cloudinary
- Live Demo: core-js-video-upload-cloudinary.vercel.app
How do chunked uploads improve large video file reliability?
Slicing videos into smaller pieces allows you to upload them sequentially. If the network drops, your application only retries the failed chunk rather than restarting the entire file upload from the beginning.
How can I automatically generate captions and transcriptions for uploaded videos?
You can enable the Google AI Video Transcription add-on in your Cloudinary upload preset. Cloudinary automatically processes speech-to-text in the background and delivers standard subtitle files ready for the browser.
What is the benefit of using a unique upload identifier in headers?
Passing the X-Unique-Upload-Id header alongside a Content-Range header tells Cloudinary which file the chunks belong to. It allows Cloudinary to seamlessly stitch the pieces together once the final chunk arrives.
How do you build an interactive, word-level synchronized transcript in vanilla TypeScript?
You fetch the auto-generated JSON transcript from the CDN and map each word to a timed span. Listening to the video player’s timeupdate event allows you to highlight the active word dynamically in real time.
Can I customize video thumbnails on demand without processing libraries?
Yes. Changing the delivery URL string with parameters like w_640, h_360, c_fill, and so_0 allows Cloudinary to instantly capture the first frame, resize it, and output a lightweight image.