Imagine a web where every image is instantly understandable to everyone, including the 2.2 billion people globally with vision impairments. This is a massive engineering challenge. Developers often view writing descriptive alt text as a manual chore that slows down release cycles.
This platform solves that problem by integrating accessibility directly into your workflow, rather than treating it as a manual, extra step. Powered by a high-performance TanStack Start foundation and Cloudinary AI v5, it generates, saves, and streams accessible metadata for every asset the moment it hits the pipeline.
In this tutorial, you’ll architect a self-healing accessibility system that:
- Bootstraps a type-safe foundation using TanStack Start’s full-stack framework.
- Ingests assets via a professional Pipeline Entry Point using the Cloudinary Upload Widget.
- Uses server-side scripts to commit AI captions to permanent storage.
- Orchestrates metadata via type-safe server functions to deliver a WCAG-compliant gallery.
To build an autonomous pipeline, we’ll first need a type-safe, full-stack environment.
We’ll use TanStack Start, a framework that allows us to move heavy metadata orchestration to the server while keeping the UI responsive and inclusive.
We’ll start by scaffolding the project using the official CLI.
This gives us a structured environment with built-in support for server functions and advanced routing.
# Create a new project with React and TypeScript
npm create @tanstack/start@latest auto-inclusive-web
After choosing the default options, install the core dependencies for Cloudinary integration.
npm install cloudinary
Because we’ll build a professional ingestion point, we’ll need the Cloudinary Upload Widget available globally.
We’ll inject the script directly into the head of our src/routes/__root.tsx file.
This ensures that every page in our app has the power to deploy inclusive assets.
// src/routes/__root.tsx (snippet)
export const Route = createRootRoute({
  head: () => ({
    scripts: [
      {
        src: 'https://upload-widget.cloudinary.com/latest/global/all.js',
        type: 'text/javascript',
      },
    ],
  }),
  shellComponent: RootDocument,
});
Managing secrets is a critical “gotcha” in full-stack apps.
We’ll split our environment variables into two categories.
- VITE_-prefixed. Public keys (like your cloud name and upload preset) that the browser needs to initialize the Upload Widget.
- Server-only. Private keys (like your CLOUDINARY_API_SECRET) that remain on the server to handle secure Admin API calls.
# .env (local configuration)
VITE_CLOUDINARY_CLOUD_NAME="your_name"
VITE_CLOUDINARY_UPLOAD_PRESET="auto_inclusive_preset"
# PRIVATE: Only accessible in TanStack Server Functions
CLOUDINARY_API_KEY="your_key"
CLOUDINARY_API_SECRET="your_secret"
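To keep that boundary honest, a small server-side guard can fail fast when a private key is missing, instead of silently making unauthenticated API calls. This is a sketch; the helper name and error message are illustrative, not part of the Cloudinary or TanStack APIs:

```typescript
// Server-only helper: resolve a required secret or fail loudly.
// On the server you'd call requireServerEnv("CLOUDINARY_API_SECRET", process.env).
function requireServerEnv(
  name: string,
  env: Record<string, string | undefined>
): string {
  const value = env[name];
  if (!value) {
    throw new Error(`Missing server-only environment variable: ${name}`);
  }
  return value;
}
```

Because the helper takes the environment as a parameter, it never touches `import.meta.env`, which makes it obvious at a glance that it belongs on the server side of the split.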
To give our pipeline “sight,” we’ll configure the Cloudinary ecosystem. This is where we activate the AI models and create the rules that govern our autonomous metadata generation.
Before writing code, you’ll need to “hire” the AI agents that will process your assets. Navigate to the Cloudinary Add-ons page in your console and register for:
- Cloudinary AI Content Analysis. The powerhouse v5 engine. It doesn’t just tag “dog” or “tree”; it generates full natural-language captions (e.g., “A golden retriever puppy playing with a red ball in the grass”).
- Google Translation. This add-on allows us to take that AI caption and instantly localize it, ensuring our inclusive metadata reaches a global audience.
We don’t want to pass complex AI instructions from the client side for every upload. Instead, we’ll create an Upload Preset (let’s call it auto_inclusive_preset). This acts as a predefined instruction manual for Cloudinary.
- Settings > Upload > Add Upload Preset.
- Signing mode. Set to Unsigned. This allows our TanStack Start frontend to upload directly to Cloudinary without a secure server-side signature for every file.
- Folder. Set to inclusive_web_assets to keep our library organized.
Inside the preset, navigate to the Upload Manipulations or Add-ons section (depending on your console version). This is the “Aha!” configuration:
- AI Content Analysis. Enable “Add AI captioning to your image”.
- Google Auto Tagging (optional). Set a confidence threshold (e.g., 0.7) to categorize your assets automatically for SEO.
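The same preset can also be created programmatically with the Node SDK’s Admin API. The sketch below builds the payload described above; the field names follow Cloudinary’s upload preset parameters (detection: "captioning" for AI captions, auto_tagging for the Google confidence threshold), but verify them against the current API reference before relying on them:

```typescript
// Build the Admin API payload for the preset described above.
// Both detection and auto_tagging assume the corresponding
// add-ons (AI Content Analysis, Google Auto Tagging) are active.
function buildInclusivePreset() {
  return {
    name: "auto_inclusive_preset",
    unsigned: true,                    // allow unsigned client-side uploads
    folder: "inclusive_web_assets",    // keep the library organized
    detection: "captioning",           // AI Content Analysis captioning
    categorization: "google_tagging",  // optional SEO tagging
    auto_tagging: 0.7,                 // confidence threshold
  };
}

// Usage (server-side only, with the SDK configured):
// import { v2 as cloudinary } from "cloudinary";
// await cloudinary.api.create_upload_preset(buildInclusivePreset());
```

Creating the preset in code makes the configuration reviewable and reproducible across environments, rather than living only in the console UI.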
One common hurdle is that AI data from unsigned uploads can be ephemeral. To ensure the AI caption is permanently saved to the asset’s metadata, we’ll use an On-success script.
In the Advanced tab of your preset, add this snippet:
// This runs server-side on Cloudinary immediately after a successful upload
// It commits the v5 AI caption to the asset's permanent 'context' field
current_asset.update({
  context: {
    caption: e.upload_info?.info?.detection?.captioning?.data?.caption
  }
});
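To make the lookup in that script concrete, here is the relevant slice of the upload response and the same optional-chaining traversal in plain TypeScript. The nesting follows the captioning add-on’s response format; the sample caption text is illustrative:

```typescript
// Shape of the captioning portion of Cloudinary's upload response.
type UploadInfo = {
  info?: {
    detection?: {
      captioning?: { data?: { caption?: string } };
    };
  };
};

// Same traversal the On-success script performs.
function extractCaption(uploadInfo: UploadInfo): string | undefined {
  return uploadInfo.info?.detection?.captioning?.data?.caption;
}

// Illustrative sample response.
const sample: UploadInfo = {
  info: {
    detection: {
      captioning: { data: { caption: "A red bicycle leaning against a brick wall" } },
    },
  },
};
```

The optional chaining matters: if the add-on didn’t run (or the asset isn’t an image), every level of the path may be absent, and the lookup degrades to undefined instead of throwing.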
By shifting this logic to the Cloudinary Edge, we’ll ensure that every asset is born with its descriptive metadata committed before our gallery even knows it exists.
In this section, we’ll address the biggest engineering roadblock when working with browser-based uploads and AI: The Metadata Disappearing Act.
When using Unsigned Uploads (which are necessary for simple, client-side widgets), Cloudinary restricts certain parameters for security. Specifically, you can’t pass the detection or auto_tagging parameters directly from your frontend code.
If you try to trigger AI captioning via a createUploadWidget call in your React component, Cloudinary will ignore the request to prevent malicious users from racking up your AI credits.
To bypass this, we’ll utilize the same Upload Preset as our “Trusted Agent.”
- The problem. The client can’t ask for AI.
- The solution. The client asks to use a preset, and that preset (which lives securely on Cloudinary’s servers) is the one that demands the AI analysis.
This architectural shift ensures that the AI analysis is triggered by your server-side configuration, making it secure and reliable.
Even when the AI runs, the resulting caption is often just a “transient” piece of data in the upload response. If you don’t explicitly tell Cloudinary to save it, then that caption won’t be indexed in your Media Library for future retrieval by your gallery.
By using the On-success script we added to the preset, we’ll perform a “Metadata Commit.”
Before writing the gallery code, verify your pipeline is working:
- Upload an image using your dashboard.
- Open your Cloudinary Media Library.
- Click the image and open the Context or Metadata tab.
- You should see a key named caption populated with a full sentence generated by the AI.
If the caption is there, your “Zero-Touch” pipeline is officially live.
With our AI-enriched assets stored safely in Cloudinary, we’ll now need a bridge to bring that data into our frontend.
In TanStack Start, we’ll use Server Functions (createServerFn) to securely fetch our assets without exposing our API_SECRET to the client.
The Cloudinary Admin API is the only way to retrieve the context metadata we saved in the previous section, and requires your private Secret Key.
If you called this from a standard React component, your keys would be visible in the Network tab of the browser, a massive security risk.
By using createServerFn, TanStack Start ensures the code runs only on the server, acting as a secure proxy.
This is the “brain” of your data layer. We’ll keep the implementation lean by focusing on the parameters that matter most. Notice how we explicitly request context: true? Without this, Cloudinary will omit the AI-generated captions.
// src/utils/gallery-engine.ts (Simplified)
import { createServerFn } from "@tanstack/start";
import { v2 as cloudinary } from "cloudinary";

// Configure the SDK (Server-Side Only)
cloudinary.config({
  cloud_name: process.env.VITE_CLOUDINARY_CLOUD_NAME,
  api_key: process.env.CLOUDINARY_API_KEY,
  api_secret: process.env.CLOUDINARY_API_SECRET,
});

export const fetchGallery = createServerFn({ method: "GET" }).handler(
  async () => {
    const result = await cloudinary.api.resources({
      type: "upload",
      prefix: "inclusive_web_assets/",
      context: true, // MANDATORY: This fetches our AI captions
      max_results: 50,
    });

    // Transform raw Cloudinary data into clean UI props
    return result.resources.map((asset: any) => ({
      publicId: asset.public_id,
      url: asset.secure_url,
      // Map the buried AI caption to a clean 'alt' field
      alt: asset.context?.custom?.caption || "AI Description processing...",
    }));
  }
);
View the full implementation on GitHub:
src/utils/gallery-engine.ts
When Cloudinary returns your assets, the AI caption isn’t in a top-level alt field. It’s buried inside context.custom.caption.
- The problem. Your UI shouldn’t have to know about Cloudinary’s deep JSON structure.
- The solution. We’ll “map” the data inside the server function.
This keeps our frontend components clean and focused only on rendering.
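That mapping step can live in a small pure helper, which also makes the fallback easy to unit-test in isolation. This is a sketch; the type is hand-written from the fields the gallery uses, and the fallback string mirrors the one in the server function:

```typescript
// Hand-written slice of a Cloudinary Admin API resource.
type CloudinaryResource = {
  public_id: string;
  secure_url: string;
  context?: { custom?: { caption?: string } };
};

// Flatten Cloudinary's nested context into clean UI props.
function toGalleryProps(asset: CloudinaryResource) {
  return {
    publicId: asset.public_id,
    url: asset.secure_url,
    alt: asset.context?.custom?.caption ?? "AI Description processing...",
  };
}
```

Keeping this as a pure function means the deep-JSON knowledge lives in exactly one place, and a caption that hasn’t been committed yet still yields a usable placeholder.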
Now, you can use this function in your Gallery route.
TanStack Start makes this feel like a standard React hook, but it’s actually performing a type-safe network call to your server.
// src/routes/gallery.tsx (snippet)
export const Route = createFileRoute('/gallery')({
  loader: () => fetchGallery(),
  component: GalleryComponent,
});

function GalleryComponent() {
  const assets = Route.useLoaderData(); // Fully typed array of our mapped assets!
  // ... render the inclusive gallery
}
Building an autonomous pipeline requires a UI that signals its purpose. We don’t just want a “file input”; we want a professional Pipeline Entry Point that indicates assets are being ingested into an AI workflow.
To make the intent unmistakable, we use the universal UI language for uploads: a large, dashed-border container with high-contrast icons. This creates an immediate mental model for the user: this is where files go to be processed.
// app/components/UploadWidget.tsx (Simplified snippet)
export function UploadWidget({ onUploadSuccess }: UploadWidgetProps) {
  const [isOpening, setIsOpening] = useState(false);

  const openWidget = () => {
    setIsOpening(true);
    // @ts-ignore - Cloudinary is attached to window via root script
    const widget = window.cloudinary.createUploadWidget(
      {
        cloudName: import.meta.env.VITE_CLOUDINARY_CLOUD_NAME,
        uploadPreset: import.meta.env.VITE_CLOUDINARY_UPLOAD_PRESET, // Uses Section 3 settings
        sources: ["local", "url", "camera"],
      },
      (error, result) => {
        if (!error && result.event === "success") {
          onUploadSuccess(result.info);
        }
        setIsOpening(false);
      }
    );
    widget.open();
  };

  return (
    <button
      onClick={openWidget}
      className="w-full border-2 border-dashed border-slate-200 rounded-[2.5rem] p-12 hover:border-indigo-400 hover:bg-slate-50 transition-all"
    >
      <UploadIcon className="w-10 h-10 mb-8" />
      <h3 className="text-2xl font-black text-slate-900 uppercase">
        {isOpening ? "Connecting..." : "Deploy Inclusive Asset"}
      </h3>
      <p className="text-indigo-600 font-bold text-xs uppercase tracking-widest mt-2">
        Click to browse or drag and drop
      </p>
    </button>
  );
}
View the full implementation on GitHub:
src/components/UploadWidget.tsx
One major “UX Gotcha” is the delay between clicking a button and the widget appearing.
By using an isOpening state, we:
- Disable the button to prevent double-initialization.
- Change the text to “Initializing AI Pipeline” or “Connecting…”.
- Provide visual feedback that the “handshake” between your app and Cloudinary is happening.
An inclusive library doesn’t just display images; it presents AI-generated metadata as an integral part of the UI. This is where we turn raw data into a human-centric experience.
Once our server function delivers the mapped assets, we’ll render them using a semantic and accessible grid. The key is ensuring the alt text is correctly applied to the <img> tag.
// app/routes/gallery.tsx (Simplified snippet)
function GalleryComponent() {
  const assets = Route.useLoaderData();

  return (
    <div className="grid grid-cols-1 gap-12">
      {assets.map((asset) => (
        <div
          key={asset.publicId}
          className="flex flex-col md:flex-row gap-8 items-center bg-white rounded-[2rem] p-8 border border-slate-50 shadow-xl"
        >
          {/* 1. Optimized Visual Asset */}
          <div className="w-full md:w-[45%] aspect-[4/3]">
            <img
              src={asset.url.replace(
                "/upload/",
                "/upload/f_auto,q_auto,c_pad,ar_4:3,b_white/"
              )}
              alt={asset.alt} // The AI-generated mission-critical metadata
              className="w-full h-full object-contain rounded-2xl"
            />
          </div>
          {/* 2. Metadata Context */}
          <div className="w-full md:w-[55%] space-y-4">
            <span className="text-[10px] font-black text-indigo-600 uppercase tracking-widest">
              Autonomous Caption
            </span>
            <p className="text-xl font-serif italic text-slate-700 leading-relaxed">
              "{asset.alt}"
            </p>
            <div className="flex gap-2">
              <TechBadge label="WCAG 2.1 AA" />
              <TechBadge label="Cloudinary AI v5" />
            </div>
          </div>
        </div>
      ))}
    </div>
  );
}
View the full implementation on GitHub:
src/routes/gallery.tsx
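The inline url.replace in that component can be factored into a tiny helper so the transformation string lives in one place. A sketch, using the same delivery transformations as the component (automatic format and quality, padded 4:3 crop on a white background):

```typescript
// Delivery transformations applied to every gallery image.
const TRANSFORM = "f_auto,q_auto,c_pad,ar_4:3,b_white";

// Inject the transformation segment into a Cloudinary delivery URL.
function withTransform(url: string): string {
  return url.replace("/upload/", `/upload/${TRANSFORM}/`);
}
```

Centralizing the string means a future art-direction change (say, a different aspect ratio) touches one constant instead of every img tag.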
By architecting this “Zero-Touch” pipeline, we’ve moved accessibility from an item on a checklist to an immutable part of the asset lifecycle.
The current pipeline sets the stage for even deeper global inclusivity. By layering the Google Translation Add-on into our on_success script, we can instantly localize these AI captions into dozens of languages.
An image uploaded in Kenya is instantly accessible to a screen-reader user in Tokyo, localized in Japanese, without a single human intervention. That’s the power of an auto-inclusive web. Ready to try this build for yourself? Sign up for a free Cloudinary account today.
- Live implementation: auto-inclusive-web.vercel.app/gallery
- Full source code: github.com/musebe/auto-inclusive-web