Imagine a web where every image is instantly understandable to everyone, including the 2.2 billion people globally with vision impairments. This is a massive engineering challenge. Developers often view writing descriptive alt text as a manual chore that slows down release cycles.
This platform solves that problem by integrating accessibility directly into your workflow, rather than treating it as a manual, extra step. Powered by a high-performance TanStack Start foundation and Cloudinary AI v5, it generates, saves, and streams accessible metadata for every asset the moment it hits the pipeline.
In this tutorial, you’ll architect a self-healing accessibility system that:
- Bootstraps a type-safe foundation using TanStack Start’s full-stack framework.
- Ingests assets via a professional Pipeline Entry Point using the Cloudinary Upload Widget.
- Uses server-side scripts to commit AI captions to permanent storage.
- Orchestrates metadata via type-safe server functions to deliver a WCAG-compliant gallery.
To build an autonomous pipeline, we’ll first need a type-safe, full-stack environment.
We’ll use TanStack Start, a framework that allows us to move heavy metadata orchestration to the server while keeping the UI responsive and inclusive.
We’ll start by scaffolding the project using the official CLI.
This gives us a structured environment with built-in support for server functions and advanced routing.
# Create a new project with React and TypeScript
npm create @tanstack/start@latest auto-inclusive-web
After choosing the default options, install the core dependencies for Cloudinary integration.
npm install cloudinary
Because we’ll build a professional ingestion point, we’ll need the Cloudinary Upload Widget available globally.
We’ll inject the script directly into the head of our src/routes/__root.tsx file.
This ensures that every page in our app has the power to deploy inclusive assets.
// src/routes/__root.tsx (snippet)
export const Route = createRootRoute({
  head: () => ({
    scripts: [
      {
        src: 'https://upload-widget.cloudinary.com/latest/global/all.js',
        type: 'text/javascript',
      },
    ],
  }),
  shellComponent: RootDocument,
});
Managing secrets is a critical “gotcha” in full-stack apps.
We’ll split our environment variables into two categories.
- VITE_-prefixed. Public keys (like your cloud name and upload preset) that the browser needs to initialize the Upload Widget.
- Server-only. Private keys (like your CLOUDINARY_API_SECRET) that remain on the server to handle secure Admin API calls.
# .env (local configuration)
VITE_CLOUDINARY_CLOUD_NAME="your_name"
VITE_CLOUDINARY_UPLOAD_PRESET="auto_inclusive_preset"
# PRIVATE: Only accessible in TanStack Server Functions
CLOUDINARY_API_KEY="your_key"
CLOUDINARY_API_SECRET="your_secret"
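To keep that boundary honest, a small server-side guard can fail fast when a private key is missing, instead of silently making unauthenticated API calls. This is a sketch; the helper name and error message are illustrative, not part of the Cloudinary or TanStack APIs:

```typescript
// Server-only helper: resolve a required secret or fail loudly.
// On the server you'd call requireServerEnv("CLOUDINARY_API_SECRET", process.env).
function requireServerEnv(
  name: string,
  env: Record<string, string | undefined>
): string {
  const value = env[name];
  if (!value) {
    throw new Error(`Missing server-only environment variable: ${name}`);
  }
  return value;
}
```

Because the helper takes the environment as a parameter, it never touches `import.meta.env`, which makes it obvious at a glance that it belongs on the server side of the split.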
To give our pipeline “sight,” we’ll configure the Cloudinary ecosystem. This is where we activate the AI models and create the rules that govern our autonomous metadata generation.
Before writing code, you’ll need to “hire” the AI agents that will process your assets. Navigate to the Cloudinary Add-ons page in your console and register for:
- Cloudinary AI Content Analysis. The powerhouse v5 engine. It doesn’t just tag “dog” or “tree”; it generates full natural-language captions (e.g., “A golden retriever puppy playing with a red ball in the grass”).
- Google Translation. This add-on allows us to take that AI caption and instantly localize it, ensuring our inclusive metadata reaches a global audience.
We don’t want to pass complex AI instructions from the client side for every upload. Instead, we’ll create an Upload Preset (let’s call it auto_inclusive_preset). This acts as a predefined instruction manual for Cloudinary.
- Settings > Upload > Add Upload Preset.
- Signing mode. Set to Unsigned. This allows our TanStack Start frontend to upload directly to Cloudinary without a secure server-side signature for every file.
- Folder. Set to inclusive_web_assets to keep our library organized.
Inside the preset, navigate to the Upload Manipulations or Add-ons section (depending on your console version). This is the “Aha!” configuration:
- AI Content Analysis. Enable “Add AI captioning to your image”.
- Google Auto Tagging (optional). Set a confidence threshold (e.g., 0.7) to categorize your assets automatically for SEO.
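The same preset can also be created programmatically with the Node SDK’s Admin API. The sketch below builds the payload described above; the field names follow Cloudinary’s upload preset parameters (detection: "captioning" for AI captions, auto_tagging for the Google confidence threshold), but verify them against the current API reference before relying on them:

```typescript
// Build the Admin API payload for the preset described above.
// Both detection and auto_tagging assume the corresponding
// add-ons (AI Content Analysis, Google Auto Tagging) are active.
function buildInclusivePreset() {
  return {
    name: "auto_inclusive_preset",
    unsigned: true,                    // allow unsigned client-side uploads
    folder: "inclusive_web_assets",    // keep the library organized
    detection: "captioning",           // AI Content Analysis captioning
    categorization: "google_tagging",  // optional SEO tagging
    auto_tagging: 0.7,                 // confidence threshold
  };
}

// Usage (server-side only, with the SDK configured):
// import { v2 as cloudinary } from "cloudinary";
// await cloudinary.api.create_upload_preset(buildInclusivePreset());
```

Creating the preset in code makes the configuration reviewable and reproducible across environments, rather than living only in the console UI.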
One common hurdle is that AI data from unsigned uploads can be ephemeral. To ensure the AI caption is permanently saved to the asset’s metadata, we’ll use an On-success script.
In the Advanced tab of your preset, add this snippet:
// This runs server-side on Cloudinary immediately after a successful upload
// It commits the v5 AI caption to the asset's permanent 'context' field
current_asset.update({
  context: {
    caption: e.upload_info?.info?.detection?.captioning?.data?.caption
  }
});
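To make the lookup in that script concrete, here is the relevant slice of the upload response and the same optional-chaining traversal in plain TypeScript. The nesting follows the captioning add-on’s response format; the sample caption text is illustrative:

```typescript
// Shape of the captioning portion of Cloudinary's upload response.
type UploadInfo = {
  info?: {
    detection?: {
      captioning?: { data?: { caption?: string } };
    };
  };
};

// Same traversal the On-success script performs.
function extractCaption(uploadInfo: UploadInfo): string | undefined {
  return uploadInfo.info?.detection?.captioning?.data?.caption;
}

// Illustrative sample response.
const sample: UploadInfo = {
  info: {
    detection: {
      captioning: { data: { caption: "A red bicycle leaning against a brick wall" } },
    },
  },
};
```

The optional chaining matters: if the add-on didn’t run (or the asset isn’t an image), every level of the path may be absent, and the lookup degrades to undefined instead of throwing.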
By shifting this logic to the Cloudinary Edge, we’ll ensure that every asset is born with its descriptive metadata committed before our gallery even knows it exists.
In this section, we’ll address the biggest engineering roadblock when working with browser-based uploads and AI: The Metadata Disappearing Act.
When using Unsigned Uploads (which are necessary for simple, client-side widgets), Cloudinary restricts certain parameters for security. Specifically, you can’t pass the detection or auto_tagging parameters directly from your frontend code.
If you try to trigger AI captioning via a createUploadWidget call in your React component, Cloudinary will ignore the request to prevent malicious users from racking up your AI credits.
To bypass this, we’ll utilize the same Upload Preset as our “Trusted Agent.”
- The problem. The client can’t ask for AI.
- The solution. The client asks to use a preset, and that preset (which lives securely on Cloudinary’s servers) is the one that demands the AI analysis.
This architectural shift ensures that the AI analysis is triggered by your server-side configuration, making it secure and reliable.
Even when the AI runs, the resulting caption is often just a “transient” piece of data in the upload response. If you don’t explicitly tell Cloudinary to save it, then that caption won’t be indexed in your Media Library for future retrieval by your gallery.
By using the On-success script we added to the preset, we’ll perform a “Metadata Commit.”
Before writing the gallery code, verify your pipeline is working:
- Upload an image using your dashboard.
- Open your Cloudinary Media Library.
- Click the image and open the Context or Metadata tab.
- You should see a key named caption populated with a full sentence generated by the AI.
If the caption is there, your “Zero-Touch” pipeline is officially live.
With our AI-enriched assets stored safely in Cloudinary, we’ll now need a bridge to bring that data into our frontend.
In TanStack Start, we’ll use Server Functions (createServerFn) to securely fetch our assets without exposing our API_SECRET to the client.
The Cloudinary Admin API is the only way to retrieve the context metadata we saved in the previous section, and requires your private Secret Key.
If you called this from a standard React component, your keys would be visible in the Network tab of the browser, a massive security risk.
By using createServerFn, TanStack Start ensures the code runs only on the server, acting as a secure proxy.
This is the “brain” of your data layer. We’ll keep the implementation lean by focusing on the parameters that matter most. Notice how we explicitly request context: true? Without this, Cloudinary will omit the AI-generated captions.
// src/utils/gallery-engine.ts (Simplified)
import { createServerFn } from "@tanstack/start";
import { v2 as cloudinary } from "cloudinary";

// Configure the SDK (Server-Side Only)
cloudinary.config({
  cloud_name: process.env.VITE_CLOUDINARY_CLOUD_NAME,
  api_key: process.env.CLOUDINARY_API_KEY,
  api_secret: process.env.CLOUDINARY_API_SECRET,
});

export const fetchGallery = createServerFn({ method: "GET" }).handler(
  async () => {
    const result = await cloudinary.api.resources({
      type: "upload",
      prefix: "inclusive_web_assets/",
      context: true, // MANDATORY: This fetches our AI captions
      max_results: 50,
    });

    // Transform raw Cloudinary data into clean UI props
    return result.resources.map((asset: any) => ({
      publicId: asset.public_id,
      url: asset.secure_url,
      // Map the buried AI caption to a clean 'alt' field
      alt: asset.context?.custom?.caption || "AI Description processing...",
    }));
  }
);
View the full implementation on GitHub:
src/utils/gallery-engine.ts
When Cloudinary returns your assets, the AI caption isn’t in a top-level alt field. It’s buried inside context.custom.caption.
- The problem. Your UI shouldn’t have to know about Cloudinary’s deep JSON structure.
- The solution. We’ll “map” the data inside the server function.
This keeps our frontend components clean and focused only on rendering.
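That mapping step can live in a small pure helper, which also makes the fallback easy to unit-test in isolation. This is a sketch; the type is hand-written from the fields the gallery uses, and the fallback string mirrors the one in the server function:

```typescript
// Hand-written slice of a Cloudinary Admin API resource.
type CloudinaryResource = {
  public_id: string;
  secure_url: string;
  context?: { custom?: { caption?: string } };
};

// Flatten Cloudinary's nested context into clean UI props.
function toGalleryProps(asset: CloudinaryResource) {
  return {
    publicId: asset.public_id,
    url: asset.secure_url,
    alt: asset.context?.custom?.caption ?? "AI Description processing...",
  };
}
```

Keeping this as a pure function means the deep-JSON knowledge lives in exactly one place, and a caption that hasn’t been committed yet still yields a usable placeholder.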
Now, you can use this function in your Gallery route.
TanStack Start makes this feel like a standard React hook, but it’s actually performing a type-safe network call to your server.
// src/routes/gallery.tsx (snippet)
export const Route = createFileRoute('/gallery')({
  loader: () => fetchGallery(),
  component: GalleryComponent,
});

function GalleryComponent() {
  const assets = Route.useLoaderData(); // Fully typed array of our mapped assets!
  // ... render the inclusive gallery
}
Building an autonomous pipeline requires a UI that signals its purpose. We don’t just want a “file input”; we want a professional Pipeline Entry Point that indicates assets are being ingested into an AI workflow.
To make the intent unmistakable, we use the universal UI language for uploads: a large, dashed-border container with high-contrast icons. This creates an immediate mental model for the user: this is where files go to be processed.
// app/components/UploadWidget.tsx (Simplified snippet)
export function UploadWidget({ onUploadSuccess }: UploadWidgetProps) {
  const [isOpening, setIsOpening] = useState(false);

  const openWidget = () => {
    setIsOpening(true);
    // @ts-ignore - Cloudinary is attached to window via root script
    const widget = window.cloudinary.createUploadWidget(
      {
        cloudName: import.meta.env.VITE_CLOUDINARY_CLOUD_NAME,
        uploadPreset: import.meta.env.VITE_CLOUDINARY_UPLOAD_PRESET, // Uses Section 3 settings
        sources: ["local", "url", "camera"],
      },
      (error, result) => {
        if (!error && result.event === "success") {
          onUploadSuccess(result.info);
        }
        setIsOpening(false);
      }
    );
    widget.open();
  };

  return (
    <button
      onClick={openWidget}
      className="w-full border-2 border-dashed border-slate-200 rounded-[2.5rem] p-12 hover:border-indigo-400 hover:bg-slate-50 transition-all"
    >
      <UploadIcon className="w-10 h-10 mb-8" />
      <h3 className="text-2xl font-black text-slate-900 uppercase">
        {isOpening ? "Connecting..." : "Deploy Inclusive Asset"}
      </h3>
      <p className="text-indigo-600 font-bold text-xs uppercase tracking-widest mt-2">
        Click to browse or drag and drop
      </p>
    </button>
  );
}
View the full implementation on GitHub:
src/components/UploadWidget.tsx
One major “UX Gotcha” is the delay between clicking a button and the widget appearing.
By using an isOpening state, we:
- Disable the button to prevent double-initialization.
- Change the text to “Initializing AI Pipeline” or “Connecting…”.
- Provide visual feedback that the “handshake” between your app and Cloudinary is happening.
An inclusive library doesn’t just display images; it presents AI-generated metadata as an integral part of the UI. This is where we turn raw data into a human-centric experience.
Once our server function delivers the mapped assets, we’ll render them using a semantic and accessible grid. The key is ensuring the alt text is correctly applied to the <img> tag.
// app/routes/gallery.tsx (Simplified snippet)
function GalleryComponent() {
  const assets = Route.useLoaderData();

  return (
    <div className="grid grid-cols-1 gap-12">
      {assets.map((asset) => (
        <div
          key={asset.publicId}
          className="flex flex-col md:flex-row gap-8 items-center bg-white rounded-[2rem] p-8 border border-slate-50 shadow-xl"
        >
          {/* 1. Optimized Visual Asset */}
          <div className="w-full md:w-[45%] aspect-[4/3]">
            <img
              src={asset.url.replace(
                "/upload/",
                "/upload/f_auto,q_auto,c_pad,ar_4:3,b_white/"
              )}
              alt={asset.alt} // The AI-generated mission-critical metadata
              className="w-full h-full object-contain rounded-2xl"
            />
          </div>
          {/* 2. Metadata Context */}
          <div className="w-full md:w-[55%] space-y-4">
            <span className="text-[10px] font-black text-indigo-600 uppercase tracking-widest">
              Autonomous Caption
            </span>
            <p className="text-xl font-serif italic text-slate-700 leading-relaxed">
              "{asset.alt}"
            </p>
            <div className="flex gap-2">
              <TechBadge label="WCAG 2.1 AA" />
              <TechBadge label="Cloudinary AI v5" />
            </div>
          </div>
        </div>
      ))}
    </div>
  );
}
View the full implementation on GitHub:
src/routes/gallery.tsx
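The inline url.replace in that component can be factored into a tiny helper so the transformation string lives in one place. A sketch, using the same delivery transformations as the component (automatic format and quality, padded 4:3 crop on a white background):

```typescript
// Delivery transformations applied to every gallery image.
const TRANSFORM = "f_auto,q_auto,c_pad,ar_4:3,b_white";

// Inject the transformation segment into a Cloudinary delivery URL.
function withTransform(url: string): string {
  return url.replace("/upload/", `/upload/${TRANSFORM}/`);
}
```

Centralizing the string means a future art-direction change (say, a different aspect ratio) touches one constant instead of every img tag.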
By architecting this “Zero-Touch” pipeline, we’ve moved accessibility from an item on a checklist to an immutable part of the asset lifecycle.
The current pipeline sets the stage for even deeper global inclusivity. By layering the Google Translation Add-on into our on_success script, we can instantly localize these AI captions into dozens of languages.
An image uploaded in Kenya is instantly accessible to a screen-reader user in Tokyo, localized in Japanese, without a single human intervention. That’s the power of an auto-inclusive web. Ready to try this build for yourself? Sign up for a free Cloudinary account today.
- Live implementation: auto-inclusive-web.vercel.app/gallery
- Full source code: github.com/musebe/auto-inclusive-web