Imagine managing your entire media library not by clicking through folders and forms, but by having a conversation. Instead of manually searching, selecting, and tagging assets, you could simply type: “Find all images from the ‘summer-sale’ folder and tag them with ‘archive-2025’.” This is the future of digital asset management: an intelligent, conversational interface that works like a co-pilot for your creative workflow.
This platform provides a powerful chat interface that understands your commands, manages your media, and even handles new uploads. It’s all powered by a cutting-edge, full-stack setup: a Next.js frontend, OpenAI for natural language understanding, and the Cloudinary Model Context Protocol (MCP) Server acting as the bridge between them.
In this tutorial, you’ll build a complete AI-powered media assistant that can:
- Launch a local Cloudinary MCP gateway that exposes powerful asset management tools in a way that AI models can understand.
- Provide a polished Next.js chat interface for uploading files and interacting with your media library.
- Process natural language commands to list, rename, move, tag, and delete assets in your Cloudinary account.
- Integrate with OpenAI to intelligently interpret user requests and call the appropriate Cloudinary tools.
Let’s dive in!
Before we dive into the code, we need to get the project running on your local machine. This involves cloning the starter repository, installing the necessary packages, and configuring your environment with the required API keys for Cloudinary and OpenAI.
We’ll start by cloning the complete project from GitHub. This gives us the full application structure right away.
git clone https://github.com/musebe/cloudinary-mcp-media-assistant.git
cd cloudinary-mcp-media-assistant
Next, install all the required Node.js packages using npm.
npm install
This will install Next.js, React, the Cloudinary and OpenAI SDKs, and other utilities defined in the package.json file.
The assistant needs to connect to your specific Cloudinary and OpenAI accounts. We’ll store these secret keys in a local environment file that should never be committed to version control.
Create a new file named .env.local in the root of your project and add the following content:
# Get this from your Cloudinary Dashboard homepage
# Format: cloudinary://API_KEY:API_SECRET@CLOUD_NAME
CLOUDINARY_URL="your_cloudinary_url"
# Get this from platform.openai.com/api-keys
OPENAI_API_KEY="sk-..."
# Port for the local MCP server (optional, defaults to 8787)
MCP_PORT=8787
- CLOUDINARY_URL. This single URL contains your Cloud Name, API Key, and API Secret. You can find it on your main Cloudinary Dashboard.
- OPENAI_API_KEY. This is required to give your assistant its “brain.” You can generate a new key from your OpenAI API Keys page. The AI features are optional, but this key is needed to run the app as-is.
Important: Your .env.local file contains sensitive credentials. The project’s .gitignore file is already configured to exclude it, but always ensure you don’t accidentally expose your keys.
With the setup complete, we’re ready to start the engine.
At the heart of our application is the Cloudinary MCP Gateway. Before we can build a chat interface, we need to run this local server. It acts as a crucial translator, converting Cloudinary’s powerful Asset Management API into a standardized format that AI models can understand and interact with. This format is called the Model Context Protocol (MCP).
Instead of a traditional REST API, the MCP server exposes functions like list-images or delete-asset as “tools” that an AI can be instructed to use.
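To make that concrete, here’s a minimal sketch of what calling such a tool looks like in TypeScript. It assumes the official @modelcontextprotocol/sdk client package and the gateway’s default SSE endpoint; it illustrates the protocol rather than the project’s exact connection code.
// Hypothetical MCP client sketch — not the project's exact code
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { SSEClientTransport } from "@modelcontextprotocol/sdk/client/sse.js";

const client = new Client({ name: "media-assistant", version: "1.0.0" });
await client.connect(new SSEClientTransport(new URL("http://localhost:8787/sse")));

// Ask the gateway which tools it exposes (list-images, upload-asset, asset-rename, ...)
const { tools } = await client.listTools();
console.log(tools.map((t) => t.name));

// Invoke a tool by name — the same kind of call the chat backend makes later in this tutorial
const result = await client.callTool({ name: "list-images", arguments: {} });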
The project includes a custom script to make starting this server simple. It handles configuration, starts the process, and checks that it’s running correctly before finishing.
Open a new, dedicated terminal window and run the following command:
npm run dev:mcp
After a few moments, you should see output confirming that the server is active and listening for connections, ending with a success message.
🔹 Starting Cloudinary MCP Gateway...
Using CLOUDINARY_URL: //***:***@your_cloud_name
Spawning process...
🩺 Checking gateway health at http://localhost:8787/sse...
✅ Gateway is healthy and running!
Keep this terminal window open. This server must be running for the chat application to function.
This isn’t just a simple server command. It’s a robust launcher. Let’s look at two important parts of the scripts/start-mcp-asset.ts file.
First, the core command uses npx to run two packages together.
// scripts/start-mcp-asset.ts (snippet)
const cmd = "npx";
const args = [
"-y",
"supergateway",
// ... port and path flags ...
"--stdio",
"npx -y --package @cloudinary/asset-management -- mcp start",
];
const child = spawn(cmd, args, {
/* ... */
});
Breakdown:
- supergateway. A utility that creates an MCP-compliant server from another process.
- --stdio '...'. Wraps the standard output of another command.
- npx ... mcp start. Runs the official Cloudinary asset management tools. By wrapping it, the gateway discovers all available Cloudinary functions (list-images, upload-asset, asset-rename, etc.) and exposes them as MCP tools.
Next, the script includes a health check to ensure the server is actually ready before our app connects.
// scripts/start-mcp-asset.ts (snippet)
async function checkHealth(): Promise<boolean> {
const startTime = Date.now();
while (Date.now() - startTime < HEALTH_CHECK_TIMEOUT) {
try {
// The /sse path is the Server-Sent Events endpoint
const response = await fetch(HEALTH_CHECK_URL, { method: "GET" });
if (response.ok) {
return true; // Success!
}
} catch {
// Ignore errors, server is still starting
}
await new Promise((resolve) => setTimeout(resolve, HEALTH_CHECK_INTERVAL));
}
return false; // Timeout
}
This loop prevents race conditions and ensures a smooth developer experience.
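The constants the loop relies on aren’t shown in the snippet; values along these lines would work, though the exact numbers in the repository may differ:
// scripts/start-mcp-asset.ts (illustrative constants — the repository's values may differ)
const MCP_PORT = Number(process.env.MCP_PORT ?? 8787);
const HEALTH_CHECK_URL = `http://localhost:${MCP_PORT}/sse`; // the SSE endpoint polled above
const HEALTH_CHECK_TIMEOUT = 30_000; // give up after 30 seconds
const HEALTH_CHECK_INTERVAL = 1_000; // poll once per second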
For a deeper dive into how the protocol works, check out the official Cloudinary MCP Documentation.
With our engine running, it’s time to build the cockpit.
With our MCP server running, we need an interface for our conversation. The “cockpit” of our application is a clean chat UI built with Next.js, responsible for displaying messages and capturing user input.
This component orchestrates the UI, primarily by managing and displaying the list of messages. Its main role is to map over the message state and render the appropriate components for each one.
// src/components/chat/chat-container.tsx (Conceptual Snippet)
// The container's core job is to render the list of messages.
<ScrollArea>
{optimisticMessages.map((m) => (
<MessageBubble key={m.id} role={m.role}>
{/* Renders text, AssetLists, etc. inside */}
</MessageBubble>
))}
</ScrollArea>;
See the full component with state management on GitHub.
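The optimisticMessages list in that snippet hints at React’s useOptimistic hook, which lets the UI echo the user’s message before the server confirms it. Here is a minimal sketch, assuming the confirmed messages array comes from the Server Action covered in the next sections; the repository’s exact implementation may differ.
// src/components/chat/chat-container.tsx (hypothetical sketch)
import { useOptimistic } from "react";
import type { ChatMessage } from "@/types";

// Inside the ChatContainer component body, where `messages` is the confirmed
// list returned by the Server Action:
const [optimisticMessages, addOptimisticMessage] = useOptimistic(
  messages,
  (current: ChatMessage[], userMessage: ChatMessage) => [...current, userMessage]
);
// Calling addOptimisticMessage(...) when the user hits send shows their message
// immediately; the list reconciles once the action returns the real state.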
When the assistant returns media assets, this component renders them in a rich list. Its most important job is displaying a thumbnail and key information for each asset.
// src/components/chat/asset-list.tsx (Snippet)
// A simplified view of how an asset is displayed
<div className="flex items-start gap-3">
<Image src={item.thumbUrl} alt="..." width={56} height={56} />
<div className="font-medium" title={item.id}>
{item.id}
</div>
</div>;
See the full component on GitHub.
This form handles both text and file inputs. The key is a visually hidden input type="file" that is triggered by a button click, providing a clean UI for two types of actions.
// src/components/chat/chat-input.tsx (Snippet)
// The mechanism for dual text/file input
<input ref={fileInputRef} type="file" className="hidden" />
<Button onClick={() => fileInputRef.current?.click()}>
<Paperclip />
</Button>
<Input placeholder="Type a message or upload..." />
See the full component on GitHub.
With the user interface in place, we now need to wire it up to our backend logic.
Now that we have a UI and a server, we need to connect them. Instead of building traditional API routes, we’ll use a modern Next.js feature: Server Actions. These are special functions that run on the server but can be called directly from our client components, making form submissions and data mutations simple and secure.
In our ChatContainer.tsx component, we use the useActionState hook from React. This hook is designed to work seamlessly with Server Actions.
// src/components/chat/chat-container.tsx (Snippet)
"use client";
import { useActionState } from "react";
import { sendMessageAction } from "@/app/(chat)/actions";
// ...
export function ChatContainer() {
const [messages, formAction, isPending] = useActionState(
sendMessageAction, // Our Server Action
[] // The initial state (an empty message list)
);
// ...
}
This hook call gives us three things:
- messages. The updated list of chat messages returned by the action.
- formAction. A function that triggers the action, which we pass to ChatInput.
- isPending. A boolean loading state, used to show a “typing” bubble while the server processes the request.
See the full implementation on GitHub.
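Wiring those three values into the UI might look roughly like the sketch below; the exact props on ChatInput are assumptions, not the repository’s definitive API.
// src/components/chat/chat-container.tsx (hypothetical wiring sketch)
return (
  <>
    <ScrollArea>
      {optimisticMessages.map((m) => (
        <MessageBubble key={m.id} role={m.role}>
          {m.text}
        </MessageBubble>
      ))}
      {/* While the Server Action runs, show a lightweight "typing" bubble */}
      {isPending && <MessageBubble role="assistant">…</MessageBubble>}
    </ScrollArea>
    {/* ChatInput submits its FormData through the action returned by the hook */}
    <ChatInput formAction={formAction} disabled={isPending} />
  </>
);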
The sendMessageAction function lives in src/app/(chat)/actions.ts. The file starts with a 'use server'; directive, which enables the feature. The function accepts form data and the previous state from the hook.
// src/app/(chat)/actions.ts (Snippet)
'use server';
import type { ChatMessage } from '@/types';
export async function sendMessageAction(
previousState: ChatMessage[] | null,
formData: FormData
): Promise<ChatMessage[]> {
// 1. Get user input from formData.
const text = formData.get('text') as string;
const file = formData.get('file');
// 2. Connect to the MCP server.
// 3. Call the correct tool based on the input.
// 4. Return the new message list.
// ... The full implementation follows ...
}
This function is the central hub of the application’s logic. It reads the FormData object from the client, determines whether the user uploaded a file or typed a command, and then calls the MCP server.
See the full action file on GitHub.
Now that the communication channel is open, we can implement the logic for handling specific user commands.
Our Server Action is the central command hub. Now we’ll implement the logic that interprets user commands and translates them into actions for our MCP server to execute. This involves a three-step process: parse intent, call an operation, and execute the MCP tool.
Inside sendMessageAction, we use regular expressions to understand the user’s text. This is a fast and effective way to handle command-line-style instructions.
// src/app/(chat)/actions.ts (Snippet)
// A few examples of the intent-matching regex
const wantList = /^(list|show)\s+images/i.test(text);
const renameMatch = text.match(/^rename\s+(.+?)\s+to\s+(.+)$/i);
const deleteMatch = text.match(/^delete\s+(.+)$/i);
const tagMatch = text.match(/^tag\s+(.+?)\s+with\s+(.+)$/i);
Based on which expression matches, we enter a specific block of logic to handle that command.
See the full list of matchers in the action file on GitHub.
Here’s the handler for the list images command. It calls a dedicated operation function (listImages) and then uses the result to build the assistant’s response.
// src/app/(chat)/actions.ts (Snippet)
// This block runs if the `wantList` regex matches
if (wantList) {
// 2a. Call the dedicated operation function
const assets = await listImages(client);
// 2b. Build the assistant's reply object
assistantMsg = {
id: crypto.randomUUID(),
role: "assistant",
text: assets.length ? "Here are your latest images:" : "No images found.",
assets: assets || undefined, // This data is sent to the AssetList component
};
}
If assets are returned, they’re attached to the assets property. The frontend AssetList component is designed to automatically render this data.
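For reference, the shapes flowing through that property could look roughly like this; only the fields used in the snippets are shown, and the repository’s real types may carry more.
// src/types.ts (hypothetical shape — check the repository for the real definitions)
export type AssetItem = {
  id: string;       // Cloudinary public ID, shown next to the thumbnail in AssetList
  thumbUrl: string; // thumbnail URL rendered by the Image component
};

export type ChatMessage = {
  id: string;
  role: "user" | "assistant";
  text: string;
  assets?: AssetItem[]; // when present, AssetList renders these under the bubble
};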
The final step happens inside the operation functions in src/lib/mcp-ops.ts. These functions handle direct communication with the MCP server. The listImages function looks like this:
// src/lib/mcp-ops.ts (Snippet)
export async function listImages(client: MCPClient): Promise<AssetItem[]> {
// 3a. Call the specific tool by name
const res = await client.callTool({ name: "list-images", arguments: {} });
if (res.isError) {
throw new Error("list-images failed");
}
// 3b. Parse the raw JSON into our standardized AssetItem type
return toAssetsFromContent(res?.content || []) || [];
}
Here, client.callTool({ name: 'list-images' }) sends the command to the MCP Gateway, which executes the corresponding Cloudinary function. The raw JSON response is then parsed by toAssetsFromContent into a clean format for the UI.
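The parsing helper isn’t shown above, so here is a minimal sketch of what toAssetsFromContent might do, assuming the tool returns its results as a JSON text block; the exact response shape from the Cloudinary tools may differ.
// src/lib/mcp-ops.ts (hypothetical sketch of the parsing helper)
// MCP tools return an array of content blocks; the asset data arrives as JSON text.
function toAssetsFromContent(
  content: Array<{ type: string; text?: string }>
): AssetItem[] {
  const block = content.find((c) => c.type === "text" && c.text);
  if (!block?.text) return [];
  try {
    const parsed = JSON.parse(block.text);
    const resources = Array.isArray(parsed) ? parsed : (parsed.resources ?? []);
    // public_id / secure_url are Cloudinary's standard field names (assumed here)
    return resources.map((r: { public_id: string; secure_url: string }) => ({
      id: r.public_id,
      thumbUrl: r.secure_url, // a real implementation might request a small thumbnail variant
    }));
  } catch {
    return []; // not JSON — nothing to render as assets
  }
}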
Other commands like rename, tag, and delete follow the same pattern, keeping intent parsing and tool execution separate.
See all operation functions on GitHub.
Handling file uploads in a chat interface requires a smooth flow from the browser to the cloud. Our assistant doesn’t just upload a file; it processes it via the MCP server and immediately responds with the resulting asset, creating an interactive experience.
Let’s follow the journey of a file from the user’s click to the final response.
It starts in the ChatInput.tsx component. When the user selects a file, the input’s onChange event fires, which immediately calls the onSend function from the parent container.
// src/components/chat/chat-input.tsx (Snippet)
function handleFileChange(event: React.ChangeEvent<HTMLInputElement>) {
const file = event.target.files?.[0];
if (file) {
onSend({ file }); // Kicks off the entire upload process
}
}
This action packages the File object and sends it straight to our Server Action.
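On the container side, packaging that handoff could look like the sketch below; the onSend shape and helper are assumptions, but the FormData keys mirror what the action reads.
// src/components/chat/chat-container.tsx (hypothetical onSend handler)
import { startTransition } from "react";

function onSend({ text, file }: { text?: string; file?: File }) {
  const formData = new FormData();
  if (text) formData.append("text", text); // matches formData.get('text') in the action
  if (file) formData.append("file", file); // matches formData.get('file') in the action

  // Dispatch the Server Action returned by useActionState inside a transition
  startTransition(() => formAction(formData));
}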
See the full component on GitHub.
Our sendMessageAction checks for a file before looking for text commands. This gives uploads priority.
// src/app/(chat)/actions.ts (Snippet)
// This block is at the top of our action's logic
if (file instanceof File) {
// Call our dedicated upload operation
const uploaded = await uploadFileToFolder(client, file, "chat_uploads");
// Build a response message containing the new asset
assistantMsg = {
id: crypto.randomUUID(),
role: "assistant",
text: "Image uploaded successfully.",
assets: uploaded ? [uploaded] : undefined, // Send asset data back to the UI
};
return [...currentState, userMessage, assistantMsg];
}
After calling uploadFileToFolder, the function returns an assistantMsg that includes the uploaded asset. This lets the UI show the result instantly.
The last step is in src/lib/mcp-ops.ts. The uploadFileToFolder function prepares the file and calls the correct MCP tool. Files can’t be sent directly as JSON, so we first encode them as base64 data URIs.
// src/lib/mcp-ops.ts (Snippet)
export async function uploadFileToFolder(
client: MCPClient,
file: File,
folder: string
): Promise<AssetItem | null> {
// 1. Convert the file into a text-based data URI
const buffer = Buffer.from(await file.arrayBuffer());
const dataUri = `data:${file.type};base64,${buffer.toString("base64")}`;
// 2. Call the 'upload-asset' tool with the data URI
const res = await client.callTool({
name: "upload-asset",
arguments: {
uploadRequest: {
file: dataUri,
fileName: file.name,
folder, // e.g. "chat_uploads", passed in from the Server Action
},
},
});
// 3. Parse the JSON response from Cloudinary
return parseUploadResult(res?.content) || null;
}
client.callTool({ name: 'upload-asset' }) instructs the MCP gateway to upload the file. The gateway handles Cloudinary communication and returns the new asset’s details, which we parse and send back to the user.
See the full upload operation on GitHub.
Our regex-based command handler is fast and effective for specific commands, but it’s rigid. If a user types “show me my pictures” instead of “list images,” our current logic fails. To make the assistant truly smart, we can integrate an OpenAI model to understand natural language and decide which MCP tool to use.
This transforms the application from a command-line interface into a conversational assistant.
The difference lies in intent detection:
- Regex looks for an exact pattern.
- AI understands meaning. For example, “can you get rid of the picture named ‘test’?” can be mapped to the delete-asset tool, something regex cannot reliably do.
The logic is in src/lib/ai-router.ts. The idea is to present the entire MCP server to OpenAI as a single tool that the model can use.
The OpenAI SDK supports type: 'mcp' for this purpose.
// src/lib/ai-router.ts (Snippet)
// Describe our MCP server to the OpenAI client
const mcpTool = {
type: 'mcp',
server: { type: 'sse', url: 'http://localhost:8787/sse' },
} as ResponsesTool;
// Make the API call
const resp = await openai.responses.create({
model: 'gpt-4o', // Or your preferred model
input: [
{ role: 'system', content: 'You are a Cloudinary asset assistant. Prefer calling MCP tools...' },
{ role: 'user', content: userText },
],
tools: [mcpTool],
tool_choice: 'auto',
});
What happens here:
- Define mcpTool, pointing the SDK to the MCP gateway.
- Call openai.responses.create with a system prompt, the user’s message, and the MCP tool definition.
- The model analyzes the text and, if needed, calls the correct MCP tool.
- The API response contains both a text reply and any data from the tool call.
See the full AI router implementation on GitHub: src/lib/ai-router.ts
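Put together, the router that the action will call could be shaped roughly like this. output_text is the SDK’s convenience accessor for the model’s text reply, but the output-item handling (the "mcp_call" type and its output field) is an assumption about the Responses API’s shape, so verify it against the SDK version you’re using.
// src/lib/ai-router.ts (hypothetical sketch — verify field names against the OpenAI SDK)
import OpenAI from "openai";
import type { AssetItem } from "@/types";

const openai = new OpenAI();

export async function askOpenAIWithMCP(
  userText: string
): Promise<{ text: string; assets?: AssetItem[] }> {
  const resp = await openai.responses.create({
    model: "gpt-4o",
    input: [
      { role: "system", content: "You are a Cloudinary asset assistant. Prefer calling MCP tools..." },
      { role: "user", content: userText },
    ],
    tools: [mcpTool], // the MCP tool definition shown above
    tool_choice: "auto",
  });

  const text = resp.output_text || "Done.";

  // If the model invoked an MCP tool, try to surface its result as assets for the UI.
  const toolCall = (resp.output as any[] | undefined)?.find((i) => i.type === "mcp_call");
  let assets: AssetItem[] | undefined;
  if (toolCall?.output) {
    try {
      const parsed = JSON.parse(toolCall.output);
      const resources = Array.isArray(parsed) ? parsed : (parsed.resources ?? []);
      assets = resources.map((r: any) => ({ id: r.public_id, thumbUrl: r.secure_url }));
    } catch {
      // The tool output wasn't JSON — return the text reply only.
    }
  }

  return { text, assets };
}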
To switch from regex to AI, modify the else block in src/app/(chat)/actions.ts. Instead of returning a help message, let the AI handle text intent.
// src/app/(chat)/actions.ts (Conceptual Upgrade)
import { askOpenAIWithMCP } from "@/lib/ai-router";
// ... inside sendMessageAction ...
if (file instanceof File) {
// Keep direct handling for file uploads
// ...
} else {
// Let the AI router handle ALL text-based intents
const { text: replyText, assets } = await askOpenAIWithMCP(text);
assistantMsg = {
id: crypto.randomUUID(),
role: "assistant",
text: replyText,
assets: assets,
};
}
With this change, the assistant gains far greater flexibility and intelligence.
Functionality is key, but the user experience is what makes an application feel great. Regex-based commands return accurate but robotic responses like “Deleted my-image-id.” We can improve this by adding a second, lightweight AI call whose job is to make the assistant sound more human.
This is a powerful pattern: use one system (regex or a large AI model) to decide what to do, and a second, faster AI model to decide how to say it.
In actions.ts, the response text is constructed programmatically.
// A typical response from our regex-based system
assistantMsg = {
// ...
text: `Created folder “${folderPath}”.`,
};
This is clear, but not conversational.
The src/lib/ai-guide.ts file introduces a helper function, generateFriendlyReply. This function takes the hardcoded default text and uses a fast AI model (like gpt-4o-mini) to rewrite it in a warmer, more helpful way.
The improvement comes from carefully designed prompts.
System prompt sets the tone:
// src/lib/ai-guide.ts (System Prompt Snippet)
function buildSystemPrompt() {
return [
"You are a warm, concise product guide for a Cloudinary MCP chat.",
"Tone: friendly, encouraging, not robotic.",
"Keep answers short (1–3 sentences).",
"Never invent features.",
// ...
].join(" ");
}
User prompt gives context:
// src/lib/ai-guide.ts (User Prompt Snippet)
function buildUserPrompt(input: GuideInput) {
return [
`User said: "${input.userText}"`,
`Assistant’s default reply (must keep meaning): "${input.defaultText}"`,
"Rewrite the default reply to be more conversational and helpful...",
].join("\n");
}
This ensures the AI rephrases the default reply without losing accuracy.
See the full prompt design on GitHub: src/lib/ai-guide.ts
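The helper itself can stay small. Here’s a sketch of what generateFriendlyReply might look like; the model choice and the GuideInput fields are assumptions based on the call site shown below, and the function falls back to the accurate default text if anything goes wrong.
// src/lib/ai-guide.ts (hypothetical sketch — the repository's implementation may differ)
import OpenAI from "openai";

const openai = new OpenAI();

type GuideInput = {
  userText: string;    // what the user typed
  defaultText: string; // the accurate, programmatic reply
  intent: string;      // e.g. "create-folder", for extra context
};

export async function generateFriendlyReply(input: GuideInput): Promise<string> {
  try {
    const resp = await openai.responses.create({
      model: "gpt-4o-mini", // small, fast model — rewording doesn't need heavy reasoning
      input: [
        { role: "system", content: buildSystemPrompt() },
        { role: "user", content: buildUserPrompt(input) },
      ],
    });
    return resp.output_text?.trim() || input.defaultText;
  } catch {
    // Never let a cosmetic rewrite break the actual operation.
    return input.defaultText;
  }
}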
To activate this feature, wrap static text assignments in actions.ts with a call to the helper.
Before: Hardcoded response.
// src/app/(chat)/actions.ts (Original)
if (createFolderMatch) {
const folderPath = createFolderMatch[2].trim();
const ok = await createFolderOp(client, folderPath);
assistantMsg = {
// ...
text: ok
? `Created folder “${folderPath}”.`
: `I couldn’t create “${folderPath}”.`,
};
}
After: Conversational response.
// src/app/(chat)/actions.ts (With AI Guide)
import { generateFriendlyReply } from '@/lib/ai-guide';
if (createFolderMatch) {
const folderPath = createFolderMatch[2].trim();
const ok = await createFolderOp(client, folderPath);
const defaultText = ok ? `Created folder “${folderPath}”.` : `I couldn’t create “${folderPath}”.`;
const friendlyText = await generateFriendlyReply({
userText: text,
defaultText: defaultText,
intent: 'create-folder',
});
assistantMsg = { /* id, role as before */ text: friendlyText };
}
With this change, a message like “Created folder “marketing”.” becomes “All set! The “marketing” folder has been created for you.” This small shift makes the assistant feel much more human.
You’ve successfully built a sophisticated AI-powered media assistant. By combining a reactive Next.js frontend with the powerful tooling of a Cloudinary MCP Server, you’ve created a conversational interface that can intelligently manage digital assets. This architecture is more than just a proof-of-concept; it’s a blueprint for the future of media management tools.
You now have a robust system that can be extended in many exciting ways.
Here are a few ideas to take this project to the next level:
- Expand the Toolset with More MCPs. The Cloudinary Asset Management MCP is just the beginning. You could integrate other MCPs to unlock new capabilities, such as performing complex transformations or analyzing video content, all from the same chat interface.
- Integrate AI-powered visual search. Allow users to upload an image and ask, “Find more assets like this.” This would involve creating or using an MCP tool that leverages Cloudinary’s advanced AI features for visual similarity search.
- Resource: Cloudinary AI Content Analysis Features
- Add user accounts and a database. To turn this into a true multi-tenant service, integrate an authentication provider like NextAuth.js or Clerk. You could then store chat history, user-specific configurations, or API keys in a database like Vercel Postgres or Supabase.
- Create custom tools. The Model Context Protocol is an open standard. You can create your own MCP server for any tool you can imagine, from triggering a GitHub Actions workflow to posting a message in Slack, and add it to your AI assistant’s list of capabilities.
- Resource: Model Context Protocol (MCP) on GitHub
View the final code on GitHub.