Pinpointing Errors in Customer Media Assets at Cloudinary

I still remember my first week as a DevOps engineer at Cloudinary well. The year was 2017. Everything was new to me (people, laptop, processes), and I had to become familiar with all of it in short order. A mantra often repeated to me in those days was that, Cloudinary being a SaaS company, continuous service uptime is its most important goal.

I soon noticed the tremendous amount of vital background data behind the metrics that measured our system’s performance. While granting me access to that data, my manager quoted from the movie Spider-Man: “With great power comes great responsibility.” No way could I have guessed then that those words would resonate in my mind for years to come, as if watching over each and every one of my keystrokes.

Fast forward a few years, and I’m now on the Customer Success Team with a focus on hatching ways in which to better serve our customers. A while back, it occurred to us that since we’d been monitoring errors as a yardstick of our system’s state, we could track the customer’s state in the same manner, too.

And so was born the open-source project The Sentinel, our tool for monitoring customers by tracking the errors in their portfolios of rich media. Gratifyingly, most of the technical infrastructure already existed at the outset. Building the tool merely involved connecting a few software components, enriching certain data, and sending the result to a Slack channel. Step by step, The Sentinel works this way:

1. A lambda function continually queries the Elasticsearch index behind our log aggregation, looking for errors.

2. Once an error is found, The Sentinel cross-references it with our Salesforce platform to obtain the key data that pertains to the customer in question:

  • The customer’s name
  • The customer’s contact details
  • The HTTP error code, e.g., 404, 500, or 420
  • The error message that corresponds to that code, e.g., File not found, General error, or Rate limit
  • Cloudinary’s customer success manager and solutions architect for the account

3. The Sentinel consolidates all that data in a Slack message along with mentions of the Cloudinary team that serves that account, notifying the team members of the error.
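To make the three steps concrete, here is a minimal sketch in Python. It is not The Sentinel’s actual code. The field names (`status`, `@timestamp`, `cloud_name`), the `Account` record that stands in for a Salesforce lookup, and both helper functions are hypothetical, chosen only for illustration. The sketch builds an Elasticsearch query body for recent errors, then enriches one error record with account data and formats a Slack message that mentions the account team.

```python
from dataclasses import dataclass
from typing import Optional

# Error messages taken from the examples in the list above.
ERROR_MESSAGES = {404: "File not found", 500: "General error", 420: "Rate limit"}


@dataclass
class Account:
    """Hypothetical stand-in for the customer data held in Salesforce."""
    name: str
    contact: str
    csm_slack_id: str  # customer success manager
    sa_slack_id: str   # solutions architect


def build_error_query(window: str = "now-5m") -> dict:
    """Step 1: a query body matching error responses in a recent time window.

    The field names are assumptions about the log schema.
    """
    return {
        "query": {
            "bool": {
                "filter": [
                    {"range": {"status": {"gte": 400}}},
                    {"range": {"@timestamp": {"gte": window}}},
                ]
            }
        }
    }


def build_slack_message(error: dict, accounts: dict) -> Optional[dict]:
    """Steps 2 and 3: enrich one error record and format a Slack payload.

    `accounts` maps a customer identifier to an Account; in practice the
    lookup would go to Salesforce. Returns None for unknown customers.
    """
    account = accounts.get(error["cloud_name"])
    if account is None:
        return None
    status = error["status"]
    message = ERROR_MESSAGES.get(status, "Unknown error")
    # <@USER_ID> is Slack's syntax for mentioning a user in message text.
    text = (
        f"<@{account.csm_slack_id}> <@{account.sa_slack_id}> "
        f"{account.name} ({account.contact}) hit HTTP {status}: {message}"
    )
    return {"text": text}


if __name__ == "__main__":
    demo_accounts = {"demo": Account("Demo Corp", "ops@example.com", "U111", "U222")}
    print(build_slack_message({"cloud_name": "demo", "status": 404}, demo_accounts))
```

Keeping the enrichment and formatting in pure functions, separate from the calls to Elasticsearch, Salesforce, and Slack, is one way to make such a pipeline easy to test without touching any live service.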

The Cloudinary team then looks into the root cause of the error and contacts the customer to offer assistance. The process works like a charm!

On the horizon are enhancements for The Sentinel, including more interactivity and customer-facing automations. Do stay tuned.

A final thought: for all that great power does usher in great responsibility, good things happen with the right attitude, worthy aspirations, and superlative expertise.
