Anatomy of an Outage: Auth0

Auth0 suffered a major outage on June 21. It was part of a wider issue with Cloudflare, their cloud edge provider.

Their status page tracks the problems reported, and now states the issue began at 06:27 UTC. But their Twitter feed (@auth0Status) first reported the issue at 07:05 UTC – 38 minutes later.

Enter Serinus

Our Serinus monitors are lightweight HTTPS calls that verify a domain is available. They check for two metrics:

  • Failing APIs – HTTP calls that either failed to connect, or connected and returned a 5XX server error.
  • Slow calls – calls that take longer than expected for the DNS lookup or to make a TCP connection.
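To make the two metrics concrete, here is a minimal sketch of what one such probe could look like in Python. The 2-second "slow" threshold and the function names are assumptions for illustration; Serinus' actual implementation and thresholds are not public.

```python
import time
import urllib.request
import urllib.error
from typing import Optional

SLOW_THRESHOLD_S = 2.0  # assumed threshold; the real value is not public

def classify(connected: bool, status: Optional[int], elapsed: float) -> str:
    """Map one HTTPS call's result to the two failure metrics."""
    if not connected or (status is not None and status >= 500):
        return "failing"   # failed to connect, or got a 5XX server error
    if elapsed > SLOW_THRESHOLD_S:
        return "slow"      # passed, but took longer than expected
    return "ok"

def probe(url: str, timeout: float = 10.0) -> str:
    """Make one lightweight HTTPS call and classify the outcome."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            connected, status = True, resp.status
    except urllib.error.HTTPError as e:
        connected, status = True, e.code   # connected, got an error status
    except (urllib.error.URLError, OSError):
        connected, status = False, None    # never connected at all
    return classify(connected, status, time.monotonic() - start)
```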

Note: instead of reporting a failure rate (e.g. 4%), we report the inverse – the availability (e.g. 96%). Our incident monitor looks at a rolling 15-minute window of results to act as an early warning system for major issues with public APIs. We use our global network of agent locations and report if one specific cloud or region is worse affected than the others.
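The rolling-window calculation could look something like this sketch. The 15-minute window comes from the text; the data layout ((timestamp, passed) pairs) is an assumption.

```python
from collections import deque
from typing import Deque, Tuple

WINDOW_S = 15 * 60  # rolling 15-minute window, in seconds

def availability(results: Deque[Tuple[float, bool]], now: float) -> float:
    """Availability (% of passing calls) over the last 15 minutes.

    `results` holds (timestamp, passed) pairs, oldest first; entries
    older than the window are dropped in place.
    """
    while results and results[0][0] < now - WINDOW_S:
        results.popleft()
    if not results:
        return 100.0  # no recent data: nothing to report against
    passed = sum(1 for _, ok in results if ok)
    return 100.0 * passed / len(results)
```

For example, 96 passing and 4 failing calls in the window reports 96% availability, the inverse of a 4% failure rate as described above.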

Our monitoring first categorized the Auth0 APIs as having an issue at 06:29 UTC, when the availability dropped to 98%. These first issues were spotted in North America and from our IBM locations.

By 06:33 UTC, we raised our categorization of the problem to a minor outage – under 75% availability – and it was affecting all regions and clouds.

Well, that escalated quickly

This escalated to a Major Outage (under 50% availability) at 06:38 and a Critical Outage (under 25% availability) by 06:44. This state continued until 07:11, then recovered back to a minor outage at 07:21, then to a state of concern at 07:27, and finally the incident was resolved at 07:57.
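The severity bands used above map onto availability roughly like this sketch. The post only gives explicit cutoffs for minor (75%), major (50%), and critical (25%); treating anything below 100% as "of concern" is an assumption.

```python
def severity(availability_pct: float) -> str:
    """Map a 15-minute availability figure to an incident severity."""
    if availability_pct < 25:
        return "critical outage"
    if availability_pct < 50:
        return "major outage"
    if availability_pct < 75:
        return "minor outage"
    if availability_pct < 100:
        return "of concern"   # assumed cutoff; not stated in the post
    return "operational"
```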

The initial period of escalating outage involved failing APIs – HTTP calls that either failed to connect or connected and returned a 5XX server error.

In recovery

During the recovery period, from about 06:45 UTC, some API calls started passing but responded slowly, at a rate of about 1.5% of calls.

  • Between 06:45 and 07:00 European and IBM locations were affected more than other locations.
  • Between 07:06 and 07:11 Azure and Asian locations were worst affected.
  • Between 07:14 and 07:51 API calls from our Google locations were affected more than other locations.

The regions that were worst affected changed during this time, starting first in Asia, then Oceania, and finally North America.

The lowest availability for a 15-minute window was 17.52%, between 06:30 and 06:45, and availability did not improve noticeably until around 07:10.

Summary of events

  • Concern: 06:29 – 06:32
  • Minor: 06:33 – 06:36
  • Major: 06:37 – 06:42
  • Critical: 06:43 – 07:10
  • Major: 07:11 – 07:19
  • Minor: 07:20 – 07:26
  • Concern: 07:27 – 07:55

Total incident time: 86 minutes
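The 86-minute total is simply the span from the first concern at 06:29 to the end of the final concern window at 07:55:

```python
from datetime import datetime

# First detection and end of the last "concern" window, per the summary
start = datetime.strptime("06:29", "%H:%M")
end = datetime.strptime("07:55", "%H:%M")
total_minutes = int((end - start).total_seconds() // 60)  # 86 minutes
```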

Sign up to learn more about Serinus!

Serinus is currently in a closed Beta but leave your details and we will get back to you. You can also follow Serinus on Twitter.
