Following on from our thoughts on SLOs and KPIs, we teased that we had some more data on CMA9 APIs. And here it is!
The approach we use is simple. We look at past performance and provide benchmarking against similar services doing similar things – or against previous performance. Obviously, making comparisons against similar services is challenging. But we happen to have a large data set, which fits nicely.
Measuring API SLOs
The CMA9 UK Open Banking APIs are a set of APIs required of all the major UK banks. They cover the provision of data on common banking information services – branch locations, ATMs, financial products. These are public APIs, so they don't use any form of security. This gives us a set of APIs from 9 different, large, blue-chip organizations that all do EXACTLY the same thing.
Our approach moving forward for helping our clients is simple: the past is prologue – or at least, the past is a good indication of what you should be targeting. So we took the results for the month of April 2018 and defined a series of achieved service levels for all the APIs: availability, failure rate, and median and mean latency for the calls.
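As a rough illustration of what "achieved service level" means here, these metrics can be computed directly from a raw call log. This is a minimal sketch – the call data below is made up, and only the metric definitions follow the text:

```python
from statistics import mean, median

# Hypothetical call log: (succeeded, latency_ms) for each API call.
calls = [
    (True, 110), (True, 95), (False, 0), (True, 130),
    (True, 104), (True, 1250), (True, 98),
]

ok_latencies = [latency for ok, latency in calls if ok]

availability = len(ok_latencies) / len(calls)  # fraction of calls that succeeded
failure_rate = 1 - availability
median_latency = median(ok_latencies)          # ms, successful calls only
mean_latency = mean(ok_latencies)              # ms

print(f"availability: {availability:.2%}, failure rate: {failure_rate:.2%}")
print(f"median latency: {median_latency} ms, mean latency: {mean_latency:.0f} ms")
```

Note that the mean is dragged well above the median by a single slow call – one reason to report both.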
With this, we can look at the best and worst performers, and see what impact that has on areas like user error and how much more time a typical set of queries takes.
The CMA9 API service levels we expected
What we expected was to see consistent APIs from large providers. That makes sense. These are, at the core, simple search REST APIs that provide similar amounts of data. But the reality was quite different.
The defined SLO for availability, based on overall performance for April, was your classic 99.999% uptime; in fact, the achieved Service Level in April was 100% for some of the services. And, yes, some of the providers were able to meet that – but not all.
In fact, across all 6 CMA9 API examples, fewer than half were able to meet that Service Level. And the worst performances were MUCH worse: 94% availability against a defined SLO of 99.999%. Put another way, out of more than 10,000 calls – roughly the sample we measured – that was about 550 failed calls.
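To put those availability figures in perspective, a quick sketch of the expected failure counts at each level (the 10,000-call volume matches the sample above; the helper function is ours, for illustration only):

```python
def expected_failures(total_calls: int, availability: float) -> float:
    """Expected number of failed calls at a given availability level."""
    return total_calls * (1 - availability)

# At the 99.999% SLO, 10,000 calls should see essentially no failures...
print(expected_failures(10_000, 0.99999))  # ~0.1 expected failures
# ...while 94% availability over the same volume means hundreds of them.
print(expected_failures(10_000, 0.94))     # ~600 expected failures
```

The gap between "five nines" and 94% isn't a rounding error – it is four orders of magnitude in failed calls.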
Timings are even worse. We measured the total call time from the same set of cloud providers within the same geography. There were no more than 500 miles (800km) between the hosted services and the cloud providers we called from.
Yet the difference between the fastest median performance and the slowest was, in some cases, as much as a FULL SECOND: 1250ms versus 106ms. Looked at another way, over the sample size of 10,000 transactions, that adds up to more than three hours of extra time spent on those transactions. The UK itself isn't large, so users would rightly expect fast response times – but geography doesn't seem to be the issue here.
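The aggregate cost of that latency gap is simple arithmetic; a minimal sketch using the medians quoted above:

```python
# Gap between the slowest and fastest median latencies, scaled to the sample.
slow_median_ms = 1250
fast_median_ms = 106
sample_calls = 10_000

extra_ms = (slow_median_ms - fast_median_ms) * sample_calls
extra_hours = extra_ms / (1000 * 60 * 60)  # ms -> hours

print(f"{extra_hours:.2f} extra hours")  # ≈ 3.18 extra hours
```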
Servers aren’t free
Let’s dive into that. Server time isn’t cost-free, even if you run your own infrastructure. And 10,000 calls per month isn’t a realistic load for a service; even 10,000 calls per day would be low for something like an ATM or branch location service.
Unless you know what those costs are, and what you could be achieving, you don’t know what you should be striving for. A sample like this – plus the work we’re planning over the next few weeks on showing API owners where they sit against similar APIs – will give us even more insight into where your performance issues are really hitting you and, more importantly, how much you could improve things.
We have observed with our CASC scores that only about a third of the CMA9 APIs meet what we would consider an acceptable quality level. The Service Level data bears that finding out: whether you look at availability or latency, there is a systemic issue here that we did not expect to see with large providers.
Please feel free to contact us about how our measurement works and what we could do for you.
Photo courtesy of Fyn Kynd