API Monitoring …and why it isn’t the same as testing
So why API monitoring? After all, you’ve coded your service, API, app or whatever. It’s up and running. Customers are calling at your door. Life should be easy!
But it isn’t. The server goes out. People are getting hung up when they try to send you information. There are huge lag times. Your potential customers get frustrated and go somewhere else.
More likely than not, the problem lies with the APIs you are running. If you could solve it, you would retain those customers rather than send them to your competition.
This is where monitoring comes in. Dedicated API monitoring helps you check how your internal and third-party APIs are running. If there is a problem with a given API, it tells you what it is. This allows your team to quickly fix it.
In fact, with proper API monitoring, you should know about the problem way before it impacts your customers at all. So they won’t even notice it.
Doing API Monitoring Right
The best practices for API monitoring are going to vary by industry and can run into some problems internally – like, for example, what do you do if the internal risk teams don’t want you to monitor? (We’ll come back to that in another article…)
However, there are some key principles we go by, and we want to share them here along with some tips on doing it right, even if you don’t use APImetrics.
Services go down when you’re not looking, so heartbeat monitoring is essential: have all of your API endpoints exercised by an external call around the clock, every day of the year. It doesn’t have to be high frequency. In fact, we suggest to customers that 5-10 times an hour for functional testing, and maybe only once per hour for security and OAuth monitoring, is enough. But the key thing is this: it needs to be a real call from somewhere outside your stack.
External NOT Internal
It really doesn’t matter if your gateway is up or your backend is running if an actual person calling the API is getting some kind of error or, worse, can’t even get to the API itself. So monitor externally: set up real API calls, using real security, to real resources, and run them around the clock. Also, the whole world isn’t on AWS, so stop using products that think it is.
Do the little-used things as well as the stuff that’s used all the time. You never know when a resource has been turned off without somebody realizing it, and you only find out when a customer tries to do something essential that they only do once a week. An example from a customer: an expenses reporting API went down, and they only realized when people filed their expenses on the Friday of a holiday weekend.
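As a rough illustration of what a real call from outside your stack could look like, the sketch below makes one authenticated request and records the status, latency, and body size. It uses only the Python standard library and is not a description of how APImetrics works. The names `run_check`, `CheckResult`, and `interval_seconds`, and the use of a bearer token, are assumptions made for the example. It only counts as external monitoring if it runs on a machine outside your own infrastructure.

```python
import time
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Optional


@dataclass
class CheckResult:
    """Outcome of one monitoring call (hypothetical structure for this sketch)."""
    url: str
    status: Optional[int]   # None when no HTTP response was received at all
    latency_ms: float
    body_bytes: int
    error: Optional[str] = None


def run_check(url: str, token: Optional[str] = None, timeout: float = 10.0) -> CheckResult:
    """Call the endpoint the way an external user would and record what came back."""
    headers = {"Authorization": f"Bearer {token}"} if token else {}
    request = urllib.request.Request(url, headers=headers)
    started = time.perf_counter()
    status: Optional[int] = None
    body = b""
    error: Optional[str] = None
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            body = response.read()
            status = response.status
    except urllib.error.HTTPError as exc:
        # 4XX and 5XX responses arrive as exceptions but are still real answers.
        body = exc.read()
        status = exc.code
    except OSError as exc:
        # DNS failures, refused connections, and timeouts: the API was unreachable.
        error = str(exc)
    latency_ms = (time.perf_counter() - started) * 1000
    return CheckResult(url, status, latency_ms, len(body), error)


def interval_seconds(calls_per_hour: int) -> float:
    """Gap between heartbeat calls for a given hourly frequency."""
    if calls_per_hour <= 0:
        raise ValueError("calls_per_hour must be positive")
    return 3600 / calls_per_hour
```

At 6 calls an hour, `interval_seconds(6)` gives a 600-second gap between checks. The scheduler that actually runs them, and the list of endpoints (including the once-a-week ones), are left out of the sketch.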
HTTP Codes Are Not Magic
HTTP 200, all OK… right? Well, not so much. The dirty secret of the API industry is this: HTTP codes lie, HTTP 200 especially so. The worst offenders are SOAP APIs pretending to be REST APIs, where the error is buried in what looks like a pass. This can be tricky, and we’ll have a whole article on it. Spotting the situation can take some work, but it’s well worth it. Look out for changes in the size of the content returned by the APIs as a first clue: if you usually get 500 KB back and occasionally it’s only 100 KB or even 2-3 bytes, then you’re possibly getting an error or failure.
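The size clue can be turned into a simple automated check. The following is a minimal sketch, assuming you keep a short history of body sizes for each endpoint. The function names, the 50% threshold, and the error keys it looks for (`error`, `errors`, `fault`) are illustrative assumptions, not a standard, and real APIs will need their own rules.

```python
import json
from statistics import median
from typing import Sequence


def size_is_suspicious(body_bytes: int, recent_sizes: Sequence[int],
                       min_ratio: float = 0.5) -> bool:
    """True when a response is much smaller than this endpoint usually returns."""
    if not recent_sizes:
        return False  # no baseline yet, so nothing to compare against
    return body_bytes < median(recent_sizes) * min_ratio


def body_reports_error(body: str) -> bool:
    """Heuristic: does a 'successful' body look like it carries an error?"""
    lowered = body.lower()
    # SOAP-style failures are often returned with HTTP 200 and a Fault element.
    if "<faultstring" in lowered or ":fault>" in lowered or "<fault>" in lowered:
        return True
    try:
        payload = json.loads(body)
    except ValueError:
        return False  # not JSON; leave the verdict to the size check
    # Hypothetical key names; adjust them to whatever your API really returns.
    return isinstance(payload, dict) and any(
        key in payload for key in ("error", "errors", "fault")
    )
```

A check would treat an HTTP 200 as a pass only if both functions return `False`. With a history of roughly 500,000-byte responses, a 3-byte body is flagged, while a 480,000-byte body is not.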
Did We Say That?
A MUCH overlooked part of API monitoring is what exactly it is you’re monitoring. As the story goes, you see somebody under a street light looking around. You offer help, and they say they lost their car keys. You start looking and ask where they last saw them, and they say over in the woods. When you stop and ask why they’re looking here, they say, “because the light is here…” So part of monitoring is monitoring the service you say you provide, NOT what you know you deliver.
API Monitoring Best Practice
We’d suggest the following are the absolute MINIMUM items to monitor on a regular basis AND have alerts set up to tell you when something goes wrong.
- Regular API monitoring heartbeat checks from multiple locations
- Set up assertion/condition checks on the returned content to validate that what you get back is what you expected and trigger warnings and alerts on those accordingly
- Hit the production endpoints using the same security as your external users use
- Look at the statistical spread of latency results, not just the averages – what are your q95 and q99 values (the 95th and 99th percentiles)?
- Have alerting set up for failures and issues
- Double check what 4XX errors are telling you. A 4XX error is often dismissed as a problem elsewhere in the stack, but with APIs, while it might mean you’ve got your security wrong, it could indicate that something outside your control has been shut down. There can be a world of difference in the API space between a 401 Unauthorized and a 404 Not Found
- Monitor that your API security is doing what you think it is. Going back to the previous point, if you’re meant to return a 401 error when the API is hit with an invalid scope or token, do you?
- Don’t worry too much about one-off events; look at trends and groups of issues
- SLOW isn’t always as bad as inconsistent: people can cope with slow, and you can build it into your service, but if your service is inconsistent, that’s hard for you and your end-users
- Look for odd geographical or cloud issues – if you’re on Azure but all your users are on AWS, don’t assume that Amazon and Microsoft are doing all the hard work for you on the networking side of things – look at those performance numbers and take steps to remedy what you see
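Two items from the list above lend themselves to a short sketch: looking at the latency spread instead of the average, and reading status codes against what you expected to get. This is one possible way to do it, assuming latencies are collected in milliseconds. The function names, the nearest-rank percentile method, and the result labels are choices made for the example.

```python
import math
from statistics import fmean, pstdev
from typing import Dict, Sequence


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with pct% of samples at or below it."""
    if not values:
        raise ValueError("need at least one latency sample")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def latency_summary(latencies_ms: Sequence[float]) -> Dict[str, float]:
    """Mean, q95, q99, and a consistency measure (standard deviation / mean)."""
    mean = fmean(latencies_ms)
    return {
        "mean": mean,
        "q95": percentile(latencies_ms, 95),
        "q99": percentile(latencies_ms, 99),
        "variation": pstdev(latencies_ms) / mean if mean else 0.0,
    }


def interpret_status(status: int, expected: int = 200) -> str:
    """Compare the status you got with the status you meant to get."""
    if status == expected:
        return "pass"
    if expected in (401, 403) and 200 <= status < 300:
        # A bad token or scope was accepted: security is not doing its job.
        return "security not enforced"
    if status in (401, 403):
        return "credentials or scope rejected"
    if status == 404:
        return "resource missing"
    if 400 <= status < 500:
        return "other client error"
    return "unexpected status"
```

For 98 calls at 100 ms and 2 calls at 2000 ms, the mean is 138 ms and q95 is 100 ms, but q99 is 2000 ms, which is the number the unlucky users actually experience. For the security check, a call made with a deliberately invalid token passes only when `interpret_status(status, expected=401)` returns `"pass"`.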
Here to help
Never forget, we’re here to help – we’ve been involved in API monitoring and performance measurement since before people realized it was a problem and we have the best tools available to deliver the data you didn’t even know you needed.
Take a detailed look
Download a detailed introduction to APImetrics and learn how we are bringing common standards to API monitoring with integrated monitoring, performance assurance and compliance analysis!