First Utility is a leading energy supplier in the United Kingdom. They were having issues with the quality of the APIs powering their mobile apps. The challenge they faced was simple: they had extensive testing and monitoring in place, including API Gateway monitoring, Selenium-based web test tools, and Splunk-based log analysis, yet they were still hitting performance issues that took significant time to identify and resolve. The company implemented APImetrics, and within days First Utility was able to resolve problems that had been impacting users for weeks. Even better, they were able to stop intra-company finger-pointing. “Rather [...]
70% of all API problems have no easy way to identify root cause In a survey of 20 leading corporate infrastructure APIs, we found that for over 70% of the API problems, there was no clear root cause within the cluster of poor performance. We used our machine learning system to learn the normal performance of each of the APIs, which included services from DocuSign, Microsoft, and Dropbox. We looked for periods where performance degraded, and then clustered the events that seemed to be linked or related (again using machine learning techniques). Then we looked [...]
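The approach described above can be illustrated with a minimal sketch. This is a toy stand-in, not our production ML system: a rolling baseline of "normal" latency replaces the learned model, samples that deviate sharply are flagged as degraded, and flagged samples that occur close together are grouped into single events. All function names and thresholds here are illustrative.

```python
from statistics import mean, stdev

def find_degraded_periods(latencies_ms, window=20, threshold=3.0):
    """Flag samples whose latency deviates from a baseline.

    The first `window` samples establish 'normal' performance; any later
    sample more than `threshold` standard deviations above the baseline
    mean is flagged as degraded.
    """
    baseline = latencies_ms[:window]
    mu, sigma = mean(baseline), stdev(baseline)
    return [i for i, x in enumerate(latencies_ms[window:], start=window)
            if x > mu + threshold * sigma]

def cluster_events(indices, max_gap=2):
    """Group flagged samples that are close together into single events."""
    clusters = []
    for i in indices:
        if clusters and i - clusters[-1][-1] <= max_gap:
            clusters[-1].append(i)
        else:
            clusters.append([i])
    return clusters

# Normal latency around 120 ms, then two degraded bursts.
series = [120, 118, 125, 122, 119, 121, 124, 117, 123, 120,
          118, 122, 121, 119, 125, 120, 123, 118, 122, 121,
          120, 450, 470, 460, 119, 121, 455, 448, 120, 122]
flagged = find_degraded_periods(series)
print(cluster_events(flagged))  # → [[21, 22, 23], [26, 27]]
```

Once events are clustered like this, the hard part begins: working out whether a given cluster has an identifiable root cause at all, which is where the 70% figure comes from.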
It's come to our attention that we had a significant issue affecting data collection from our remote agents over the 4th of July weekend. We've traced the problem to the services bus that connects the remote agents to our data store. This has been rectified and we are taking steps to ensure this type of event can't happen again. Unfortunately, it will have resulted in a weekend of lost data from the affected collection points. Calls made from our default server are unaffected. If you have any additional questions, please don't hesitate to contact us.
We're in the process of rolling out some new analytics tools, and we've been looking at some headline numbers. The first headline number is 83,000,000: that's how many API calls we've run since we started APImetrics and, because we believe in learning from data, that's how many records we have in our database. The second number, which is what surprised us, is that about 1,600,000 of those calls were outright failures, meaning the API returned, for some reason, a 5XX error. We have a much higher rate of 4XX errors, but they could be related to token [...]
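For context, those two headline numbers work out to roughly a 2% server-side failure rate; a quick back-of-the-envelope calculation:

```python
total_calls = 83_000_000    # API calls run since APImetrics started
server_errors = 1_600_000   # outright failures: 5XX responses

failure_rate = server_errors / total_calls
print(f"5XX failure rate: {failure_rate:.2%}")  # ~1.93%
```

Note that this only counts 5XX responses; as mentioned above, the 4XX rate is higher but harder to interpret, since many of those may be expected (expired tokens, bad client requests) rather than genuine service failures.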
This article covers a number of things we've discovered over the last year, and it's very handy for anybody who is starting to figure out what a Service Level Agreement (SLA) actually means for an API, a microservice or, for that matter, the cloud. We're going to be expanding our feature set around SLA monitoring and we'll keep you informed of the status over the next few months. Enjoy this one; it's an excellent read from CIO.com.
API Time Travel We frequently check general performance data to look for 'odd' responses, and we found an interesting one today: an API call on a test server which took -284,027ms, an apparent negative duration of just under 5 minutes. We assume that the host had to have a clock reset in the middle of making the API call, but it was an interesting result, and one which has led to a small change on our side to error out such platform-induced timing issues in the future. Once again, this makes it clear to us that just [...]
As one of our feature enhancements we've improved the way our agents work. We are now consistently capturing key data on actual API performance, including:

- DNS lookup time: e.g. 28355µs (28ms)
- Time to connect: e.g. 76106µs (76ms)
- Time for handshake: e.g. 0µs (0ms)
- Upload time: e.g. 48µs (0ms)
- Processing time: e.g. 120568µs (121ms)
- Download time: e.g. 2545µs (3ms)

However, this improvement has raised two issues. Firstly, we have realized that our old collection agents had some minor reporting issues, which means that some of the latencies we recorded were actually better than what was really being experienced. [...]
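The per-phase figures above sum to the total round-trip time, with the raw values captured in microseconds and rounded to milliseconds for display. A small sketch, using the example values from the list (the dictionary keys are illustrative labels, not our API's field names):

```python
# Per-phase latencies in microseconds, as in the example above.
phases_us = {
    "dns_lookup": 28355,
    "connect": 76106,
    "handshake": 0,
    "upload": 48,
    "processing": 120568,
    "download": 2545,
}

def to_ms(us):
    """Round a microsecond value to whole milliseconds for display."""
    return round(us / 1000)

phases_ms = {name: to_ms(us) for name, us in phases_us.items()}
total_ms = to_ms(sum(phases_us.values()))

print(phases_ms)  # {'dns_lookup': 28, 'connect': 76, 'handshake': 0, ...}
print(total_ms)   # 228 — total round-trip latency in ms
```

Note how the rounding works: the 48µs upload displays as 0ms, so phases that look like zero in the rounded view can still carry real (if tiny) cost in the microsecond data.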
Open government initiatives and a move to use SaaS and cloud services are changing how governments use and access APIs. In New Zealand, the Ministry of Business, Innovation and Employment (MBIE) provides access to a range of government services targeting investors and businesses across New Zealand and the world. MBIE has found APImetrics to be a very successful tool: the alerting and performance monitoring capabilities have given operational teams very useful information, and they use APImetrics to ensure that the performance of their services is optimized worldwide.
Twitter had an extremely rare outage today. As our recent social media report showed, they're one of the most reliable services we measure, but they do have outages, and given the range of services plugged into their APIs, this one affected enough people that the news outlets noticed: BBC coverage here. So, were you ready for it? The problems seem to have started around 13:00 UTC (5am Pacific) and started clearing up around 15:00. They were intermittent, and we'll publish a more detailed review of the anatomy of this outage and its identifying characteristics later. From our initial insights, the US was [...]
We'll have more details out shortly, but there are a lot of features and changes coming in the next few days, once we're through the final testing. These include, but are not limited to:

- System-wide variables with the ability to set different production environments
- Improved data visualizations with new graphing options and improved heatmaps
- Global SLA settings
- The ability to view all deployments from a single view
- Support for a full range of webhook integrations to services like PagerDuty and GitHub, plus a general set of webhooks that can be used for all integrations

Watch this space!