It's come to our attention that we had a significant issue affecting data collection from our remote agents over the 4th of July weekend. We've traced the problem to our services bus that connects the remote agents to our data store. This has been rectified and we are taking steps to ensure this type of event can't happen again. Unfortunately, it will have resulted in a weekend of lost data from the different collection points. Calls made from our default server are unaffected. If you have any additional questions, please don't hesitate to contact us.
We're in the process of rolling out some new analytics tools and we've been looking at some headline numbers. The first headline number is 83,000,000: that's how many API calls we've run since we started APImetrics and, because we believe in learning from data, that's how many records we have in our database. The second number, which is the one that surprised us, is that about 1,600,000 of those calls are out-and-out failures, that is, the API returned a 5XX error for some reason. We have a much higher rate of 4XX errors, but they could be related to token [...]
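To put the two headline numbers in proportion, here is a minimal Python sketch of the arithmetic. The call counts are the figures quoted in the post; the function name `failure_rate` is only illustrative and is not part of any APImetrics tooling.

```python
# Figures quoted in the post above.
TOTAL_CALLS = 83_000_000
SERVER_ERRORS = 1_600_000  # calls that returned a 5XX status


def failure_rate(failures: int, total: int) -> float:
    """Return failures as a fraction of all calls (hypothetical helper)."""
    if total <= 0:
        raise ValueError("total must be positive")
    return failures / total


rate = failure_rate(SERVER_ERRORS, TOTAL_CALLS)
print(f"5XX failure rate: {rate:.2%}")                # 1.93%
print(f"About 1 call in {round(1 / rate)} failed")    # 52
```

In other words, a little under 2% of all recorded calls, or roughly one call in 52, ended in a 5XX error.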
This article includes a number of things we've discovered over the last year and is very handy for anybody who is starting to figure out what a Service Level Agreement (SLA) actually means for an API, a micro-service or, for that matter, the cloud. We're going to be expanding our feature set around SLA monitoring and we'll keep you informed of the status over the next few months. Enjoy this one; it's an excellent read from CIO.com.
API Time Travel: We frequently check general performance data to look for 'odd' responses, and we found an interesting one today: an API call on a test server that took -284,027ms, a negative duration of just under 5 minutes. We assume that the host had a clock reset in the middle of making the API call, but it was an interesting result and one which has led to a small change on our side to error out such platform-induced time issues in the future. Once again, this makes it clear to us that just [...]
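The post does not say how the "error out" change was implemented, so the following Python sketch is only one plausible approach under stated assumptions: a measurement with a negative elapsed time is reported as an error rather than stored as a latency. The names `TimingResult`, `validate_elapsed` and `timed_call` are hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TimingResult:
    elapsed_ms: Optional[float]   # None when the measurement is unusable
    error: Optional[str] = None   # set when the clock misbehaved


def validate_elapsed(start_s: float, end_s: float) -> TimingResult:
    """Turn two timestamps (in seconds) into a latency, rejecting negative durations."""
    elapsed_ms = (end_s - start_s) * 1000.0
    if elapsed_ms < 0:
        # A call cannot finish before it starts, so blame the platform clock.
        return TimingResult(None, f"clock went backwards by {-elapsed_ms:.0f}ms")
    return TimingResult(elapsed_ms)


def timed_call(func: Callable[[], object]) -> TimingResult:
    """Time func with a monotonic clock, which is not affected by wall-clock resets."""
    start = time.monotonic()
    func()
    return validate_elapsed(start, time.monotonic())


# Reproduce the reading from the post: the end timestamp is 284.027s before the start.
bad = validate_elapsed(1000.0, 1000.0 - 284.027)
print(bad.error)
```

Using `time.monotonic()` for the measurement avoids the problem at the source on hosts where it is available, while the explicit check still catches timestamps that arrive from elsewhere.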
As one of our feature enhancements we've improved the way our agents work. We are now consistently capturing key data on the actual API performance, including: DNS lookup time, e.g. 28355µs (28ms); time to connect, e.g. 76106µs (76ms); time for handshake, e.g. 0µs (0ms); upload time, e.g. 48µs (0ms); processing time, e.g. 120568µs (121ms); and download time, e.g. 2545µs (3ms). However, this improvement has raised two issues. Firstly, we have realized that we had some minor reporting issues with our old collection agents, which means that some of the latencies we were recording were better than those actually being experienced. [...]
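As an illustration of how such a breakdown can be read, the Python sketch below converts the sample values from the post to milliseconds and shows each phase's share of the total. It assumes the six phases run one after another and add up to the full call time, which the post does not state; the dictionary keys are just labels.

```python
# Sample values from the post, in microseconds.
PHASES_US = {
    "DNS lookup": 28355,
    "Connect": 76106,
    "Handshake": 0,
    "Upload": 48,
    "Processing": 120568,
    "Download": 2545,
}


def to_ms(microseconds: int) -> int:
    """Convert microseconds to whole milliseconds, rounding to nearest."""
    return round(microseconds / 1000)


# Assumption: the phases are sequential, so their sum is the total call time.
total_us = sum(PHASES_US.values())

for name, us in PHASES_US.items():
    share = us / total_us
    print(f"{name:<11}{us:>8}µs {to_ms(us):>4}ms {share:>6.1%}")

print(f"{'Total':<11}{total_us:>8}µs {to_ms(total_us):>4}ms")
```

With these sample figures, processing and connection time together account for most of the call, while upload and download are close to negligible.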
Open government initiatives and a move to SaaS and cloud services are changing how governments use and access APIs. In New Zealand, the Ministry of Business, Innovation and Employment (MBIE) provides access to a range of government services targeting investors and businesses across New Zealand and the world. MBIE has found APImetrics to be a very successful tool. The alerting and performance monitoring capabilities have given very useful information to operational teams. They use APImetrics to ensure that the performance of their services is optimized worldwide.
Twitter had an extremely rare outage today. As our recent social media report showed, they're one of the most reliable services we measure. But they do have outages, and given the range of services plugged into their APIs, it affected people enough that the news outlets, including the BBC, noticed. So, were you ready for it? The problems seem to have started around 13:00 UTC (5am Pacific) and started clearing up around 15:00. They were intermittent, and we'll publish a more detailed review of the anatomy of this outage and its identifying characteristics later. From our initial insights, the US was [...]
We'll have more details out shortly, but there are a lot of features and changes coming in the next few days, once we're through the final testing. These include, but are not limited to: system-wide variables with the ability to set different production environments; improved data visualizations with new graphing options and improved heatmaps; global SLA settings; and a single view of all deployments. We now support a full range of webhook integrations to services like PagerDuty and GitHub, and we have a general set of webhooks that can be used for all integrations. Watch this space!
For our second API Health Report we decided to look at Social Networking APIs, specifically Twitter, Facebook and Tumblr. As with our first report, we were interested in how your choice of cloud could impact the results but, unlike the first report, we've also added a range of global locations so you can see how cloud choice and user location could impact performance. One of the key takeaways from this report is that there are differences between the clouds and large differences between the regions once you are outside of the United States. If you have customers [...]
We're going to be attending and talking at the API Strategy and Practice conference in Chicago this week, so feel free to get in touch and come and see us. The topic of our presentation will be "Performance: what the logs don't tell you". Not to give anything away, but this will focus on the problem that averages and server performance logs tend to hide things. Firstly, averages can easily fool you into thinking that all is well with your performance. Caching in particular can make the numbers look significantly better than they actually are, or hide [...]
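To make the point about averages concrete, here is a small Python sketch using made-up numbers, not data from the talk: 90 cached responses at 20ms and 10 cache misses at 2000ms. The `percentile` helper is a simple nearest-rank implementation written for this example.

```python
import math
import statistics

# Hypothetical latencies in ms: 90 cache hits and 10 cache misses.
latencies_ms = [20.0] * 90 + [2000.0] * 10


def percentile(values: list[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


mean_ms = statistics.mean(latencies_ms)
median_ms = statistics.median(latencies_ms)
p95_ms = percentile(latencies_ms, 95)

print(f"mean:   {mean_ms:.0f}ms")    # 218ms
print(f"median: {median_ms:.0f}ms")  # 20ms
print(f"p95:    {p95_ms:.0f}ms")     # 2000ms
```

The 218ms mean describes nobody's experience in this example: most callers see 20ms, while one in ten waits two full seconds. Looking at the median alongside a high percentile exposes the slow uncached calls that the average smooths over.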