I remember when Google+ was a thing. And then when it was a thing again. Both those periods were a long time ago. But it was still a shock to see that Google+ is soon to be shuttered for good – and to learn that it was brought low by a humble bug in the Google+ People API.
We’ve said it before: you’ve got to keep on monitoring your APIs once they are in production. In the case of Google+, it looks as though the bug arose through Google+ People API’s interaction with a change in the Google+ codebase.
The Google+ codebase
Now, you might say that that kind of error should have been picked up in regression testing. And maybe it should have been. But obviously the Google+ codebase was large and complex. You just can’t test all possible scenarios before releasing a change into the live environment. And even more so if there is a backlog of rapid, agile changes being made all the time.
Since you can’t account for all interactions within the codebase, it’s a really good idea to at least run a scheduled test suite against the APIs regularly. You then have an idea of how the API is behaving at any given time.
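As a rough illustration, a scheduled suite can be as simple as a set of named checks run at a fixed interval. This is a minimal sketch, not anyone's actual tooling: the names run_suite and run_on_schedule are made up for this example, and each check is assumed to be a callable that makes whatever API call you care about and returns True if the response looked right.

```python
import time
from typing import Callable, Dict, List

# Assumption: a "check" is any callable that returns True when the API
# behaved as expected. In practice it would wrap a real HTTP request.
Check = Callable[[], bool]


def run_suite(checks: Dict[str, Check]) -> Dict[str, bool]:
    """Run every check once. A check that raises counts as a failure."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False
    return results


def run_on_schedule(
    checks: Dict[str, Check],
    interval_seconds: float,
    runs: int,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Dict[str, bool]]:
    """Run the suite `runs` times, pausing `interval_seconds` between runs."""
    history = []
    for i in range(runs):
        history.append(run_suite(checks))
        if i < runs - 1:
            sleep(interval_seconds)
    return history
```

In real life you would hand the scheduling to cron or a monitoring service rather than a loop, but the idea is the same: the history tells you how the API was behaving at each point in time, not just at release.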
But in this kind of situation, fuzzing and negative testing really come into their own. In the case of an API like the Google+ People one, you really can’t predict what users are going to chuck at your API – or how, or when.
That makes fuzzing the perfect way of keeping an API on its toes. Can it take whatever is thrown at it? If it can’t, you will need to do something about it, and quickly.
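To make that concrete, here is a minimal sketch of a fuzzing loop. It is an illustration under assumptions, not a real fuzzing tool: send is a hypothetical callable that submits one payload to your API and returns the HTTP status code, and random_payload and fuzz are names invented for this example. Anything that raises or comes back as a 5xx gets recorded as a failure.

```python
import random
import string


def random_payload(rng, depth=0):
    """Build one random value: junk strings, extreme numbers, None, nesting."""
    kinds = ["str", "int", "none"]
    if depth < 3:  # cap nesting so generation always terminates
        kinds += ["list", "dict"]
    kind = rng.choice(kinds)
    if kind == "str":
        length = rng.randint(0, 200)
        return "".join(rng.choice(string.printable) for _ in range(length))
    if kind == "int":
        return rng.choice([0, -1, 2**31, -(2**63), rng.randint(-10**9, 10**9)])
    if kind == "none":
        return None
    if kind == "list":
        return [random_payload(rng, depth + 1) for _ in range(rng.randint(0, 3))]
    return {"field": random_payload(rng, depth + 1)}


def fuzz(send, iterations=100, seed=0):
    """Throw random payloads at `send` and collect anything that breaks it.

    `send` is a hypothetical callable: payload in, HTTP status code out.
    """
    rng = random.Random(seed)  # seeded so a failing run can be replayed
    failures = []
    for _ in range(iterations):
        payload = random_payload(rng)
        try:
            status = send(payload)
        except Exception as exc:
            failures.append((payload, repr(exc)))
            continue
        if status >= 500:
            failures.append((payload, status))
    return failures
```

A 4xx in response to garbage is fine; that is the API defending itself. It is the crashes and the 5xx responses you are hunting for.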
Google and negative testing
Negative testing, on the other hand, might have allowed Google to spot the issue with the API sooner. It depends on the exact nature of the bug. But the point is that you should try and do things that you aren’t supposed to, like log in to an account for which you don’t have the credentials, or access information that you are not authorized to see.
This is a case where you want to be seeing a 4xx. If you can log in or access the information, then that’s the problem, and a big one.
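Here is a minimal sketch of what that looks like as a test. Everything in it is hypothetical: run_negative_tests is a name invented for this example, and each case is assumed to be a callable that attempts something forbidden against your API and returns the HTTP status code it got back.

```python
def is_client_error(status):
    """True for any 4xx status, the outcome a negative test wants to see."""
    return 400 <= status < 500


def run_negative_tests(cases):
    """Attempt each forbidden action and report any that was not refused.

    `cases` maps a description to a hypothetical callable that performs the
    forbidden request and returns the HTTP status code.
    """
    problems = []
    for name, attempt in cases.items():
        status = attempt()
        if not is_client_error(status):
            problems.append(f"{name}: expected 4xx, got {status}")
    return problems
```

An empty list means every forbidden action was refused. Anything else, particularly a 200, is the kind of result you want to hear about from your own tests rather than from somebody else.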
Of course, we don’t know what kind of monitoring Google was doing. They might have been relying purely on passive monitoring.
This is never a good idea. There are a lot of issues that you will never pick up that way. If the changes pass the regression tests in the development environment, DevOps might be more focused, going forward, on Ops.
Why would people not do continuous testing? Well, if something works, or appears to work, there can be a natural reluctance to start poking it too hard to see if something snaps. Someone has to pay for the breakages, and a lot of companies have a “shoot the messenger” culture.
But the point that we keep trying to make at APImetrics is that you’re only as good as your weakest link.
If there is something wrong with your APIs, that will probably be found out, and when it is noticed, it will come back and bite you. Hiding it or ignoring it isn’t a long-term strategy. And, as Google has discovered with the Google+ People API, any short-term saving of effort or money has now been more than cancelled out by the reputational cost of the problem not being discovered in a timely fashion.
So the moral of this story is plainly seen. If you want to have the best APIs, you have to do your damnedest to break them. Anything else is foolish pride and false economy.