In a survey of 20 leading corporate infrastructure APIs, we found that in over 70% of the performance issues we detected, the cluster of poor performance had no clear root cause.
We used our machine learning system to learn the normal performance of each API (the set included services from DocuSign, Microsoft and Dropbox) and looked for periods where performance degraded. We then clustered the events that appeared to be linked or related, again using machine learning techniques, and checked whether each cluster had a clear cause.
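The approach described above (learn a baseline, flag degraded periods, group related events) can be illustrated with a deliberately simple sketch. It is not the machine learning system used in the survey: it swaps in a rolling-median baseline and gap-based grouping as stand-ins, and the function names, window size, threshold factor and sample latencies are all assumptions made for illustration.

```python
from statistics import median


def detect_degradations(latencies_ms, baseline_window=20, factor=3.0):
    """Flag samples whose latency exceeds `factor` times the rolling median.

    The rolling median of the preceding `baseline_window` samples stands in
    for a learned model of "normal" performance (an assumption, not the
    survey's actual method).
    """
    flagged = []
    for i in range(baseline_window, len(latencies_ms)):
        baseline = median(latencies_ms[i - baseline_window:i])
        if latencies_ms[i] > factor * baseline:
            flagged.append(i)
    return flagged


def cluster_events(indices, max_gap=2):
    """Group flagged sample indices that lie within `max_gap` of each other."""
    clusters = []
    for idx in indices:
        if clusters and idx - clusters[-1][-1] <= max_gap:
            clusters[-1].append(idx)
        else:
            clusters.append([idx])
    return clusters


# Hypothetical latency series in milliseconds: mostly ~100 ms with two slow periods.
samples = [100] * 30 + [500] * 3 + [100] * 10 + [450] * 2 + [100] * 5
events = detect_degradations(samples)
print(cluster_events(events))
```

Each resulting cluster is a candidate incident. In the survey, the next step was to check whether a cluster coincided with a clear problem such as a server failure; that check is not modeled here.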
In roughly 30% of the problems we saw, there was a clear issue that other tools would also identify, typically a server failure or some kind of internet access problem. In the remaining cases, however, no apparent cause was detected. These were significant performance degradations, where the latency of the calls increased dramatically.
These problems are hard to both detect and solve. Customers will notice and probably complain about such a degradation in performance, but classical monitoring tools will miss it, which can lead to he said/she said arguments between different groups in an organization or directly with customers.