by Bill Franks, Teradata
Anybody paying attention to big data has probably run across a powerful and exciting observation that’s often made: that today’s vast ocean of diverse and multi-structured data allows analytics to discover unexpected patterns and insights, delivering “answers to questions we never thought to ask.”
I hesitate to label this kind of language as hype, because it is absolutely true that analytics makes such opportunistic moments of discovery possible. But I think it’s worth remembering how much of today’s analytics still involves proactively asking questions and structuring inquiries up front. In other words, what about all those questions we did think to ask?
Indeed, the way we frame our analytic inquiries at the start makes all the difference, and requires us to make accurate assumptions. This is all the more important as algorithmic intelligence increasingly becomes more automated and autonomous. When potentially millions of decisions will be made automatically, being even slightly off base when designing an analytics process can lead to serious consequences. So, here are a few guidelines to test assumptions and shore up confidence in our analytics:
1) Know When Not to Split Hairs
Particularly when using analytics to track conditions over time or forecast future circumstances, we should test our assumptions about what kinds of changes rise to the level of being important or consequential. To do this, we can leverage a discipline called sensitivity analysis. It’s a technique often used in engineering, but not often enough in analytics.
Consider a group of executives stuck debating whether the inflation rate will be three, four or five percent over the next decade; they simply can’t reach agreement. Sensitivity analysis can come to the rescue by showing how results will be impacted as the inflation rate varies. If, regardless of which executive is most accurate, the analytics still point to the same answer on a strategic business matter, then the executives don’t have to bother trying to agree on an exact assumption. In this instance, testing assumptions involves testing how precise you need to be about them.
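The executives’ debate can be sketched in a few lines of code. This is a minimal sensitivity-analysis illustration with hypothetical figures (the project costs, cash flows, and hurdle rate are all invented for the example, not taken from any real analysis): we recompute a project’s net present value under each executive’s inflation assumption and check whether the go/no-go decision changes.

```python
# Minimal sensitivity-analysis sketch (hypothetical figures throughout):
# does the investment decision change as the assumed inflation rate varies?

def npv(cash_flows, discount_rate):
    """Net present value of annual cash flows; index 0 is today."""
    return sum(cf / (1 + discount_rate) ** t for t, cf in enumerate(cash_flows))

# Hypothetical project: $1M upfront, $200K per year for 10 years.
cash_flows = [-1_000_000] + [200_000] * 10

for inflation in (0.03, 0.04, 0.05):
    # Assume a nominal discount rate of a fixed real hurdle plus inflation.
    rate = 0.08 + inflation
    value = npv(cash_flows, rate)
    print(f"inflation {inflation:.0%}: NPV = {value:>10,.0f} -> "
          f"{'invest' if value > 0 else 'pass'}")
```

In this toy case the NPV stays positive at three, four, and five percent, so the “invest” recommendation holds regardless of which executive is right, and the group can stop arguing about the exact rate. If the decision had flipped somewhere in that range, the team would know that pinning down inflation really matters.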
2) Choose the Right Variables
One side effect of having a bigger volume and variety of data: Along with more avenues to value, you also get more dead ends; we need to be able to tell the difference between the two. With ever more data to look at in ever more combinations, it’s easy to go down paths that don’t make sense, or to find spurious correlations that are a distraction more than a reality.
When testing only a few factors, chances are low that some completely irrelevant variable will be mistakenly classified as significant. Think about the petabytes of sensor data generated by a modern airplane. Even for a single event like an engine overheating, you have potentially tens of thousands of metrics available to correlate with that event.
If you’re testing 20,000 factors, you can expect 200 or so “bogus” factors to slip through the net as statistically significant, even when using a 99 percent confidence level in your analysis. My point here is that judgment is required to decide which variables should be fed into an analysis – so that only reasonable candidates are included.
3) Document Your Assumptions for Easier Troubleshooting
As we make decisions and judgment calls in targeting the right variables and metrics, it’s just as important to make sure we document those decisions. Let’s add some context to the airplane analogy to see why:
Beginning in 2012, the Boeing Company suffered through more than a year of headlines and headaches as its new 787 Dreamliner planes experienced electrical system problems with their lithium-ion batteries. While some people criticized Boeing for not catching the problem during extensive testing in the production phase, the fact is that there is simply not enough human and computing power to anticipate every potential problem. Boeing may have had very good logic behind its decisions about which risks were large enough to warrant extensive testing and which were not, given the practical constraints it faced. Being able to explain the logic behind such decisions can help defuse concerns if your organization has a problem.
As I mentioned previously, it’s necessary to decide what metrics and factors to look at, and what to leave unexamined. But the extent to which we document such decisions makes it easier to go back and troubleshoot. We can retrace our steps and examine more fully the paths not taken. In the process, we may be able to assign new importance to something that once seemed irrelevant and give it much more focus in the future. A needle in a haystack, after all, is almost impossible to find. But once found, analytics can ensure it stays in sight.