by Marc Clark, Teradata
In the early days of big data, around two years ago, data scientists were technology heroes, weaving prototypes on their laptops by combining skills from a dizzying range of disciplines. Bitly’s chief scientist Hilary Mason famously referred to these people as the “awesome nerds,” spinning magic before our eyes from a massive data universe.
Just a short time later, cloud-based solutions have made big data analytics accessible enough to play an important role in the day-to-day affairs of a wide range of organizations. One only has to thumb through a report like “Analytics in Action” to see big data analytics answering the kinds of questions every leader faces when wrestling with how to derive value from their data stores and manage their unpredictable growth.
One of the most exciting emerging trends that cloud-based analytics enables is the ability to think smaller. Too much of the excitement about analytics has focused on big data, and that kind of thinking has put many companies off exploring their data at all. The fact is that we can find valuable signals in small data as well.
We often hear renowned data science leaders, such as Monica Rogati, celebrate the complexity and gumption required to overcome the interdisciplinary hurdles of big data analytics. That was true back in the days when all we had were flint knives and stone hand axes and the goal was to take down the mastodon. Every effort was massive, which meant going only after the biggest game.
The arrival of new tools makes it feasible to experiment more, at lower cost, and to go after a greater diversity of prey. Here are some thoughts about where to look and how to quickly take advantage of cloud infrastructure to support experimentation.
One thing the cloud changes is the “who” in who can use analytics. The perception used to be that you needed some data warehouse infrastructure in place and that big data was about scaling up from there. The beauty of the cloud is that you can go from nothing to something using an integrated suite of technology that lets you run experiments without the usual barriers. That fundamentally changes the economics of big data, because it is now feasible to qualify your data cheaply and see whether it contains signals worth pursuing.
Many companies have a lot of data sets, and not all of them are “big.” The value of data isn’t necessarily related to its size. An ERP or CRM system produces a relatively small data footprint compared to the billions of data points in an active web log. Does that mean these systems are not candidates for big data analysis? Not at all. In fact, their data sets are highly curated, so the signal-to-noise ratio is likely to be favorable. They are perfect candidates for getting started. And, as I mentioned in another post, analytics projects don’t have to be rocket science to be valuable. Sometimes a simple move, such as modernizing existing reporting capabilities, is a giant step forward.
With access to a cloud analytics environment, it is possible to take those smaller data sets and get going quickly using the same technology being applied to the largest data sets. Organizations can start to find correlations and patterns, then grow organically as analytics gains traction and influences their direction in meaningful ways. Core business applications tend to be underanalyzed and are ripe for analytics. External data sets can be added for more depth. For example, something as simple as SharedCount, which tracks the shares, tweets, and likes of any URL your company is promoting, can greatly expand insight into marketing promotions.
If you give an analyst a bunch of technology and a big pile of data and say “go find something,” you’re in for a long, costly road before finding anything that matters. Yet that’s the approach we seem to hear about most often. The better, more logical way to start is by surveying the business and unearthing the high-value questions that no one seems able to answer, then looking for data sets that could answer them. In other words, start with people and questions, choose some relevant data, and let the cloud go to work for you.