Originally posted 12/14/16 on TeradataVoice by David Mueller, Teradata
The Great Barrier Reef, a world heritage site of remarkable underwater beauty on the north-east coast of Australia, is a paradise for diving professionals and scuba enthusiasts. And there I was - 50km off the coast of Northern Queensland - a data scientist on a two-week summer break, standing on the slippery back deck of a chartered vessel in an overly tight wetsuit, about to take my first step into deep water.
Mask, check. Air, check. Weight belt tightened, vest inflated, and no sharks in sight… go!
Exploring deep waters
As it turns out, diving is the perfect sport for data-loving professionals because data collection is an essential part of the diving routine. Slowly descend into the ocean depths and you could imagine yourself taking that first dip into an unexplored data lake. And looking for the biggest fish or most colourful coral among thousands of living creatures is like searching for hidden insights in large-scale datasets.
But depth isn’t just relevant to deep-sea diving. As a concept, it’s increasingly important to analytics and machine intelligence. Artificial Neural Networks - computational models of interconnected processing elements ('Artificial Neurons') loosely inspired by the human brain - have been around for decades. Historically composed of just one or two hidden layers of neurons, these networks have evolved from shallow structures into deep architectures - hence 'Deep Learning'. This transformation was aided and abetted by the falling cost of training large models in parallel at scale, and by the growing availability of the big, multi-dimensional datasets needed to train deep models.
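The 'depth' in question is simply the number of stacked hidden layers. A minimal sketch in Python with NumPy makes the distinction concrete - the layer sizes here are illustrative choices, not anything from a particular system:

```python
import numpy as np

def forward(x, layers):
    """Run a forward pass through a list of (weights, biases) layers,
    applying a ReLU non-linearity after every layer except the last."""
    for i, (W, b) in enumerate(layers):
        x = x @ W + b
        if i < len(layers) - 1:
            x = np.maximum(x, 0.0)  # ReLU activation on hidden layers
    return x

rng = np.random.default_rng(0)

def make_layers(sizes):
    """Randomly initialise weight matrices for consecutive layer sizes."""
    return [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

# A 'shallow' network: a single hidden layer between input and output.
shallow = make_layers([16, 32, 1])

# A 'deep' network: same input and output, but several stacked hidden layers.
deep = make_layers([16, 32, 32, 32, 32, 1])

x = rng.standard_normal((4, 16))      # a batch of 4 input vectors
print(forward(x, shallow).shape)      # (4, 1)
print(forward(x, deep).shape)         # (4, 1)
```

Both networks map the same inputs to the same output shape; what changes with depth is the number of intermediate representations the model can learn - and, as discussed below, the training cost and opacity that come with them.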
Deep Learning - a branch of machine learning
Deep Learning is essentially a branch of machine learning. But, while standard machine learning usually requires time-consuming manual extraction of model input from data (or ‘feature engineering’), Deep Learning models excel at automatically capturing the complex structural patterns hidden in massive datasets. This ability enabled recent advancements in a range of analytical tasks on image, video, audio, and text data - data where manual extraction of features soon reaches its limit.
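To make 'feature engineering' concrete, here is an illustrative toy sketch (the signals and the three summary statistics are invented for the example): a human analyst compresses raw data into a handful of hand-picked features, whereas a deep model would consume the raw values directly and learn its own internal representations during training.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy 'raw' data: 100 signals of 64 samples each.
signals = rng.standard_normal((100, 64))

def hand_crafted_features(x):
    """Manual feature engineering: a human decides which summary
    statistics might matter and computes them explicitly."""
    return np.column_stack([
        x.mean(axis=1),                            # average level
        x.std(axis=1),                             # overall variability
        np.abs(np.diff(x, axis=1)).mean(axis=1),   # roughness
    ])

features = hand_crafted_features(signals)
print(features.shape)  # (100, 3): 64 raw values reduced to 3 chosen features
```

A traditional model sees only those three numbers per signal - anything the analyst did not think to encode is lost. A deep network, by contrast, would take all 64 raw values as input, which is why it shines on images, audio, and text, where useful features are hard to write down by hand.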
Deep Learning - not just for tech powerhouses
Until recently, Deep Learning was the preserve of large academic institutions and tech powerhouses. The growing media coverage of Deep-Learning-related achievements (often driven by highly tangible research outcomes - game-playing computers, artificial creative machines, and the like) has significantly contributed to the Deep Learning buzz. And now other companies are jumping into their wetsuits to explore the deep. Reports of data scientists applying Deep Learning methods to more widely-established areas of analytics - customer churn prediction, financial fraud detection, product recommendation - testify to the growing interest of traditional businesses across industries.
Deep Learning - a trade-off
Businesses need to understand that diving into deep analytics involves a trade-off. While the accuracy of deep network architectures can rival, and sometimes outperform, existing approaches - even on well-structured datasets - the resulting models are opaque black boxes in need of fine-tuning. Deep Learning often requires up-front investment in infrastructure to handle model complexity. Large neural networks can take hours or even days to train, which can hamper data-science teams if not anticipated and planned for.
Choosing the right model for an analytical challenge is not just a case of maximising model accuracy subject to the restrictions imposed by the input dataset. In traditional analytics, model interpretability and simplicity often outweigh small gains in accuracy. When dealing with big datasets, the performance and scalability of a modelling approach must also be taken into consideration. And the requirements of non-operational models that assist insight discovery differ from those of fully operational modelling architectures.
David Mueller is a Senior Data Scientist in the International Data Science Centre of Excellence at Teradata. Based in Singapore, he supports local teams across the APAC region as an expert in applying advanced statistical and analytical methods to business problems across industries.