by Cheryl Wiebe and Jeff Cohen, Teradata
Getting started with IoT means getting your hands dirty with the data
Okay, okay… enough with the high-level platitudes. Enough with the vacuous statements about how IoT is growing, it’s big now, it’s going to be really big in a little while, and soon, a little while later, it’s going to be way bigger. Talk, talk, talk, and little action. How, you are asking yourself, can I actually get started? What should I do during the 18-24 months while everyone noodles away on the right protocol, how to protect our widgets from getting hacked, and what the perfect IoT platform or gateway device looks like?
Here are some concrete steps that have proven successful at leading Industrial IoT giants, none of which requires a fully baked operational IoT environment. You can get in there, get your hands on some data, and start on the meticulous but unavoidable first steps.
Grab a few batches of raw sensor data and put it somewhere
It’s going to be a hot mess. Did you think this data comes to you in pipe- or comma-separated form so you could just put it into a spreadsheet? Or JSON? No. It comes in binary, compressed, tar/tarball/ustar/you-name-it form (see Wikipedia’s tar article for a primer). Much of it follows proprietary, legacy device manufacturers’ formats, whose metadata must be extracted before the data payload can be extracted. Now put it somewhere and organize it so you can actually look at it.
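To make that concrete, here is a minimal sketch of what "extract the metadata before the payload" can look like. The record layout below is invented for illustration; real device formats vary by manufacturer, so check the vendor’s spec before trusting any offsets.

```python
import struct

# Hypothetical record layout: an 8-byte header (2-byte sensor id,
# 2-byte channel, 4-byte sample count), followed by that many
# little-endian 32-bit floats. This is NOT a real vendor format.
HEADER = struct.Struct("<HHI")

def parse_record(blob: bytes):
    """Split one binary record into its metadata and float payload."""
    sensor_id, channel, count = HEADER.unpack_from(blob, 0)
    payload = struct.unpack_from(f"<{count}f", blob, HEADER.size)
    return {"sensor_id": sensor_id, "channel": channel}, list(payload)

# Build a sample record the way a device might emit it, then parse it
raw = HEADER.pack(7, 2, 3) + struct.pack("<3f", 1.0, 2.5, 4.0)
meta, samples = parse_record(raw)
print(meta, samples)  # {'sensor_id': 7, 'channel': 2} [1.0, 2.5, 4.0]
```

The point is not the specific byte layout but the shape of the work: every proprietary format needs a small decoder like this before the data can land anywhere useful.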
Get some; start looking at it. Find out what you’ve got, what matters
Exploration and data norm development are two iterative processes
So you did all that. (Wipe sweat from brow). Are we having fun yet? You can imagine what life would be like if you didn’t have to do step 1 manually.
Now, what do you have? Maybe you have 20 channels of data, from 1 to 20 sensors, coming in these arcane forms. You may have all sorts of time series and spatial series measurements arriving at different sampling rates (1 Hertz = 1 measurement per second; 60 Hertz = 60 measurements per second). In raw form they may not be very usable.
Now, wouldn’t it be nice just to run it through a cruncher, a routine that organizes it, prepares it by performing some basic transformations, standardizations, and normalizations, and then displays it in a form you can consume? Try this one: Visual Anomaly Prospector (VAP). This is an accelerator that does just this, and outputs all the data channels visually in ways that let you easily spot patterns, outliers, and anomalies that can be acted upon.
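VAP’s internals aren’t described here, but the "standardization" step it performs is a well-known transformation. A sketch of the idea: z-score each channel so that channels with wildly different scales share one comparable axis, which is what makes outliers jump out visually.

```python
import statistics

def standardize(channel):
    """Z-score a channel: subtract its mean, divide by its
    (population) standard deviation."""
    mean = statistics.fmean(channel)
    stdev = statistics.pstdev(channel)
    return [(x - mean) / stdev for x in channel]

temps = [20.0, 21.0, 19.0, 20.0, 40.0]  # one clear spike
z = standardize(temps)
# A loose 1.5-sigma cutoff here rather than the usual 3-sigma,
# because z-scores in a 5-point sample are capped near 2.
outliers = [i for i, score in enumerate(z) if abs(score) > 1.5]
print(outliers)  # [4]
```

On a standardized scale like this, "where do the patterns and anomalies appear" becomes a question you can answer the same way for every channel.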
Start developing some “norms”
With VAP, you were able to see some interesting things in the sensor data, but you’re a busy person, so you don’t want to spend hours combing through hundreds of sensor streams to manually find these data signals and quirks every day. The engineering team for a heavy industrial equipment manufacturer found themselves in a similar boat. Smart people looking at sensor squiggles on screen after screen for little bumps and dips…not the best use of their time. Wanting to put these smart people back to the work of doing smart things, Sensor Data Qualification (SDQ) was developed to perform this rote process of finding anomalies in the sensors for them. Automating the quality of sensor data created a higher quality dataset for their analytics and flagged weird data for further investigation.
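SDQ itself is a proprietary accelerator, so the sketch below is only an illustration of the rote check it automates: compare each new reading against a "norm" derived from a trailing window, and flag readings that stray too far from it. The window size and tolerance are invented for the example; a median is used rather than a mean so a single spike doesn’t pollute the norm.

```python
import statistics

def flag_drift(stream, window=3, tolerance=5.0):
    """Flag indices whose reading deviates from the median of the
    preceding `window` readings by more than `tolerance`."""
    flags = []
    for i in range(window, len(stream)):
        norm = statistics.median(stream[i - window:i])
        if abs(stream[i] - norm) > tolerance:
            flags.append(i)
    return flags

pressure = [100.0, 101.0, 99.0, 100.0, 130.0, 100.0]
print(flag_drift(pressure))  # [4] -- the 130.0 spike
```

Running a check like this continuously, across hundreds of streams, is exactly the screen-combing work the engineers no longer have to do by eye.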
Automate data delivery and cleanup
You might be quick to point out that this qualification process is nothing new. It is the typical data-cleaning step completed by any data scientist in the course of their work, so why make a big deal out of it? Well, the big deal is that part about automation. It is easy to take for granted that not everyone wants to be a data scientist (we don’t understand it either), so SDQ was designed to give people a pathway to use data science without having to become full-blown scientists. In the case of the engineers, they had good intuition for the analytics and some programming skills, but they weren’t programmers and didn’t want to become programmers. With SDQ, the only thing they had to do was interact with a GUI to add their rules, and their source data would come out in accordance with those rules. What’s more, SDQ opens up the possibility of using this system of qualification to automatically direct data to different storage locations, relieving the engineers of the need to play database administrator.
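The routing idea can be sketched in a few lines. The rule names, predicates, and destinations below are all invented stand-ins for the GUI-entered rules the text describes; the point is that once rules are declarative, "where does this record go" is decided without anyone touching the database layer.

```python
# Hypothetical qualification rules: (name, predicate, destination).
# Records failing a rule are diverted; clean records go to the warehouse.
RULES = [
    ("missing_value", lambda r: r["value"] is None, "quarantine"),
    ("out_of_range",
     lambda r: r["value"] is not None and not (0 <= r["value"] <= 200),
     "review"),
]

def route(record):
    """Return (destination, failed_rule) for one sensor record."""
    for name, predicate, destination in RULES:
        if predicate(record):
            return destination, name
    return "warehouse", None

print(route({"sensor": "temp-01", "value": 72.0}))   # ('warehouse', None)
print(route({"sensor": "temp-01", "value": None}))   # ('quarantine', 'missing_value')
print(route({"sensor": "temp-01", "value": 512.0}))  # ('review', 'out_of_range')
```

Adding a rule means appending one entry to the list, which is the kind of change a GUI can safely expose to non-programmers.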
Once the data wrangling is operationalized, you’ll want to start getting some real value out of the payload.