Remember: The IoT Is Primarily About Small Data, Not Big

Posted on 16th March 2015 in data, Internet of Things, M2M, management, manufacturing, open data

In one of my fav examples of how the IoT can actually save lives, sensors on only eight preemies’ incubators at Toronto’s Hospital for Sick Children yield an eye-popping 90 million data points a day!  If all 90 million data points get relayed on to the “data pool,” the docs would be drowning in data, not saving sick preemies.

Enter “small data.”

Writing in Forbes, Mike Kavis has a worthwhile reminder that the essence of much of the Internet of Things isn’t big data, but small. By that, he means:

a dataset that contains very specific attributes. Small data is used to determine current states and conditions  or may be generated by analyzing larger data sets.

“When we talk about smart devices being deployed on wind turbines, small packages, on valves and pipes, or attached to drones, we are talking about collecting small datasets. Small data tell us about location, temperature, wetness, pressure, vibration, or even whether an item has been opened or not. Sensors give us small datasets in real time that we ingest into big data sets which provide a historical view.”

Usually, instead of aggregating  ALL of the data from all of the sensors (think about what that would mean for GE’s Durathon battery plant, where 10,000 sensors dot the assembly line!), the data is originally analyzed at “the edge,” i.e., at or near the point where the data is collected. Then only the data that deviates from the norm (i.e., is significant)  is passed on to to the centralized data bases and processing.  That’s why I’m so excited about Egburt, and its “fog computing” sensors.

As with sooo many aspects of the IoT, it’s the real-time aspect of small data that makes it so valuable, and so different from past practices, where much of the potential was never collected at all, or, if it was, was only collected, analyzed and acted upon historically. Hence, the “Collective Blindness” that I’ve written about before, which limited our decision-making abilities in the past. Again, Kavis:

“Small data can trigger events based on what is happening now. Those events can be merged with behavioral or trending information derived from machine learning algorithms run against big data datasets.”

As examples of the interplay of small and large data, he cites:

  • real-time data from wind turbines that is used immediately to adjust the blades for maximum efficiency. The relevant data is then passed along to the data lake, “..where machine-learning algorithms begin to understand patterns. These patterns can reveal performance of certain mechanisms based on their historical maintenance record, like how wind and weather conditions effect wear and tear on various components, and what the life expectancy is of a particular part.”
  • medicine containers with smart labels. “Small data can be used to determine where the medicine is located, its remaining shelf life, if the seal of the bottle has been broken, and the current temperature conditions in an effort to prevent spoilage. Big data can be used to look at this information over time to examine root cause analysis of why drugs are expiring or spoiling. Is it due to a certain shipping company or a certain retailer? Are there re-occurring patterns that can point to problems in the supply chain that can help determine how to minimize these events?”

Big data is often irrelevant in IoT systems’ functioning: all that’s needed is the real-time small data to trigger an action:

“In many instances, knowing the current state of a handful of attributes is all that is required to trigger a desired event. Are the patient’s blood sugar levels too high? Are the containers in the refrigerated truck at the optimal temperature? Does the soil have the right mixture of nutrients? Is the valve leaking?”

In a future post, I’ll address the growing role of data scientists in the IoT — and the need to educate workers on all levels on how to deal effectively with data. For now, just remember that E.F. Schumacher was right: “small is beautiful.”

 

Management Challenge: Lifeguards in the IoT Data Lake

In their Harvard Business Review November cover story, How Smart, Connected Products Are Transforming Competition, PTC CEO Jim Heppelmann and Professor Michael Porter make a critical strategic point about the Internet of Things that’s obscured by just focusing on IoT technology: “…What makes smart, connected products fundamentally different is not the internet, but the changing nature of the “things.”

In the past, “things” were largely inscrutable. We couldn’t peer inside massive assembly line machinery or inside cars once they left the factory, forcing companies to base much of both strategy and daily operations on inferences about these things and their behavior from limited data (data which was also often gathered only after the fact).

Now that lack of information is being removed. The Internet of Things creates two unprecedented opportunities regarding data about things:

  • data will be available instantly, as it is generated by the things
  • it can also be shared instantly by everyone who needs it.

This real-time knowledge of things presents both real opportunities and significant management challenges.

Each opportunity carries with it the challenge of crafting new policies on how to manage access to the vast new amounts of data and the forms in which it can be accessed.

For example: with the Internet of Things we will be able to bring about optimal manufacturing efficiency as well as unprecedented integration of supply chains and distribution networks. Why? Because we will now be able to “see” inside assembly line machinery, and the various parts of the assembly line will be able to automatically regulate each other without human intervention (M2M) to optimize each other’s efficiency, and/or workers will be able to fine-tune their operation based on this data.

Equally important, because of the second new opportunity, the exact same assembly line data can also be shared in real time with supply chain and distribution network partners. Each of them can use the data to trigger their own processes to optimize their efficiency and integration with the factory and its production schedule.

But that possibility also creates a challenge for management.

When data was hard to get, limited in scope, and largely gathered historically rather than in the moment, what data was available flowed in a linear, top-down fashion. Senior management had first access, then they passed on to individual departments only what they decided was relevant. Departments had no chance to simultaneously examine the raw data and have round-table discussions of its significance and improve decision-making. Everything was sequential. Relevant real-time data that they could use to do their jobs better almost never reached workers on the factory floor.

That all potentially changes with the IoT – but will it, or will the old tight control of data remain?

Managers must learn to ask a new question that’s so contrary to old top-down control of information: who else can use this data?

To answer that question they will have to consider the concept of a “data lake” created by the IoT.

“In broad terms, data lakes are marketed as enterprise wide data management platforms for analyzing disparate sources of data in its native format,” Nick Heudecker, research director at Gartner, says. “The idea is simple: instead of placing data in a purpose-built data store, you move it into a data lake in its original format. This eliminates the upfront costs of data ingestion, like transformation. Once data is placed into the lake, it’s available for analysis by everyone in the organization.”

Essentially, data that has been collected and stored in a data lake repository remains in the state it was gathered and is available to anyone, versus being structured, tagged with metadata, and having limited access.

That is a critical distinction and can make the data far more valuable, because the volume and variety will allow more cross-fertilization and serendipitous discovery.

At the same time, it’s also possible to “drown” in so much data, so C-level management must create new, deft policies – to serve as lifeguards, as it were. They must govern data lake access if we are to, on one hand, avoid drowning due to the sheer volume of data, and, on the other, to capitalize on its full value:

  • Senior management must resist the temptation to analyze the data first and then pass on only what they deem of value. They too will have a crack at the analysis, but the value of real-time data is getting it when it can still be acted on in the moment, rather than just in historical analyses (BTW, that’s not to say historical perspective won’t have value going forward: it will still provide valuable perspective).
  • There will need to be limits to data access, but they must be commonsense ones. For example, production line workers won’t need access to marketing data, just real-time data from the factory floor.
  • Perhaps most important, access shouldn’t be limited based on pre-conceptions of what might be relevant to a given function or department. For example, a prototype vending machine uses Near Field Communication to learn customers’ preferences over time, then offers them special deals based on those choices. However, by thinking inclusively about data from the machine, rather than just limiting access to the marketing department, the company shared the real-time information with its distribution network, so trucks were automatically rerouted to resupply machines that were running low due to factors such as summer heat.
  • Similarly, they will have to relax arbitrary boundaries between departments to encourage mutually-beneficial collaboration. When multiple departments not only share but also get to discuss the same data set, undoubtedly synergies will emerge among them (such as the vending machine ones) that no one department could have discovered on its own.
  • They will need to challenge their analytics software suppliers to create new software and dashboards specifically designed to make such a wide range of data easily digested and actionable.

Make no mistake about it: the simple creation of vast data lakes won’t automatically cure companies’ varied problems. But C-level managers who realize that if they are willing to give up control over data flow, real-time sharing of real-time data can create possibilities that were impossible to visualize in the past, will make data lakes safe, navigable – and profitable.