Impede the Motion of Data and You Impede Innovation

June 17, 2020


By Ravi Naik

Why that is, and what to do about it.

Data drives innovation. At scale, innovation does not happen in isolation. Without finely tuned, smartly orchestrated data flows, innovation stalls.

A stubborn misconception casts data as mostly static. Picture streams of data arriving at data lakes only to drift lethargically to rest at the bottom, inactive and motionless. Consider the very phrase "data lake": it implies a kind of placidity. In particular, when organizations treat data lakes as dump sites, the lakes become what have been dubbed data swamps.

Of course, placid data lakes are appropriate in some instances, when data merely needs to be stored and not much else. Archival and backup data belong to this category: for example, data backed up for business continuity reasons, of which organizations need multiple copies.

At the same time, we live in a world where enterprises increasingly want their data awake and in motion. In The Book of Why: The New Science of Cause and Effect, the Turing Award-winning computer scientist and philosopher Judea Pearl reassured, “You are smarter than your data. Data do not understand causes and effects; humans do.” It’s up to us humans, and the processes we develop, to make sense of data. It’s up to us to put data to use.

Every business is a data business. But enterprise data is of little value if it is not used. To make sense of data efficiently and smartly, we need to see data lakes as reservoirs where many vibrant rivers meet; the task is to commingle various data currents. Data needs to be shared with other lakes so that disparate streams can be cross-referenced and analyzed together.

Take autonomous cars. To begin with, there’s value in analyzing data from one vehicle, within one company. Cross-analyzing that one vehicle’s data with data from the vehicles of all autonomous car companies adds another layer of insight. For a richer picture, zoom out from there and integrate the knowledge derived from that one vehicle’s data with data from the billions of sensors that make up a smart city. The fuller picture may be useful to the regional government and city planners who implement better public safety standards and traffic flows.

The more pieces you put together, the bigger a puzzle you can solve. You can tackle a much higher order problem if you share data, cross-referencing various streams of information for analysis.

That’s why enabling the movement of data matters. Data needs to move in order to allow for interconnectedness of data — and the insights that result.

The data dams

But, as many businesses are finding out, putting large volumes of data into motion can be tricky.

First, egress charges stand in the way. It’s not easy to move data out of the public cloud for analysis because of the fees that cloud service providers charge their customers. What would it cost to take a petabyte out of the cloud? The egress charge is between 5 and 20 cents per GB every time customers move their data from the cloud to an on-premises location. At 1,000,000 GB per petabyte, this means that taking out a petabyte of data costs between $50,000 and $200,000.
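
The petabyte figure follows from simple arithmetic, which can be sketched in a few lines of Python. This is a rough illustration only: it assumes decimal units (1 PB = 1,000,000 GB) and a flat per-GB rate, and the function name `egress_cost_usd` is made up for this example. Real cloud pricing is often tiered and varies by provider and region.

```python
# Assumption: decimal units, so 1 PB = 1,000,000 GB.
GB_PER_PB = 1_000_000


def egress_cost_usd(volume_gb: int, cents_per_gb: int) -> float:
    """Estimate the cost in US dollars of one transfer out of the cloud.

    Assumes a single flat rate in cents per GB (hypothetical simplification;
    actual provider pricing is usually tiered).
    """
    return volume_gb * cents_per_gb / 100


# The article's range: 5 to 20 cents per GB.
low = egress_cost_usd(GB_PER_PB, 5)
high = egress_cost_usd(GB_PER_PB, 20)

print(f"Moving 1 PB once: ${low:,.0f} to ${high:,.0f}")
```

Because the charge applies every time data leaves the cloud, repeated transfers multiply the bill: moving the same petabyte out once a month for a year would cost twelve times the figures above.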

Second, solutions that do solve the data transport problem—such as fiber-optic cable and existing data transport devices—are limited. They aren’t universally available, they may not be big enough, they aren’t flexible enough, or they face ingest problems. There is not enough fiber in the ground to accommodate the growing data needs. Shuttles can in many cases move large volumes of data fast. But today’s shuttle boxes come with restrictions on logical interfaces; some lack the ruggedness needed for transport. Because many shuttle systems are proprietary, their use cases can be limited.

These issues are all solvable, and business owners who like their data in motion focus on overcoming these barriers. This is all the more important in our multi-cloud world. If data is not moving — from edge to cloud, from public cloud to on-premises data centers, from cloud to cloud, etc. — it’s not enabling competitive business value.

Innovation, which is often enabled by specialized AI clouds, needs unobstructed flows of data. Winning enterprises know that when they free up the movement of data, they speed up innovation.
