The Evolution Of The Enterprise Data Ecosystem And Its Challenges

Data Driven

In his 2005 book, The World Is Flat: A Brief History of the Twenty-First Century, Thomas Friedman described globalization as the leveling of the global playing field. Globalization, fading geographical boundaries and growing economic interdependence between countries and companies have made enterprises worldwide more reliant on, and more influenced by, global information. That information, ever greater in volume and ever broader in scope, creates complex problems for enterprises that must process it at tremendous speed. Data-driven enterprises looking to address these problems must identify and implement new technology solutions.

The Scope And Scale Of Enterprise Data

Enterprises today depend on huge amounts of data that come from sources up and down their supply chains. The scope of that data has evolved in stages: from monolithic applications that held all of their relevant data within themselves, to data warehouses, operational data stores and ETL (extract, transform and load) technologies that pulled data from different applications, transformed it and moved it to support more complex analytics, all within the boundaries of a single enterprise. It has now expanded further into a global, interdependent information supply chain that mirrors the way producers, suppliers and consumers worldwide are connected.


In addition to the broadening scope of data, enterprises are challenged by its massive scale, including huge amounts of up-to-the-second data coming from operational technologies such as sensors and devices, as well as from news feeds and social media. Businesses are also managing their historical data and often connecting it with that new data for “instant” use cases, such as:


  • Credit Card Fraud Prevention: Credit card companies must prevent fraud while giving consumers the transaction experience they expect by processing or blocking transactions in near real time.

  • Financial Portfolio Risk Analysis: Asset portfolio managers look to gain milliseconds of advantage in pricing the assets they trade by analyzing micro- and macroeconomic factors, market conditions and hundreds of other variables at the time of a trade.

  • Energy Grid Operations Management: Energy grid operators constantly monitor the grid through hundreds of thousands of sensors continuously tracking grid health metrics. Operators must be able to process hundreds of thousands of events every second and make real-time decisions, such as whether to change the grid load profile or initiate preventive maintenance actions.

Multidimensional Data Problems

The above use cases all require multiple data sources, data life cycle stages and data processing strategies. For example, to conduct fraud analysis and prevention, a credit card provider must leverage:


  • Streaming data to process every incoming transaction request.

  • Intelligent analysis to understand what additional data is needed by the AI/ML model.

  • Historical data to understand, for example, each consumer’s usage history.

  • Intelligent model execution to combine the streaming event data with the historical data and execute the AI/ML model to make a decision.

  • Low latency to ensure all of the above steps complete within the expected time frame: a few milliseconds.


The exact nature of the processing varies from one use case to another, but every one of these use cases requires multidimensional processing at ultra-low latency. Otherwise, a solution for such a use case delivers no value.
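To make the fraud-prevention flow concrete, here is a minimal Python sketch of the pipeline described above. Everything in it (the feature store, the thresholds, the toy scoring rule) is an illustrative assumption rather than a reference implementation: a streaming transaction event is joined with cached historical features, a model-style score is computed, and the whole decision is held to a millisecond-scale latency budget.

```python
import time

# Hypothetical in-memory feature store keyed by card; in a real deployment this
# would be a cache or data hub colocated with the decision engine.
HISTORICAL_FEATURES = {
    "4111-xxxx": {"avg_amount": 62.0, "home_country": "US"},
}

def score(event, features):
    """Stand-in for an AI/ML model: flag large, out-of-pattern transactions."""
    ratio = event["amount"] / max(features["avg_amount"], 1.0)
    foreign = event["country"] != features["home_country"]
    return 0.9 if ratio > 5 and foreign else 0.1

def decide(event, latency_budget_ms=10.0):
    """Join the streaming event with historical data, run the model and
    enforce a latency budget on the whole decision."""
    start = time.perf_counter()
    features = HISTORICAL_FEATURES.get(
        event["card"], {"avg_amount": 50.0, "home_country": event["country"]}
    )
    decision = "BLOCK" if score(event, features) > 0.5 else "APPROVE"
    elapsed_ms = (time.perf_counter() - start) * 1000
    if elapsed_ms > latency_budget_ms:
        decision = "REVIEW"  # budget blown: defer to review rather than block in-line
    return decision, round(elapsed_ms, 3)

print(decide({"card": "4111-xxxx", "amount": 480.0, "country": "BR"}))
# -> ('BLOCK', <sub-millisecond elapsed time>)
```

The point of the sketch is the shape of the problem, not the model: every decision touches streaming data, historical data and model execution, and all of it must fit inside the latency budget.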


In addition to the obvious challenges here—the different nature of data being processed (at rest and in motion), the complexity of processing and the need for speed—there are some not-so-obvious challenges. With data coming from everywhere, it moves across multiple silos before it gets to the analytics or decision engine, and every hop across silos adds latency. There's also latency due to data movement over the network between the storage and processing technologies.


Even though a data hub may hold all the data an enterprise needs for any given computational workload, latency still creeps in: the decision engine or data processing platform has to execute a query against the hub and then move a large result set over the network into its own processing resource space. I discussed this challenge in a previous article on data integration hub (DIH) architecture. So, although data hubs are a great start and a critical building block in dealing with multidimensional data, something else is needed to address today’s data challenges.
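A rough, back-of-the-envelope calculation makes the point. The figures below are illustrative assumptions, not measurements from any particular system:

```python
# Illustrative back-of-the-envelope numbers (assumptions, not measurements):
result_set_gb = 2.0     # size of the result set pulled from the data hub
network_gbps = 10.0     # effective network throughput, gigabits per second
round_trips = 3         # query, fetch, acknowledgement
rtt_ms = 0.5            # round-trip time per hop inside a data center

transfer_ms = (result_set_gb * 8 / network_gbps) * 1000
total_ms = transfer_ms + round_trips * rtt_ms

print(f"transfer: {transfer_ms:.0f} ms, total: {total_ms:.0f} ms")
# ~1,600 ms just to move the data, versus a decision budget of a few milliseconds,
# which is why colocating computation with the data matters.
```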


Multiple vendors are working hard to address these challenges by combining the following key capabilities:


  • Process data in motion and data at rest.

  • Virtually hold data from multiple sources and offer persistent data storage while maintaining an acceptable level of data integrity, security, durability and availability.

  • Execute advanced AI/ML or any complex computation on large datasets at high speeds.
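As a thought experiment only, the sketch below maps those three capabilities onto a hypothetical Python interface; none of these classes or methods correspond to any specific vendor's product.

```python
from typing import Any, Callable, Iterable


class UnifiedDataPlatform:
    """Hypothetical facade illustrating the three capabilities side by side."""

    def ingest_stream(self, source: str, events: Iterable[dict]) -> None:
        """Accept data in motion, e.g. card transactions or sensor readings."""
        raise NotImplementedError

    def store(self, table: str, rows: Iterable[dict]) -> None:
        """Persist data at rest with integrity, security, durability and availability guarantees."""
        raise NotImplementedError

    def compute(self, job: Callable[..., Any], *datasets: str) -> Any:
        """Execute an AI/ML model or other complex computation next to the data,
        so large result sets never have to cross the network."""
        raise NotImplementedError
```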


Conclusion

The scope of relevant information for today’s data-driven enterprises extends beyond the boundaries of their firewalls. This has created complex problems, including the multidimensional nature of the data to be processed, the enormous scale of data, and the tremendous speed at which the data must be processed.


Enterprises must have visibility into the relevant global data, as well as the ability to execute complex workloads on their data at ultra-low latencies to extract any timely intelligence from it. This can be facilitated by a platform that “unifies” the processing of data by combining streaming data in motion and historical data at rest with the ability to execute complex analytical workloads at ultra-low latency. In my next article, I'll break down in more detail the relevant elements of such a "unified" data platform.