The Path To Repeatable AI Success: Why Data Fabrics Are The Next Step



Few disagree with the consensus that AI will reshape the business world. But at this point, nobody really knows how that will happen.

We’ve seen plenty of proof-of-concept victories, such as Google’s TensorFlow image classification and its AlphaGo project, among many others. Perhaps more relevant are the victories that are far less celebrated, like the use case presented at the last O’Reilly AI conference, where an executive from Microsoft’s finance department described a years-long journey to improve financial forecasting using AI.

Okay, so how does that apply to most businesses? Examples from tech leaders don’t show the way because these victories are not repeatable at most companies. The whole world is not going to suddenly have that level of data engineering, data science, and software development talent, let alone executives who will fund such projects until they work.

Two types of products will bring AI to most companies:

AI will penetrate most companies through embedded AI. Salesforce, Numerify, and Absolutdata are all bringing AI to bear inside applications focused on specific use cases. Salesforce uses the Einstein umbrella to group all of the ways it is productizing AI and advanced analytics. Numerify is focused on creating a product that integrates all available data sources and offers a comprehensive stack of dashboards and KPIs so CIOs can run IT in a data-driven manner. Absolutdata is creating a product that brings AI and ML to sales and marketing analytics and forecasting, along with making tactical recommendations. And it seems most new products in cybersecurity claim to be powered by AI.

In terms of my framework for Productized Analytics, these offerings range from the dinner-in-a-box level, in which some assembly is required, to the artisanal brew, where the apps are configured to fit the needs of the business. In most cases, it is the users of the embedded AI products who create value meals, that is, solutions for a specific use case.

AI will be used through Automated Machine Learning. DataRobot is my favorite company in this space because it is focused on expanding AI and ML to as many use cases as possible. With DataRobot, you point the system at your data and express the kind of predictions you want; models are then created, tested, and ranked for you. The challenge with this type of product is making sure you aren’t applying the data science in invalid ways, so a layer of data science governance is needed. But that is a much easier problem than building a data engineering, data science, and software development team. The goal of Automated Machine Learning is to find a way to hit lots of singles, a program I argued for in this article: “Productized Analytics: Why 100 Singles Are Better Than A Grand Slam.”
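To make the “point the system at your data and get ranked models” idea concrete, here is a minimal sketch of automated model ranking. It is not DataRobot’s product or API; it uses scikit-learn, and the dataset, target column, and candidate models are illustrative assumptions.

```python
# A minimal sketch of automated model ranking, in the spirit of AutoML tools.
# This is NOT any vendor's API; the CSV path and "churned" target column are
# hypothetical placeholders, and the candidate list is arbitrary.
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

# 1. Point the system at your data and say what you want to predict.
df = pd.read_csv("customer_churn.csv")                  # hypothetical dataset
X, y = df.drop(columns=["churned"]), df["churned"]

# 2. The system builds a set of candidate models.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=200),
    "gradient_boosting": GradientBoostingClassifier(),
}

# 3. Each candidate is tested with cross-validation and ranked for you.
leaderboard = sorted(
    ((name, cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean())
     for name, model in candidates.items()),
    key=lambda item: item[1],
    reverse=True,
)
for name, auc in leaderboard:
    print(f"{name}: mean AUC = {auc:.3f}")
```

Even in this toy version, the governance question shows up immediately: someone has to decide whether a cross-validated AUC on this data is an acceptable test for the business decision at hand.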

So does this mean that AI will be a great leveler? Everyone will be able to keep up just by buying and implementing the latest products?

Hell, no.

In a world of embedded, productized AI, differentiation will come from creating powerful data supply chains to locate, transform, and move data to where it is needed. The power of AI and ML, whether productized or not, is based on the data being used. Companies that invest in the ability to find their data and put it to use in the AI and ML products they buy, faster and at lower cost, will make those products work better.

A well-run data supply chain fuels many more PoCs, which can be run much faster and cheaper. Running AI and ML products in production will be lower cost and more reliable as well.

But a well-run data supply chain isn’t a product, and probably never will be. We do need something that takes the place of the data warehouse as the center of the data universe. As I point out in the Early Adopter Research Mission “Saving Your Data Lake,” the first go-round at implementing data lakes didn’t create the needed solution. But we have no choice: something must replace the data lake, whether it is Data Lake 2.0 or something else.

I believe that companies will create data supply chains out of collections of products. Like many other areas of IT, successful companies will assemble product-based platforms in which several different types of products are customized and work together to solve a problem in just the way they require.

The key challenge then becomes picking the right components. At this point, the products that will solve this problem are only starting to emerge. I see them falling into two categories:

OLAP-style products will become the fully realized version of the data lake. These products will be similar to a data warehouse in that data from applications will be collected, cleaned, and made useful. But I believe this layer will be far more operational than the data warehouses of the past: not only analytics use cases but also applications will use data from this layer and feed data back into it.

An operational layer will collect data from all sources and make it available for applications and mission critical AI and ML workloads. I believe we are further along with the productization of this layer as I pointed out in these stories: “MapR - Why Data Fabric Is Now Vital To The App Stack” and “C3 IoT, MapR And Unlocking AI On The Data Fabric.”

These stories explain in detail that applications need a more powerful data layer than a database, one that allows data integration, storage, and retrieval across a variety of repositories, as well as processing of data by advanced analytics, AI, and ML algorithms. In addition, applications need support for replication across on-premises and cloud data centers, access by microservices, and a variety of other features such as a global namespace.
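To illustrate why a global namespace matters, here is a small sketch of what a microservice might look like when the fabric exposes data as ordinary file paths. The mount point, cluster name, and file locations are hypothetical, and the example uses pandas rather than any vendor-specific client.

```python
# Sketch of a microservice reading and writing through a fabric-style
# global namespace. The /mapr/prod-cluster mount and the file paths below
# are hypothetical examples, not a specific vendor's API.
import pandas as pd

FABRIC_ROOT = "/mapr/prod-cluster"   # hypothetical global-namespace mount


def load_features(customer_id: str) -> pd.DataFrame:
    """Read precomputed features by path; which data center or cloud region
    physically holds the file is the fabric's problem, not the app's."""
    path = f"{FABRIC_ROOT}/analytics/features/customers.parquet"
    features = pd.read_parquet(path)
    return features[features["customer_id"] == customer_id]


def log_score(customer_id: str, score: float) -> None:
    """Append a score to a shared location that other services and the
    ML training pipeline can read from the same namespace."""
    out_path = f"{FABRIC_ROOT}/apps/scoring/scores.csv"
    record = pd.DataFrame([{"customer_id": customer_id, "score": score}])
    record.to_csv(out_path, mode="a", header=False, index=False)
```

The point is not the few lines of pandas; it is that the application never has to know where the data physically lives or how replication across data centers is handled.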

We need data fabrics because support for operational data management and integration is now too complex for a database. It is also more complex than what will be supported by the OLAP layer.

As I argue in the stories I referenced, I believe a data fabric will be the layer that emerges to support mission-critical applications of all sorts, especially those running AI and ML workloads. If you don’t do this work in a data fabric layer, then each application will have to do it inside the application, which will never be sustainable. (Developers, do I hear an Amen?)

So, if you are a company seeking an advantage in implementing AI, I believe that using a data fabric will accelerate your ability to adopt AI products.

As Anoop Dawar of MapR pointed out in his keynote at O’Reilly Strata this year, “90% of machine learning success is data logistics, not learning.” Dawar explained that the era of data abundance has created a crisis of data complexity. Any individual AI or ML workload may need data from all over the enterprise.

In my view, creating an OLAP-style data supply chain (to replace the data warehouse/data lake) is a project that will take several years. But creating a data fabric layer to support operational and analytic use of data can happen much faster. The products in this area are much farther along.

With that operational layer in place, you will be able to prepare data to meet the needs of the AI products you decide to evaluate and adopt. When you do adopt them, you won’t have to wonder how you will support the application in production, because data fabrics are built to support mission-critical applications.

So, in the short run, it seems prudent to figure out how to make a data fabric work to support both normal application development and those apps that use AI and ML. With this layer in place, companies will be able to bring all the data needed to productized AI, and that will become a sustainable source of competitive advantage.

 
