by Wendy Lucas
Having grown up in the world of data and analytics, I long for the days when our goal was to create a single version of the truth. Remember when data architecture diagrams showed source systems flowing through ETL, into a centralized data warehouse and then out to business intelligence applications? Wow, that was nice and simple, right – at least conceptually? As a consultant, I can still remember advising clients and helping them to pictorially represent this reference architecture. It was a pretty simple picture, but that was also a long time ago.
While IT organizations struggled with data integration, enterprise data models and producing the single source of the truth, the lines of business grew impatient and would build their own data marts (or data silos). We can think of this as the first signs of the requirement for user self-service. The goal behind building the consolidated, enterprise, single version of the truth never went away. Sure, we still want the ability to drive more accurate decision-making, deliver consistent reporting, meet regulatory requirements, etc. However, the ability to achieve this goal became very difficult as requirements for user self-service, increased agility, new data types, lower cost solutions, better business insight and faster time to value became more important.
Recognizing the Logical Data Warehouse
Enterprises have developed collections of data assets that each provide value for specific workloads and purposes. This includes data warehouses, data marts, operational data stores and Hadoop data stores to name a few. It is really this collection of data assets that now serves as the foundation for driving analytics, fulfilling the purpose of the data warehouse within the architecture. The Logical Data Warehouse or LDW is a term we use to describe the collection of data assets that make up the data warehouse environment, recognizing that the data warehouse is no longer just a single entity. Each data store within the Logical Data Warehouse can be built on a different platform, fit for the purpose of the workload and analytic requirements it serves.
Each data store within the Logical Data Warehouse can be built on a different platform, fit for the purpose of the workload and analytic requirements it serves.
But doesn’t this go against the single version of the truth? The LDW will still struggle to deliver on the goal behind the single version of the truth, if it doesn’t have information governance, common metadata and data integration practices in place. This is a key concept. If you’re interested in more on this topic, check out a recent webcast by some of my colleagues on the “Five Pitfalls to Avoid in Your Data Warehouse Modernization Project: Making Data Work for You.”
Unifying data across the Logical Data Warehouse
Logically grouping separate data stores into the LDW does not necessarily make our lives easier. Assuming you have followed good information governance practices, you still have data stores in different places, perhaps on different platforms. Haven’t you just made your application developers and users lives, who want self-service, infinitely more difficult? Users need the ability to leverage data across these various data stores without having to worry about the complexity of where to find it, or re-writing their applications. And let’s not forget about the needs of IT. DBAs struggle to manage capacity and performance on data warehouses while listening to Hadoop administrators brag about the seemingly endless, lower cost storage and ability to manage new data types that they can provide. What if we could have the best of all worlds? Provide seamless access to data across a variety of stores, formats, and platforms. Provide capability for IT to manage Hadoop and Data Warehouses along-side each other in a way that leverages the strengths of both.
Introducing IBM Fluid Query
IBM Fluid Query is the capability to unify data across the Logical Data Warehouse, providing the ability to seamlessly access data in it’s various forms and locations. No matter where a user connects within the logical data warehouse, users have access to all data through the same, standard API/SQL/Analytics access. IBM Fluid Query powers the Logical Data Warehouse, giving users the ability to combine numerous types of data from various sources in a fast and agile manner to drive analytics and deeper insight, without worrying about connecting to multiple data stores, using different syntaxes or API’s or changing their application.
In its first release, IBM Fluid Query 1.0 will provide users of the IBM PureData System for Analytics the capability to access Hadoop data from their data warehouse and move data between Hadoop and PureData if needed. High performance is about moving the query to the data, not the data to the query. This provides extreme value to PureData users who want the ability to merge data from their structured data warehouse with Hadoop for powerful analytic combinations, or more in-depth analysis. IBM Fluid Query 1.0 is part of a toolkit within Netezza Platform Software (NPS) on the appliance so it’s free for all PureData System for Analytics customers.
IBM Fluid Query 1.0 will provide users of the IBM PureData System for Analytics the capability to access Hadoop data from their data warehouse and move data between Hadoop and PureData
For Hadoop users, IBM also provides IBM Big SQL which delivers Fluid Query capability. Big SQL provides the ability to run queries on a variety of data stores, including PureData System for Analytics, DB2 and many others from your IBM BigInsights Hadoop environment. Big SQL has the ability to push the query to the data store and return the result to Hadoop without moving all the data across the network. Other Hadoop vendors provide the ability to write queries like this but they move all the data back to Hadoop before filtering, applying predicates, joining, etc. In the world of big data, can you really afford to move lots of data around to meet the queries that need it?
IBM Fluid Query 1.0 is generally available on March 27 as a software addition to PureData System for Analytics customers. If you are an existing customer and want to understand how to take advantage of IBM Fluid Query 1.0 or if you just would like more information, I encourage you to listen to this on-demand webcast: Virtual Enzee – The Logical Data Warehouse, Hadoop and PureData System for Analytics and check out the solution brief. Or if you are an existing PureData System for Analytics customer, download this software. Update: Learn about Fluid Query 1.5, announced July, 2015.
Wendy Lucas is a Program Director for IBM Data Warehouse Marketing. Wendy has over 20 years of experience in data warehousing and business intelligence solutions, including 12 years at IBM. She has helped clients in a variety of roles, including application development, management consulting, project management, technical sales management and marketing. Wendy holds a Bachelor of Science in Computer Science from Capital University and you can follow her on Twitter at @wlucas001