by James Kobielus
Data warehousing (DW) should flow smoothly as an enterprise decision-support asset. For this to happen, the back-end DW infrastructure should enable a seamless flow of data acquisition, transformation, loading, access, query, and analysis functions all the way from sources to the end users trying to make evidence-driven decisions.
For an enterprise DW to support fluid delivery of data-driven insights, the enabling infrastructure needs to be engineered with simplicity, scale, speed, interoperability, and usability in order to eliminate any obstacles to maximum value. In the drive to modernize their DWs and address emerging requirements, enterprises may risk adding complexity that inadvertently impacts the productivity of DW users, administrators, and other stakeholders.
In a growing number of enterprise DW modernization initiatives, Hadoop is starting to play important supplementary roles such as supporting data refinement on unstructured sources and providing a low-cost, scalable, and queryable data archive. As Hadoop platforms such as IBM InfoSphere BigInsights take their place within “logical” or “hybrid” DW architectures alongside DW platforms such as IBM PureData System for Analytics, the underlying complexities grow, but the simplicity and fluidity of the overall end-to-end infrastructure needn’t suffer.
The fluidity of the Logical Data Warehouse (LDW) depends on core interfaces, infrastructure, and tooling that span the entire architecture, no matter how complex the underlying hybrid assortment of relational, Hadoop, NoSQL, and other data platforms. Chief among these enablers of LDW fluidity is SQL, the data access, query, and manipulation lingua franca of databases everywhere. SQL now pervades the Hadoop market thanks to initiatives and interfaces such as IBM Big SQL.
However, SQL-over-Hadoop standards alone can’t achieve the promise of LDWs that remain seamlessly fluid and interoperable no matter how complex they grow under the covers. For that dream to come to fruition, the SQL dialects of the relational, Hadoop, and other platforms that comprise the LDW need to be accessible through a “fluid query” abstraction layer. This would enable all BI, reporting, dashboarding, statistical modeling, and other applications that query any data provided by any underlying platform within the LDW to speak one simple SQL dialect that spans it all.
A fluid query layer that spans the entire LDW would eliminate several obstacles to user and administrator productivity. It would avoid the need for users to query two or more separate data platforms and then either manually combine the results or have someone in IT implement a “data munging” tool to do that in a more automated fashion. If the unified query interface is combined with a fluid ability to move data back and forth between relational and Hadoop platforms to ensure optimal utilization of available LDW capacity, queries and all the supporting back-end data movement and transformation processes can operate much faster and more efficiently.
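To make the idea of a single query interface concrete, here is a minimal sketch. It is an illustration only, not IBM Fluid Query or its API: two in-memory SQLite databases stand in for the relational warehouse and the Hadoop archive, and the table name `sales`, the schema name `archive`, and both helper functions are hypothetical. The point is that one SQL statement returns a combined answer, so the user never has to query each store separately and merge the results by hand.

```python
import sqlite3


def build_demo():
    """Create two stand-in stores behind one connection (assumed setup)."""
    # The default "main" schema plays the role of the relational warehouse.
    conn = sqlite3.connect(":memory:")
    # A second, separate in-memory database plays the role of the Hadoop archive.
    conn.execute("ATTACH DATABASE ':memory:' AS archive")
    conn.execute("CREATE TABLE main.sales (region TEXT, amount REAL)")
    conn.execute("CREATE TABLE archive.sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO main.sales VALUES (?, ?)",
        [("east", 100.0), ("west", 50.0)],
    )
    conn.executemany(
        "INSERT INTO archive.sales VALUES (?, ?)",
        [("east", 25.0), ("west", 75.0)],
    )
    conn.commit()
    return conn


def total_by_region(conn):
    """Answer one question with one query that spans both stores."""
    rows = conn.execute(
        """
        SELECT region, SUM(amount)
        FROM (
            SELECT region, amount FROM main.sales
            UNION ALL
            SELECT region, amount FROM archive.sales
        )
        GROUP BY region
        ORDER BY region
        """
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    print(total_by_region(build_demo()))
```

In a real LDW the two stores would be different engines with different SQL dialects, which is exactly the gap a fluid query layer is meant to hide.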
That’s the power of DW fluidity: simplicity, speed, throughput, scalability, and cost-effectiveness. The recent launch of IBM Fluid Query demonstrates that this dream is now a reality. Users who have invested in IBM PureData System for Analytics and the leading Hadoop distributions can now enable fast, unified, efficient queries across their hybrid DW environments like never before. This new solution gives DW administrators new power to choose the underlying data platform, PureData or Hadoop, that is best suited for each type of query, data, and workload.
IBM Fluid Query 1.0 is available starting March 27. It includes connectors for routing PureData queries to the supported Hadoop platforms, which are the most widely adopted distributions in the marketplace. The solution, which comes at no additional charge with Netezza Platform Software 7.0.2 and Netezza Analytics 2.5 and higher, includes simple-to-install loaders for PureData System for Analytics (PDA) and the file systems of the supported Hadoop platforms. Features include the ability to:
- Query the supported Hadoop distributions’ data from PDA;
- Perform queries of unstructured data in Hadoop landing zones from PDA;
- Run multi-temperature queries and advanced analytics that use data from PDA and/or Hadoop;
- Use multithreaded parallel transfers to move data efficiently, either in compressed or uncompressed form, to and from PDA and Hadoop file systems;
- Retain properly vetted Hadoop file system data in PDA (a feature that was already available in PDA prior to IBM Fluid Query);
- Deploy Hadoop as an alternate platform for ETL and ELT in conjunction with PDA;
- Persist cold, archival, and exploratory data from PDA to Hadoop file systems;
- Use Hadoop file systems for backup, disaster recovery, and capacity relief of data stored in PDA;
- Use Hadoop platforms to better manage capacity, resource utilization, and workloads on PDA within the LDW;
- Use PDA for production quality analytics where SLA performance times matter, while simultaneously utilizing Hadoop for advanced analytics and exploration of multistructured data.
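The pattern behind several of these features, persisting cold data to a cheaper store while keeping it queryable, can be sketched in a few lines. This is a generic illustration under stated assumptions, not the IBM Fluid Query loaders: SQLite databases stand in for PDA (hot) and a Hadoop file system (cold), and the names `events`, `cold`, `all_events`, `open_stores`, and `archive_cold_rows` are all hypothetical.

```python
import sqlite3


def open_stores():
    """Set up a hot store, a cold store, and one view over both (assumed layout)."""
    conn = sqlite3.connect(":memory:")                  # hot store stand-in
    conn.execute("ATTACH DATABASE ':memory:' AS cold")  # cold store stand-in
    conn.execute("CREATE TABLE main.events (day TEXT, payload TEXT)")
    conn.execute("CREATE TABLE cold.events (day TEXT, payload TEXT)")
    # A TEMP view may reference tables in more than one attached database.
    conn.execute(
        "CREATE TEMP VIEW all_events AS "
        "SELECT day, payload FROM main.events "
        "UNION ALL "
        "SELECT day, payload FROM cold.events"
    )
    return conn


def archive_cold_rows(conn, cutoff):
    """Move rows dated before `cutoff` (ISO date string) to the cold store."""
    with conn:  # copy and delete commit together, or roll back together
        conn.execute(
            "INSERT INTO cold.events "
            "SELECT day, payload FROM main.events WHERE day < ?",
            (cutoff,),
        )
        moved = conn.execute(
            "DELETE FROM main.events WHERE day < ?", (cutoff,)
        ).rowcount
    return moved


if __name__ == "__main__":
    conn = open_stores()
    conn.executemany(
        "INSERT INTO main.events VALUES (?, ?)",
        [("2014-12-01", "c"), ("2015-01-10", "a"), ("2015-03-01", "b")],
    )
    conn.commit()
    print(archive_cold_rows(conn, "2015-02-01"))
    print(conn.execute("SELECT COUNT(*) FROM all_events").fetchone()[0])
```

Because applications query the view rather than either table, rows can move between stores for capacity relief without any change to the queries that read them.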
Fluid queries that flow in both directions, from Hadoop to PDA and from PDA to Hadoop, require Big SQL, which is available only with IBM BigInsights; IBM Fluid Query supports BigInsights versions 2.1 and higher. Other supported Hadoop distributions include Cloudera (4.7 and higher) and Hortonworks (2.2 and higher).
James Kobielus is IBM Senior Program Director, Product Marketing, Big Data Analytics solutions. He is an industry veteran, a popular speaker and social media participant, and a thought leader in big data, Hadoop, enterprise data warehousing, advanced analytics, business intelligence, data management, and next best action technologies. Follow James on Twitter: @jameskobielus