By Doug Dailey
As Big Data concepts continue to mature and evolve, so does the technology that encourages their adoption. Enterprises are looking for ways to get more from their data by reducing costs and placing each data set where its relevance is best served. The payoff is the right insights for the business at the right time.
Many are finding that Hadoop is not the answer for all of their data needs. Rather than adopting a "one size fits all" approach, they want access to a variety of systems. Enterprise data warehouses (EDWs), relational databases, content stores, real-time in-memory processing and more all have their place. We have seen a growing number of software tools, specialized hardware products and services that bridge the gaps between these approaches to storing and analyzing data.
Fluid Query Strengths – Query Access and Data Movement with Hadoop
IBM introduced Fluid Query 1.0 for PureData System for Analytics in March. The capability lets PureData users reverse the usual role of their EDW so that it acts as a client. Traditionally, EDW environments have served as landing zones for high-value data, where users explore and analyze it and gain speed-of-thought insights from complex in-database algorithms. With IBM Fluid Query, PureData users can also access data residing on Hadoop distributions, with PureData acting as the client. Querying this way does not move or store the data locally; instead, the SQL is pushed down to Hadoop, which does the processing through MapReduce jobs. You can query Hadoop data directly, and you can also move data natively between PureData and Hadoop in parallel.
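To make the pushdown idea concrete, here is a minimal toy model in Python. It is not Fluid Query code and uses no Fluid Query or Hadoop API. `RemoteStore`, `pull_then_filter` and `push_down` are hypothetical names, and the in-memory list simply stands in for a Hadoop-side table. The sketch compares how many rows travel back to the client when the filter runs on the client and when it runs where the data lives.

```python
from dataclasses import dataclass
from typing import Callable

Row = dict


@dataclass
class RemoteStore:
    """Hypothetical stand-in for a Hadoop-side table (not a real Fluid Query API)."""
    rows: list
    rows_sent: int = 0  # counts rows shipped back to the client

    def scan(self) -> list:
        # No pushdown: every row is shipped to the client.
        self.rows_sent += len(self.rows)
        return list(self.rows)

    def execute(self, predicate: Callable[[Row], bool]) -> list:
        # Pushdown: the filter runs where the data lives; only matches are shipped.
        result = [r for r in self.rows if predicate(r)]
        self.rows_sent += len(result)
        return result


def pull_then_filter(store: RemoteStore, predicate: Callable[[Row], bool]) -> list:
    # Move all the data first, then filter locally.
    return [r for r in store.scan() if predicate(r)]


def push_down(store: RemoteStore, predicate: Callable[[Row], bool]) -> list:
    # Send the "query" to the data and receive only the answer.
    return store.execute(predicate)


if __name__ == "__main__":
    data = [{"id": i, "region": "EU" if i % 4 == 0 else "US"} for i in range(1000)]
    wants_eu = lambda r: r["region"] == "EU"

    local, remote = RemoteStore(data), RemoteStore(data)
    assert pull_then_filter(local, wants_eu) == push_down(remote, wants_eu)
    print(local.rows_sent, remote.rows_sent)  # 1000 250
```

Both strategies return the same 250 rows. Only the pushdown version limits network traffic to those results, which is the benefit of letting Hadoop do the processing.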
Are you interested in doing any of the following?
● Query Hadoop data from your PureData System for Analytics
● Transfer data in both directions between PureData and Hadoop (BigInsights, Hortonworks or Cloudera)
● Move data between PureData and Hadoop in parallel
● Control exactly which tables and data ranges are queried or transferred
● Register tables automatically with the Hive metastore
How to Get Started
Customers have been able to download, install, configure and test Fluid Query in less than 30 minutes, which makes it a perfect lunch-hour activity for inquiring minds. Just be sure that your Hadoop and PureData environments have the required prerequisites in place. Fluid Query runs on PureData System for Analytics N100x, N2001, N2002 and N3001.
Prerequisites for installation:
(1) A supported Hadoop distribution, installed and running
(2) Active network connection and user access/authentication between PureData and Hadoop
(3) PureData installed with Netezza Analytics
(4) Data available for use
Downloading and installing Fluid Query:
1. Download the FLUIDQUERY_1.0 tar package from IBM Fix Central.
2. Consult the IBM Fluid Query User Guide for details on setup and configuration.
3. Unpack the FluidQuery_1.0 bundle and run the fluidquery_install.pl script.
4. Configure Fluid Query, then query and move data to your heart's content. Configuration is lightweight: it consists of registering user-defined table functions and creating views.
Finally, use your favorite tool to run your Hadoop query and view the results.
In keeping with the simplicity and ease of use of Netezza technology, we have delivered a lightweight set of capabilities that adds a great deal of value to your Logical Data Warehouse ecosystem. Whether you are trudging through a data swamp or swimming in a data lake or reservoir, you can easily reel in the results that matter to your business.
Go to the IBM Fluid Query Solution Brief to learn more.
Update: Learn about Fluid Query 1.5, announced in July 2015.
Doug has over 20 years of combined technical and management experience in the software industry, with an emphasis on customer service and, more recently, product management. He is currently part of a highly motivated product management team that is both inspired by and passionate about the IBM PureData System for Analytics product portfolio.