IBM Fluid Query 1.7 is Here!

by Doug Dailey

IBM Fluid Query offers a wide range of capabilities to help your business adapt to a hybrid data architecture and, more importantly, to bridge “data silos” for deeper insights that leverage more data. Fluid Query is a standard entitlement included with the Netezza Platform Software suite for PureData System for Analytics (formerly Netezza). Fluid Query release 1.7 is now available, and you can learn more about its features below.

Why should you consider Fluid Query?

It offers many ways to solve problems across your business. Here are a few ideas:
• Discover and explore “Day Zero” data landing in your Hadoop environment
• Query data from multiple cross-enterprise repositories to understand relationships
• Access structured data from common sources like Oracle, SQL Server, MySQL, and PostgreSQL
• Query historical data on Hadoop via Hive, BigInsights Big SQL or Impala
• Derive relationships between data residing on Hadoop, the cloud and on-premises
• Offload colder data from PureData System for Analytics to Hadoop to free capacity
• Drive business continuity through a low-fidelity disaster recovery solution on Hadoop
• Back up your database or a subset of data to Hadoop in an immutable format
• Incrementally feed analytics side-cars residing on Hadoop with dimensional data

By far, the most prominent uses of Fluid Query for a data warehouse administrator are warehouse augmentation, capacity relief, and the replication of analytics side-cars for analysts and data scientists.

New: Hadoop connector support for Hadoop file formats to increase flexibility

IBM Fluid Query 1.7 ushers in greater flexibility for Hadoop users with support for popular file formats typically used with HDFS. These include data storage formats like Avro, Parquet, ORC and RC that are often used to manage big data in a Hadoop environment.

Choosing the best format and compression mode can result in drastic differences in performance and storage on disk. A file format that doesn’t support flexible schema evolution can impose a processing penalty when you make simple changes to a table. Let’s just say that if you live in the Hadoop domain, you know exactly what I mean. For instance, if you want to use Avro, do your tools have compatible readers and writers? If you are using Impala, do you know that it doesn’t support ORC, or that Hortonworks and Hive Stinger don’t play well with Parquet? Double-check your needs and tool sets before diving into these popular format types.

By providing support for these popular formats, Fluid Query allows you to import, store, and access this data through local tools and utilities on HDFS. But here is where it gets interesting in Fluid Query 1.7: you can also query data in these formats through the Hadoop connector provided with IBM Fluid Query, without any change to your SQL!
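For instance, here is a minimal sketch of a cross-query through the Hadoop connector using the fqRead function (the connector name, database and table are invented for illustration; the file format on HDFS is transparent to the SQL):

    -- Same SQL whether sales_hist is stored as Parquet, ORC, RC or Avro;
    -- the Hadoop connector resolves the underlying file format.
    fqRead('saleshist', '', 'SELECT region, SUM(revenue) FROM sales_hist GROUP BY region');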

New: Robust connector templates

In addition, Fluid Query 1.7 now includes a more robust set of connector templates designed to help you jump-start your use of Fluid Query. You may recall that the prior release provided a generic connector that allows you to configure and connect to any structured data store via JDBC. With the 1.7 release we are offering pre-defined templates so you can get up and running more quickly. In cases where user data type mappings differ, we also provide mapping files to simplify access. If you have your own favorite database, you can use our generic connector, along with any of the provided templates, as a basis for building a new connector for your specific needs. There are templates for Oracle, Teradata, SQL Server, MySQL, PostgreSQL, Informix, and MapR for Hive.
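As a rough sketch of the idea only (the property names and layout below are hypothetical, not the shipped template format, so consult the actual templates), a generic JDBC connector definition needs little more than a driver class, a URL and credentials:

    # Hypothetical generic-connector settings, for illustration only;
    # the shipped templates define the real property names and file layout.
    connector.name=mysql_sales
    connector.driver=com.mysql.jdbc.Driver
    connector.url=jdbc:mysql://dbhost:3306/sales
    connector.user=fquser
    connector.password=********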

Again, the primary focus of Fluid Query is to deliver open data access across your ecosystem. Whether your data resides on disk, in memory, in the cloud, or on Hadoop, we strive to keep your business open for data. We recognize that you are up against significant challenges in meeting the demands of the business and the marketplace, with access and federation among your top priorities.

New: Data movement advances

Moving data is rarely the best choice. Businesses spend considerable effort ingesting data, staging it, and scrubbing, prepping and scoring it for consumption by business users. This is a costly process. As we move closer and closer to virtualization, the goal is to move the smallest amount of data possible while accessing and querying only the data you need. So not only is access paramount; your knowledge of the data in your environment is also crucial to using it efficiently.

Fluid Query does offer data movement capability through what we call Fast Data Movement. Focusing on the pipe between PureData System for Analytics (PDA) and Hadoop, we offer a high-speed transfer tool that lets you move data between these two environments efficiently and securely. You have control over security, compression, format, and the where clause (database, table, filtered data). A key benefit is our ability to transfer data in our proprietary binary format, which enables orders-of-magnitude better performance than Sqoop when you do have to move data.
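To make that concrete, here is a minimal sketch of an import configuration, assuming the Hadoop-style XML layout suggested by the nz.fq.* parameters mentioned below. Every property name here except nz.fq.table is an assumption; check the user guide for the real set:

    <?xml version="1.0" encoding="UTF-8"?>
    <configuration>
      <!-- Source table to transfer; nz.fq.table appears in the 1.7 notes below. -->
      <property>
        <name>nz.fq.table</name>
        <value>SALES.FACT_ORDERS</value>
      </property>
      <!-- Hypothetical knobs for the controls described above:
           compression, on-disk format and a filtering where clause. -->
      <property>
        <name>nz.fq.compress</name>
        <value>true</value>
      </property>
      <property>
        <name>nz.fq.format</name>
        <value>parquet</value>
      </property>
      <property>
        <name>nz.fq.where</name>
        <value>order_date &lt; '2014-01-01'</value>
      </property>
    </configuration>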

Fluid Query 1.7 also offers some additional benefits:
• Kerberos support for our generic database connector
• Support for BigInsights Big SQL during import (Hive and Big SQL are automatically synchronized on import)
• Varchar and String mapping improvements
• Import via the nz.fq.table parameter now supports a combination of multiple schemas and tables (see the sketch after this list)
• Improved date handling
• Improved validation for NPS and Hadoop environments (connectors and import/export)
• Support for BigInsights 4.1 and Cloudera 5.5.1
• A new Best Practices User Guide, plus two new Tutorials
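For the nz.fq.table change called out above, here is a small sketch of the kind of mixed value it can now take (the schema and table names, and the list syntax, are illustrative; see the user guide for the exact grammar):

    <property>
      <name>nz.fq.table</name>
      <!-- One import spanning several schemas and individual tables. -->
      <value>SALES.FACT_ORDERS, FINANCE.GL_HISTORY, MARKETING.CAMPAIGNS</value>
    </property>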

You can download Fluid Query 1.7 from IBM Fix Central, or from the Netezza Developer Network as non-warranted software for use with the Netezza Emulator.


Take a test drive today!

About Doug
Doug has over 20 years of combined technical and management experience in the software industry, with an emphasis on customer service and, more recently, product management. He is currently part of a highly motivated product management team that is both inspired by and passionate about the IBM PureData System for Analytics product portfolio.

4 thoughts on “IBM Fluid Query 1.7 is Here!”

  1. I have read the Fluid Query user guide. I have set up a data transfer in Hadoop to get data from Netezza into the data lake by configuring the import XML file. I have also followed the data connector setup to create functions such as fqRead to read data from Hadoop into Netezza. However, there doesn’t seem to be any way to set up an “fqWrite” function to push data from Netezza into Hadoop. Is there a way to push data from Netezza into Hadoop (without setting up the import XML via Hadoop) via the data connector setup? This would go a long way toward establishing a native method in Netezza for a Netezza analyst/developer to start archiving older data into the data lake via “fqWrite”, and later to set up views that combine archived and current data via “fqRead” to virtualize information access for users and shield them from the complexity of Hadoop-archived data.


  2. Hi Rich,

    Fluid Query leverages APIs to perform specific functions related to sending SQL to target data stores; these functions take SQL, targetSchema and targetStringSize parameters.

    IBM Fluid Query focuses on SQL push-down to target data stores to retrieve results. We have not performed comprehensive tests around DML for inserts and updates; however, our functions support sending any SQL string supported by the target data source, similar to leveraging Hadoop-based Hive functions like the following:

    fqRead('db', '', 'SELECT count(*) FROM hadoop_table');
    fqRead('db', '', 'SELECT a, COUNT(b) OVER (PARTITION BY c) FROM T');
    fqRead('db', '', 'SELECT ROW_NUMBER() OVER (PARTITION BY C) AS RN, B, C FROM T');

    IBM Fluid Query does support CTAS and INSERT INTO A SELECT * FROM B for SQL-based reads with local writes. You can create a Fluid Query function that passes writes through, or use built-in functions that third-party RDBMS sources support. This is one of the values of pushing processing down to the target site in order to return an optimized result set (COUNT(*) and analytic functions like OVER are easily leveraged).
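    To illustrate the CTAS pattern (the table and column names are invented, and the fqRead invocation mirrors the shorthand above, so adapt it to the exact syntax in the user guide):

    -- Materialize a pushed-down aggregation locally on Netezza;
    -- the GROUP BY runs on the Hadoop side via fqRead.
    CREATE TABLE local_summary AS
    SELECT * FROM fqRead('db', '', 'SELECT c, COUNT(*) AS n FROM hadoop_table GROUP BY c');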

    Example query engine functions:

    • https://developer.ibm.com/hadoop/docs/biginsights-value-add/big-sql/user-defined-functionsudfs-big-sql-v3-0/
    • http://www.cloudera.com/content/cloudera/en/documentation/cloudera-impala/latest/topics/impala_analytic_functions.html
    • https://cwiki.apache.org/confluence/display/Hive/LanguageManual+WindowingAndAnalytics


  3. Nice article Doug. We recently started using FQ to interact with the Hadoop environment and it worked well.

    However when we used the generic ODBC connector to connect to SQL Server, it seemed to be very slow at importing rows (compared to other mechanisms). Is there a way to optimize this for instance by tuning how many rows are fetched in parallel?

    We also noticed that when trying to import a large number of rows between two Netezza boxes (say over 1 million rows) it always errors out with “ERROR: Error from JDBC driver: netezza.max.stmt.handles”.

    Any way to get around these?

