IBM Fluid Query 1.7 is Here!

by Doug Dailey

IBM Fluid Query offers a wide range of capabilities to help your business adapt to a hybrid data architecture. More importantly, it helps you bridge “data silos” for deeper insights that draw on more data. Fluid Query is a standard entitlement included with the Netezza Platform Software suite for PureData System for Analytics (formerly Netezza). Fluid Query release 1.7 is now available, and you can learn more about its features below.

Why should you consider Fluid Query?

It can help solve many kinds of business problems. Here are a few ideas:
• Discover and explore “Day Zero” data landing in your Hadoop environment
• Query data from multiple cross-enterprise repositories to understand relationships
• Access structured data from common sources like Oracle, SQL Server, MySQL, and PostgreSQL
• Query historical data on Hadoop via Hive, BigInsights Big SQL or Impala
• Derive relationships between data residing on Hadoop, the cloud and on-premises
• Offload colder data from PureData System for Analytics to Hadoop to free capacity
• Drive business continuity through a low-fidelity disaster recovery solution on Hadoop
• Back up your database or a subset of data to Hadoop in an immutable format
• Incrementally feed analytics side-cars residing on Hadoop with dimensional data

By far, the most prominent uses of Fluid Query for a data warehouse administrator are warehouse augmentation, capacity relief and replicating analytics side-cars for analysts and scientists.

New: Hadoop connector support for Hadoop file formats to increase flexibility

IBM Fluid Query 1.7 ushers in greater flexibility for Hadoop users with support for popular file formats typically used with HDFS. These include data storage formats like Avro, Parquet, ORC and RC that are often used to manage big data in a Hadoop environment.

Choosing the best format and compression mode can result in drastic differences in performance and storage on disk. A file format that doesn’t support flexible schema evolution can result in a processing penalty when making simple changes to a table. Let’s just say that if you live in the Hadoop domain, you know exactly what I am speaking of. For instance, if you want to use Avro, do your tools have readers and writers that are compatible? If you are using Impala, do you know that it doesn’t support ORC, or that Hortonworks and Hive-Stinger don’t play well with Parquet? Double-check your needs and tool sets before diving into these popular format types.

By providing support for these popular formats, Fluid Query allows you to import, store, and access this data through local tools and utilities on HDFS. But here is where it gets interesting in Fluid Query 1.7: you can also query data in these formats through the Hadoop connector provided with IBM Fluid Query, without any change to your SQL!

New: Robust connector templates

In addition, Fluid Query 1.7 now makes available a more robust set of connector templates that are designed to help you jump-start your use of Fluid Query. You may recall that our prior release added a generic connector that allows you to configure and connect to any structured data store via JDBC. With the 1.7 release we are offering pre-defined templates so you can get up and running more quickly. In cases where there are differences in user data type mapping, we also provide mapping files to simplify access. If you have your own favorite database, you can use our generic connector, along with any of the provided templates, as a basis for building a new connector for your specific needs. There are templates for Oracle, Teradata, SQL Server, MySQL, PostgreSQL, Informix, and MapR for Hive.

Again, the primary focus for Fluid Query is to deliver open data access across your ecosystem. Whether the data resides on disk, in-memory, in the Cloud or on Hadoop, we strive to enable your business to be open for data. We recognize that you are up against significant challenges in meeting demands of the business and marketplace, with one of the top priorities around access and federation.

New: Data movement advances

Moving data is not the best choice. Businesses spend quite a bit of effort ingesting data, staging it, and scrubbing, prepping and scoring it for consumption by business users. This is a costly process. As we move closer and closer to virtualization, the goal is to move the smallest amount of data possible, while you access and query only the data you need. So not only is access paramount, but your knowledge of the data in your environment is crucial to using it efficiently.

Fluid Query does offer data movement capability through what we call Fast Data Movement. Focusing on the pipe between PDA and Hadoop, we offer a high-speed transfer tool that allows you to transfer data between these two environments very efficiently and securely. You have control over the security, compression, format and where clause (DB, table, filtered data). A key benefit is our ability to transfer data in our proprietary binary format. When you do have to move data, this enables performance gains of orders of magnitude over Sqoop.

Fluid Query 1.7 also offers some additional benefits:
• Kerberos support for our generic database connector
• Support for BigInsights Big SQL during import (automatically synchronizes Hive and Big SQL on import)
• Varchar and String mapping improvements
• The nz.fq.table import parameter now supports a combination of multiple schemas and tables
• Improved date handling
• Improved validation for NPS and Hadoop environment (connectors and import/export)
• Support for BigInsights 4.1 and Cloudera 5.5.1
• A new Best Practices User Guide, plus two new Tutorials

You can download this release from IBM’s Fix Central. A non-warranted version for use with the Netezza Emulator is available from the Netezza Developer Network.

Take a test drive today!

About Doug
Doug has over 20 years of combined technical and management experience in the software industry, with emphasis on customer service and, more recently, product management. He is currently part of a highly motivated product management team that is both inspired by and passionate about the IBM PureData System for Analytics product portfolio.

Build skills for 2016 and Beyond: Data Warehousing and Analytics Top 10 Resources

by Cindy Russell, IBM Data Warehouse Marketing

Skills are always an essential consideration in technical careers and it is important for data warehousing professionals to expand their knowledge to handle the proliferation of data types and volumes in 2016 and beyond.

These are my “top 10” resource picks that you may want to explore. I am choosing these because of their popularity and also because they represent new technologies you may face in 2016 as you modernize your data warehouse and extend it beyond its traditional realm to meet new analytics needs.

  1. Gartner Magic Quadrant for Data Warehouse and Data Management Solutions for Analytics – I am recommending this report because it provides an overview of the trends, issues and marketplace leaders in data warehousing. It calls out the need for the Logical Data Warehouse, which is a key element of a modernization strategy. I believe the Logical Data Warehouse will be of increasing importance to your operations in the coming months. Read a summary of the report.
  2. Logical Data Warehouse – Due to the massive and rapid growth of data volumes and types, a single centralized data warehouse cannot meet all of the new needs for analytics by itself. The data warehouse now becomes part of a Logical Data Warehouse in which a set of “fit for purpose” stores are used to house a range of data. This blog by Wendy Lucas was published in 2014, but is still a good primer on the concept if you need one.
  3. IBM Fluid Query information and entitlement for PureData clients – In 2015, we released a series of “agile” announcements of IBM Fluid Query. This is a tool that PureData System for Analytics clients can use to query more data sources for deeper insights. This tool is a key element when you have a Logical Data Warehouse where data stores include Hadoop, databases, other data warehouses and more. PureData clients can take advantage of this technology as part of the entitlements. Start learning with our blog series and webcast.
  4. dashDB, data warehousing on the cloud – dashDB was launched in 2014 as the IBM fully managed data warehouse in the cloud. Some initial use cases could be: setting up self-service data science sandboxes, establishing test environments or cost-effectively housing data that is already external, such as social media feeds. dashDB is based on the Netezza and BLU Acceleration in-memory computing technologies. If you have workloads you want to place on the cloud, dashDB is a good solution. This webcast and a TDWI Checklist for cloud will get you started.
  5. Hadoop and Big SQL – Hadoop is a scalable, cost-effective, open source file system that can store a range of structured or unstructured data as part of a Logical Data Warehouse. It can also be used to help you manage capacity on the data warehouse, for example as a queryable historical archive. Read this blog by our expert to learn the basics. IBM provides a free open source distribution, IBM Open Platform with Apache Hadoop. For those looking to augment the IBM Open Platform, IBM BigInsights adds enterprise-grade features including visualization, exploration and advanced analytics. Within the family is an implementation that includes Big SQL—enabling you to use familiar SQL skills to query data in Hadoop. Explore the above content options, then get started with a no charge trial.
  6. Apache Spark – IBM announced a major commitment to Apache Spark in June 2015 and has already made available a series of Spark-based products and cloud services. You will be seeing more of Spark across the IBM Analytics portfolio, so it is a good technology to learn. Apache Spark is an open source processing engine built around speed, ease of use, and analytics. If you have large amounts of data that require low-latency processing that a typical MapReduce program cannot provide, Spark is the alternative. It performs at speeds up to 100 times faster than MapReduce for iterative algorithms or interactive data mining. Spark provides in-memory cluster computing for speed, and supports Java, Scala, and Python APIs for ease of development. I recommend this no-charge Big Data University course on Spark fundamentals.
  7. Update to IBM Netezza Analytics software – For those of you who are PureData System for Analytics clients, there is an update to the Netezza Analytics software. Doug Dailey is one of our experts in this area, and he created an announcement blog to help you understand what new capabilities you can leverage.
  8. Virtual Enzee on demand webcasts – IBM offers webcasts on topics related to data warehousing and PureData System for Analytics. Browse the “Virtual Enzee” webcast library to stay up to date on PureData through these on demand webcasts.
  9. Learn Cognos Analytics for user self-service applications – Some of our clients use Cognos BI in conjunction with their data warehouses for super-fast reporting. Cognos Analytics was announced at IBM Insight as a guided, self-service capability that provides a personal approach to analytics. As your users are demanding more insights, self-service may be a sound solution to some of their needs. Browse the blog and web site to learn more.
  10. IBMGo on demand keynotes from IBM Insight – If you were unable to attend IBM Insight 2015, IBMGo brings some of the main sessions to you! It is a great way to learn about the bigger IBM Analytics solutions and points of view. Start here.

What’s new: Netezza Platform Software and INZA software for PureData Systems for Analytics

by Doug Dailey

The IBM PureData System for Analytics team has just released a new set of enhancements over current software versions of Netezza Platform Software (NPS), INZA and IBM Fluid Query. These include enhanced integration, security, real-time analytics for z Systems and usability features, all included in our latest software suite that has been posted on Fix Central.

There is something here for everyone, whether you are looking to increase security, gain more leverage with DB2 Analytics Accelerator for z/OS*, improve your day-to-day experience or integrate PureData System (Netezza technology) into a Logical Data Warehouse. This post covers the new capabilities and enhancements in NPS 7.2.1 and INZA 3.2.1 software. Refer to my IBM Fluid Query 1.6 post for more information on Fluid Query.

Strengthening end-to-end security for PureData and DB2 Analytics Accelerator for z/OS

With the advent of self-encrypting disk drives in our N3001 model, we laid the groundwork for securing data at rest. Not only do you have state-of-the-art disk encryption by Seagate and Hitachi at work from a hardware standpoint, but you also have added peace of mind through a second tier of security that protects host drives and the drives associated with the Snippet Processing Unit (SPU). A local keystore with a flexible CLI on the N3001 system enables you to protect your most valuable assets. This release adds support for KMIP, which now allows third-party and IBM key management software to back up, store and manage host and SPU keys on your system. Additional attention was paid to hardening the host systems for the DB2 Analytics Accelerator powered by PureData.

Speaking of DB2 Analytics Accelerator, this release of NPS provides key functionality recently added in DB2 Analytics Accelerator version 5.1, which incorporates Netezza Analytics as a core component to help accelerate predictive analytics applications (e.g., SPSS) for tasks such as data mining and in-database modeling. By extending INZA software to support the mainframe EBCDIC encoding, along with new sets of procedures, you can run real-time analytics on DB2 Analytics Accelerator and establish work areas for data scientists. In-database transformation supports IBM DataStage balanced optimization and ETL/ELT consolidation processing.

This optimized, integrated appliance has been hardened not only to support the self-encrypting drives available in PureData System for Analytics N3001 systems; it now also accounts for encryption of data in motion through encrypted network traffic with the mainframe, FIPS-enabled RHEL, LFTP and secure VPN. Improved performance for continuous load operations better supports enterprise clients running highly concurrent trickle-feed loads under heavy simultaneous mixed workloads, ensuring faster data synchronization and shorter time to value (TTV) for insights. EBCDIC support for Netezza Analytics provides the ability to execute sophisticated in-database algorithms on DB2 Analytics Accelerator that allow micro-analytics across transactional, historical and real-time data. NPS software now supports the following algorithms: Decision Tree, Regression Tree, Naïve Bayes, K-means Clustering and Two-Step Clustering.

Making life easier through an improved User Experience

If these aren’t enough, we also targeted some areas to improve overall user experience by providing tooling and support that will make life easier for DBAs, system administrators and application developers:

  • Improved throughput and consistency for trickle-feed and highly concurrent smaller load operations.
  • nzload enhancements reduce TTV and shorten ETL activities: recordDelimiter, newline, timestamp, merge, datedelim, timedelim, and monitor.
  • New merge capability improves RI and positions Oracle migrations to PureData System.
  • nzSQL for Windows greatly improves usability for managing PureData System from the Windows desktop environment.
  • nzSQL support for external remote tables allows users to run load/unload operations from Linux clients to/from a remote file rather than host-only loads.
  • PureData will natively support Microsoft .NET and open a new range of possibilities for partner solutions.
  • JDBC support for JDK 1.7 in both NPS and INZA software ensures support for the latest Hadoop distributions and also for Fluid Query.
  • New 64-bit BNR connectors are now certified for the latest versions of Tivoli, Netbackup and EMC.
  • PureData improves uptime by reducing requirements to stop and start NPS when user connections are exceeded.
  • ODBC support is now available for comments through DSN, odbc.ini and connection string (single, multi, inline, nested comments), as well as support for the LIMIT clause.

SQL enhancements

We’ve incorporated support for newer Client Kit OS versions and platforms with this release, including Windows 8, Windows 2012 R2, Ubuntu, and a completely new Power PC RHEL client for Little Endian. Support for Power on Little Endian positions PureData Systems for IBM BigInsights and the IBM Open Platform. We have also included additional SQL support for:

  • DROP TABLE IF EXISTS
  • CREATE TABLE IF NOT EXISTS
  • Single slice support for JOINS with multi-column distribution keys
  • SQL push-down of NULL aware
  • New table-based Zone Maps
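
The two IF [NOT] EXISTS forms in the list above are mainly useful for making setup scripts safe to rerun. Here is a minimal sketch of that pattern. It uses Python's built-in sqlite3 module purely as a stand-in, because SQLite happens to accept the same two statements; it does not connect to or exercise NPS. The table name `sales_stage` and the helper `reset_table` are hypothetical.

```python
import sqlite3


def reset_table(conn: sqlite3.Connection) -> None:
    """Drop and recreate a staging table without failing if it is absent."""
    # Succeeds whether or not the table currently exists.
    conn.execute("DROP TABLE IF EXISTS sales_stage")
    # Succeeds whether or not the table currently exists.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_stage (id INTEGER, amount REAL)"
    )


conn = sqlite3.connect(":memory:")  # in-memory stand-in database
reset_table(conn)  # first run: nothing to drop, table is created
conn.execute("INSERT INTO sales_stage VALUES (1, 9.5)")
reset_table(conn)  # second run: table is dropped and recreated empty
row_count = conn.execute("SELECT COUNT(*) FROM sales_stage").fetchone()[0]
print(row_count)
```

Without the IF EXISTS guard, the first call would fail on the DROP because the table is not there yet, which is the usual reason deployment scripts need these forms.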

Client download of these new releases

NPS 7.2.1 and INZA 3.2.1 software is available at no charge to existing PureData clients. It can be easily downloaded from IBM Support Fix Central. Note that business partners and prospective clients can download and explore these new releases on Netezza Developer Network (additional information below).

Packaging and distribution

From a packaging perspective we refreshed IBM Netezza Platform Developer Software to this latest NPS 7.2.1 release to ensure the software suite is current from IBM’s Passport Advantage.

Supported Appliances
  • N3001
  • N2002
  • N2001
  • N100x
  • C1000

Supported Software
  • Netezza Platform Software v7.2.1
  • Netezza Client Kits v7.2.1
  • Netezza SQL Extension Toolkit v7.2.1
  • Netezza Analytics v3.2.1
  • IBM Fluid Query v1.6
  • Netezza Performance Portal v2.1.1
  • IBM Netezza Platform Development Software v7.2.1

For the Netezza Developer Network we continue to expand the ability to easily pick up and work with non-warranted products for basic evaluation by refreshing the Netezza Emulator to NPS 7.2.1 with INZA 3.2.1. You will find a refresh of our non-warranted version of Fluid Query 1.6 and the complete set of Client Kits that support NPS 7.2.1.

Feel free to download and play with these as a prelude to PureData Systems for Analytics purchase or as a quick way to validate new software functionality with your application. We maintain our commitment to business partners working with our systems by maintaining the latest systems and software for you to access. Bring your application or solution and work to certify, qualify and validate them.

For additional information on Fluid Query 1.6, refer to my what’s new post.

* DB2 Analytics Accelerator for z/OS is a high-performance appliance that integrates the IBM z Systems infrastructure with IBM PureData™ System for Analytics, powered by IBM Netezza technology. The solution transforms your mainframe into a highly efficient transactional and analytics processing environment. This enables clients to exploit z Systems data where it originates.

What’s new: IBM Fluid Query 1.6

by Doug Dailey

Editorial Note: IBM Fluid Query 1.7 became available in May, 2016. You can read about features in release 1.6 here, but we also recommend reading the release 1.7 blog here.

The IBM PureData System for Analytics team has assembled a value-add set of enhancements over current software versions of Netezza Platform Software (NPS), INZA software and Fluid Query. We have enhanced integration, security, real-time analytics for System z and usability features with our latest software suite arriving on Fix Central today.

There is something here for everyone, whether you are looking to integrate your PureData System (Netezza) into a Logical Data Warehouse, improve security, gain more leverage with DB2 Analytics Accelerator for z/OS, or simply improve your day-to-day experience. This post covers the IBM Fluid Query 1.6 technology. Refer to my NPS and INZA post for more information on the enhancements that are now available in these other areas.

Integrating with the Logical Data Warehouse: Fluid Query overview

Are you struggling with building out your data reservoir, lake or lagoon? Feeling stuck in a swamp? Or, are you surfing effortlessly through an organized Logical Data Warehouse (LDW)?

Fluid Query offers a nice baseline of capability to get your PureData footprint plugged into your broader data environment or tethered directly to your IBM BigInsights Apache Hadoop distribution. Opening access across your broader ecosystem of on-premise, cloud, commodity hardware and Hadoop platforms gets you ever closer to capturing value throughout “systems of engagement” and “systems of record” so you can reveal new insights across the enterprise.

Now is the time to be fluid in your business, whether it is ease of data integration, access to key data for discovery/exploration, monetizing data, or sizing fit-for-purpose stores for different data types.  IBM Fluid Query opens these conversations and offers some valuable flexibility to connect the PureData System with other PureData Systems, Hadoop, DB2, Oracle and virtually any structured data source that supports JDBC drivers.

The value of content and the ability to tap into new insights is a must have to compete in any market. Fluid Query allows you to provision data for better use by application developers, data scientists and business users. We provide the tools to build the capability to enable any user group.

What’s new in Fluid Query 1.6?

Fluid Query was released this year and is in its third “agile” release of the year. As part of NPS software, it is available at no charge to existing PureData clients, and you will find information on how to access Fluid Query 1.6 below.

This capability enables you to query more data for deeper analytics from PureData. For example, you can query data in the PureData System together with:

  • Data in IBM BigInsights or other Hadoop implementations
  • Relational data stores (DB2, 3rd party and open source databases like Postgres, MySQL, etc.)
  • Multi-generational PureData Systems for Analytics systems (“Twin Fin”, “Striper”, “Mako”)

The following is a summary of some new features in the release that all help to support your needs for insights across a range of data types and stores:

  • Generic connector for access to structured data stores that support JDBC
    This generic connector enables you to select the database of choice. Database servers and engines like Teradata, SQL Server, Informix, MemSQL and MAPR can now be tapped for insight. We’ve also provided a capability to handle any data type mismatches between differing source/target systems.
  • Support for compressed read from Big SQL on IBM BigInsights
    Using the Big SQL capability in IBM BigInsights, you can now read compressed data in Hadoop environments such as BigInsights, Cloudera and Hortonworks. This adds increased flexibility and efficiency in storage, data protection and access.
  • Ability to import databases to Hadoop and append to tables in Hadoop
    New capabilities now enable you to import databases to Hadoop, as well as append data in existing tables in Hadoop. One use case for this is backing up historical data to a queryable archive to help manage capacity on the data warehouse. This may include incremental backups, for example from a specific date for speed and efficiency.
  • Support for the latest Hadoop distributions
    Fluid Query v. 1.6 now supports the latest Hadoop distributions, including BigInsights 4.1, Hortonworks 2.5 and Cloudera 5.4.5. For Netezza software, support is now available for NPS 7.2.1 and INZA 3.2.1.
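
To make the incremental append idea from the list above concrete, here is a minimal sketch of copying only rows dated on or after a cut-off into an existing archive table. It is an illustration of the pattern only, not the Fluid Query import tooling: two in-memory SQLite databases stand in for the warehouse and the Hadoop-side archive, and the table names `sales` and `sales_archive` and the helper `append_since` are hypothetical.

```python
import sqlite3


def append_since(source, archive, cutoff):
    """Append rows dated on or after `cutoff` from source to archive."""
    rows = source.execute(
        "SELECT id, sale_date, amount FROM sales WHERE sale_date >= ?",
        (cutoff,),
    ).fetchall()
    # Append to the existing archive table rather than replacing it.
    archive.executemany("INSERT INTO sales_archive VALUES (?, ?, ?)", rows)
    return len(rows)


# Stand-in for the warehouse.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE sales (id INTEGER, sale_date TEXT, amount REAL)")
source.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(1, "2015-01-10", 20.0), (2, "2015-06-01", 35.0), (3, "2015-09-15", 12.5)],
)

# Stand-in for the archive, which already holds the oldest row.
archive = sqlite3.connect(":memory:")
archive.execute(
    "CREATE TABLE sales_archive (id INTEGER, sale_date TEXT, amount REAL)"
)
archive.execute("INSERT INTO sales_archive VALUES (1, '2015-01-10', 20.0)")

copied = append_since(source, archive, "2015-06-01")
total = archive.execute("SELECT COUNT(*) FROM sales_archive").fetchone()[0]
print(copied, total)
```

Dates are stored as ISO 8601 strings here so that plain string comparison matches chronological order; a real table would normally use a proper date type.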

Fluid Query 1.6 can be easily downloaded from IBM Support Fix Central. I encourage you to refer to my “Getting Started” post that was written for Fluid Query 1.5 for additional tips and instructions. Note that this link is for existing PureData clients. Refer to the section below if you are not a current client.

Packaging and distribution

From a packaging perspective we refreshed IBM Netezza Platform Developer Software to this latest NPS 7.2.1 release to ensure the software suite is current from IBM’s Passport Advantage.

Supported Appliances
  • N3001
  • N2002
  • N2001
  • N100x
  • C1000

Supported Software
  • Netezza Platform Software v7.2.1
  • Netezza Client Kits v7.2.1
  • Netezza SQL Extension Toolkit v7.2.1
  • Netezza Analytics v3.2.1
  • IBM Fluid Query v1.6
  • Netezza Performance Portal v2.1.1
  • IBM Netezza Platform Development Software v7.2.1

For the Netezza Developer Network we continue to expand the ability to easily pick up and work with non-warranted products for basic evaluation by refreshing the Netezza Emulator to NPS 7.2.1 with INZA 3.2.1. You will find a refresh of our non-warranted version of Fluid Query 1.6 and the complete set of Client Kits that support NPS 7.2.1.

Feel free to download and play with these as a prelude to PureData Systems for Analytics purchase or as a quick way to validate new software functionality with your application. We maintain our commitment to helping our partners working with our systems by maintaining the latest systems and software for you to access. Bring your application or solution and work to certify, qualify and validate them.

For more information on NPS 7.2.1 and INZA 3.2.1 software, refer to my post.

Blaze your own trail, with deeper insights from more data!

by Doug Dailey

Much like Thomas Jefferson’s commission to Lewis & Clark for their expeditions, Data Scientists are commissioned by the business to understand the landscape of information across their enterprise, map disparate data sources and identify valuable assets.

By leveraging an assortment of technologies and innovative thinking, Data Scientists use every tool possible to glean transformative insights. Now that we can offer a higher level of data access with the new IBM Fluid Query capability, Data Scientists and Business Analysts can partner up (just like Lewis and Clark) to explore data from across the enterprise and discover business insights from “Systems of Record” and “Systems of Engagement”.

Fluid Query software offers a mechanism to integrate various sources of data for discovery, exploration, reporting and analytics. Day-to-day work is about building common pathways to extract value. Certain use cases call for positioning small sub-sets of data in a simple and efficient manner.

Our primary goal with Fluid Query is to deliver seamless access to data regardless of its location, and hand off as much query processing as possible to the system where the data actually sits. Once discovery and exploration result in new golden nuggets of insight, a possible next step is to aggregate and export that data to IBM PureData System for Analytics for deeper insights. This is not a replacement for replication or ETL technologies, but a straight-forward approach to positioning data in the right location, at the right moment, for the most appropriate level of processing.
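
The benefit of handing query processing to the system where the data sits can be shown with a small sketch. This is not Fluid Query itself: an in-memory SQLite database stands in for the remote store, and the table `clicks` and both helper functions are hypothetical. The point is only to compare how many rows cross the wire in each approach.

```python
import sqlite3

remote = sqlite3.connect(":memory:")  # stand-in for the remote data store
remote.execute("CREATE TABLE clicks (region TEXT, hits INTEGER)")
remote.executemany(
    "INSERT INTO clicks VALUES (?, ?)",
    [("east", 3), ("east", 5), ("west", 2), ("west", 7), ("west", 1)],
)


def pull_then_aggregate(conn):
    """Move every row, then aggregate on the receiving side."""
    rows = conn.execute("SELECT region, hits FROM clicks").fetchall()
    totals = {}
    for region, hits in rows:
        totals[region] = totals.get(region, 0) + hits
    return totals, len(rows)


def aggregate_at_source(conn):
    """Let the source aggregate, then move only the summary rows."""
    rows = conn.execute(
        "SELECT region, SUM(hits) FROM clicks GROUP BY region"
    ).fetchall()
    return dict(rows), len(rows)


local_totals, rows_moved_local = pull_then_aggregate(remote)
pushed_totals, rows_moved_pushed = aggregate_at_source(remote)
print(rows_moved_local, rows_moved_pushed)
```

Both approaches produce the same totals, but the second moves one row per region instead of one row per event, and that gap grows with the size of the source table.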

IBM Fluid Query enables deeper insights from more data that is stored on these platforms.

IBM Fluid Query 1.5 opens the field of vision for PureData System for Analytics to Hadoop, Spark and other structured data stores like DB2 for Linux, UNIX and Windows, PureData for Operational Analytics, dashDB (IBM Data Warehouse as a Service) and Oracle. In addition, it opens “anywhere to anywhere” multi-generational access across currently supported PureData systems. This includes PureData System for Analytics N100x, N2001, N2002 and N3001 systems, our rack-mounted N3001-001 system, and our IBM Netezza Platform Development Software running on reference architecture.

IBM Fluid Query is a newly added software component to our Netezza Performance Server Software suite. PureData clients can easily download the component from IBM Fix Central or our Netezza Developer Network. Did I mention that Fluid Query is offered at no charge? No added cost with loads of added value for PureData clients. Can Oracle or Teradata offer this level of out-of-the-box value?

Now, let’s get down to brass tacks. Silos of data are bad and openness with varying degrees of access will allow you to easily connect disparate systems. Enterprise warehouses have a strong and well deserved tradition for housing tons of high value data that has been passed through iterations of cleansing, transformation and governance.

IBM’s PureData System for Analytics delivers best-in-class in-database analytics, alongside our state-of-the-art AMPP architecture and impressive compute framework, to deliver insights at the speed of thought. These insights generate new questions and answers that shape the direction for the business, or result in improved client experience. In turn, this adds new customers, new revenue sources and other forms of opportunity.

Fluid Query helps you respond to demands for fluid access to big data and integrate PureData System for Analytics into the Logical Data Warehouse. We’ve delivered connectors for the leading Hadoop distributions, and have recently added connectors that support leading relational databases, such as DB2 and dashDB.

IBM Fluid Query also includes a fast data movement capability that allows you to establish queryable archives on target Hadoop distributions, create DR subsets of data for safekeeping, or move seldom-accessed or colder data to Hadoop for secondary processing. Thus, Fluid Query helps offload workloads and open PureData System for Analytics for HOT data with immediate value for the business. You’re in control of what, when, how, and where! Moving data in plain text is a common practice, but why not move the data in proprietary Netezza binary format? With up to 3X+ compression and 3X the speed of standard out-of-the-box Sqoop, it can be stored and queried in either compressed format or plain text after being decompressed. IBM Fluid Query also allows you to control authentication via LDAP, Kerberos, SSL, SSL + Secure, local or no SASL.

Why not leverage IBM Fluid Query to prime Test or Development environments running on dedicated rack configurations, lightweight N3001-001 systems or our software-only edition with sample data? Or, test different data models, apply analytic algorithms or stage beta testing and certification for upcoming production applications.

You get the picture. IBM Fluid Query introduces a new brand of flexibility for your PureData System for Analytics environment that lines up nicely with IBM Fluid Query capabilities delivered in IBM Big SQL through IBM BigInsights, DB2 and PureData for Operational Analytics. After all, this is not just a one-way street, because these other IBM products play the same game.

It’s time to integrate, explore new business opportunities and simply improve your day-to-day operations!

IBM Fluid Query v1.5 is now generally available as a software addition for PureData System for Analytics clients. To learn more about Fluid Query v1.5, I invite you to listen to this webinar: IBM Fluid Query – Unifying Data Access Across the Logical Data Warehouse.

About Doug,
Doug has over 20 years of combined technical and management experience in the software industry, with an emphasis on customer service and, more recently, product management. He is currently part of a highly motivated product management team that is both inspired by and passionate about the IBM PureData System for Analytics product portfolio.

How to get the most out of your PureData System for Analytics using Hadoop as a cost-efficient extension

By Ralf Goetz

Today’s requirements for collecting huge amounts of data are different from several years back when only relational databases satisfied the need for a system of record.

Now, new data formats need to be acquired, stored and processed in a convenient and flexible way. Customers need to integrate different systems and platforms to unify data access and acquisition without losing control and security.

The logical data warehouse

More and more, relational databases and Hadoop platforms together form the core of a Logical Data Warehouse in which each system handles the workload it can handle best. We call this using “fit for purpose” stores.

An analytical data warehouse appliance such as PureData System for Analytics is often at the core of this Logical Data Warehouse and it is efficient in many ways. It can host and process several terabytes of valuable, high-quality data enabling lightning fast analytics at scale. And it has been possible (with some effort) to move bulk data between Hadoop and relational databases using Sqoop – an open source component of Hadoop. But there was no way to query both systems using SQL – a huge disadvantage.

Two options for combining relational database and Hadoop

Why move bulk data between different systems or run cross-systems analytical queries? Well, there are several use cases for this scenario but I will only highlight two of them based on a typical business scenario in analytics.

The task: an analyst needs to find out how the stock level of the company’s products will develop throughout the year. This stock level is being updated very frequently and produces lots of data in the current data warehouse system implemented on PureData System for Analytics. Therefore the data cannot be kept in the system for more than a year (hot data). A report on this hot data indicates that the stock level is much too high and needs to be adjusted to keep stock costs low. This would normally trigger immediate sales activities (e.g. a marketing and/or sales campaign with lower prices).

“We need a report, which could analyze all stock levels for all products for the last 10+ years!”

Yet a historical report analyzing all stock levels for all products for the last 10+ years would have indicated that a high stock level at this time of the year is a good thing, because a high season is approaching. The company would therefore be able to sell most of its products and satisfy market demand. But how can the company provide such a report with so much data?

The company has two options to satisfy this need:

  1. Replace the existing analytical data warehouse appliance with a newer and bigger one (this would cost some dollars and has been covered in another blog post), or
  2. Use an existing Hadoop cluster as a cheap storage and processing extension for the data warehouse appliance (note that a new, yet-to-be-implemented Hadoop cluster would probably cost more than a bigger PureData box as measured by Total Cost of Ownership).

Option 2 would require a mature, flexible integration interface between Hadoop and PureData. Sqoop alone would not be able to handle this, because the scenario requires more than bulk data movement between Hadoop and PureData.

IBM Fluid Query for seamless cross-platform data access using standard SQL

These requirements are only two of the reasons why IBM introduced IBM Fluid Query in March 2015 as a no-charge extension for PureData System for Analytics. Fluid Query enables both bulk data movement between Hadoop and PureData, in either direction, and operational SQL query federation. With Fluid Query, data residing in Hadoop distributions from Cloudera, Hortonworks and IBM BigInsights for Apache Hadoop can be combined with the data residing in PureData using standard SQL syntax.

This enables users to seamlessly query older, cooler data together with hot data without the complexity of data integration, taking a more exploratory approach: move and query all data, find the value in the data and integrate only if needed.
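The idea of one SQL statement spanning hot and cold data can be tried out on a laptop. The sketch below is not Fluid Query syntax: it uses Python's built-in sqlite3 module, with the main database standing in for the warehouse and an attached in-memory database standing in for the Hadoop-side archive. The table name, the view name and the sample stock figures are all invented for illustration.

```python
import sqlite3

# Stand-ins only: "main" plays the warehouse holding hot data, and the
# attached database "archive" plays the Hadoop-side store of colder data.
con = sqlite3.connect(":memory:")
con.execute("ATTACH DATABASE ':memory:' AS archive")

con.execute("CREATE TABLE main.stock (year INTEGER, month INTEGER, units INTEGER)")
con.execute("CREATE TABLE archive.stock (year INTEGER, month INTEGER, units INTEGER)")

# Invented sample data: November stock levels, current year vs. history.
con.executemany("INSERT INTO main.stock VALUES (?, ?, ?)", [(2015, 11, 950)])
con.executemany(
    "INSERT INTO archive.stock VALUES (?, ?, ?)",
    [(2012, 11, 900), (2013, 11, 940), (2014, 11, 1010)],
)

# A TEMP view is used because, in SQLite, only temporary views may
# reference tables in more than one attached database.
con.execute(
    """
    CREATE TEMP VIEW all_stock AS
    SELECT year, month, units FROM main.stock
    UNION ALL
    SELECT year, month, units FROM archive.stock
    """
)

# One standard SQL query now covers hot and historical data together.
november = con.execute(
    "SELECT COUNT(*), AVG(units) FROM all_stock WHERE month = 11"
).fetchone()
print(november)
```

In this toy data set the current November level sits right at the multi-year average, which is exactly the kind of context the one-year report in the scenario above could not provide.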

IBM Fluid Query can be downloaded and installed as a free add-on for PureData System for Analytics.

Try it out today. IBM Fluid Query is available for PureData System for Analytics. Clients can download and install the software from Fix Central and get started right away with these new capabilities. For more information and documentation links, Doug Dailey’s “Getting Started with Fluid Query” blog is highly recommended reading. Update: Learn about Fluid Query 1.5, announced July 2015.

About Ralf,
Ralf is an Expert Level Certified IT Specialist in the IBM Software Group. Ralf joined IBM through the Netezza acquisition in early 2011. For several years, he led the Informatica tech-sales team in the DACH region and the Mahindra Satyam BI competency team in Germany. He then became a technical pre-sales representative for Netezza and later for the PureData System for Analytics. Ralf is still focusing on PDA but is also supporting the technical sales of all IBM BigData products. Ralf holds a Master's degree in computer science.

Do you want to learn more about Big Data and modern data warehousing?

Getting Started with IBM Fluid Query 1.0 for IBM PureData System for Analytics

By Doug Dailey

As Big Data concepts continue to mature and evolve, so does the technology that encourages their adoption. Enterprises are looking at ways to better leverage their data by reducing costs and positioning data for success based on its relevance. The payoff of this exercise is optimum insights for the business at the right time.

Many are finding that Hadoop is not the answer for all of their data needs. They want to have access to various systems, rather than adopting a “one size fits all” mentality. Enterprise Data Warehouses (EDW), relational databases, content stores, real-time in-memory processing and more all have their place. We have seen an increasing number of software tools, specialized hardware products and services that work to bridge the gap between approaches to storing or analyzing data.

Fluid Query Strengths – Query access and Data Movement with Hadoop

IBM introduced Fluid Query 1.0 for use on PureData System for Analytics in March. The capability allows PureData users to turn their EDW on its end and have it work as a client. Traditionally, EDW environments served as landing zones for high-value data to explore, analyze and gain speed-of-thought insights from complex in-database algorithms. Now, IBM Fluid Query allows PureData users to access data residing on Hadoop distributions as if they were a client. This does not move and store data locally, but actually pushes SQL down to Hadoop, offloading the processing via MapReduce jobs. Now, you can query directly from Hadoop and move data natively between PureData and Hadoop in parallel.
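Why pushing SQL down matters can be shown without any Hadoop at all. In the sketch below, an in-memory SQLite database stands in for the remote store, and remote_query is a hypothetical helper, not a Fluid Query function. The table and its 1,000 rows are invented. The point is only to compare how many rows come back when the filter runs on the client versus where the data lives.

```python
import sqlite3

# Stand-in for the remote (Hadoop-side) store; all data here is invented.
remote = sqlite3.connect(":memory:")
remote.execute("CREATE TABLE events (id INTEGER, region TEXT)")
remote.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(i, "EU" if i % 10 == 0 else "US") for i in range(1000)],
)


def remote_query(sql):
    """Hypothetical helper: run SQL where the data lives and return the rows."""
    return remote.execute(sql).fetchall()


# Approach 1: pull every row across, then filter on the client side.
pulled = remote_query("SELECT id, region FROM events")
local_hits = [row for row in pulled if row[1] == "EU"]

# Approach 2: push the filter down, so only matching rows come back.
pushed = remote_query("SELECT id, region FROM events WHERE region = 'EU'")

print(f"rows returned without pushdown: {len(pulled)}, with pushdown: {len(pushed)}")
```

Both approaches produce the same answer, but the second one moves a tenth of the rows in this example, and the remote side does the filtering work.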

Are you interested in doing any of the following?

● Query Hadoop data from your PureData System for Analytics
● Bi-directional data transfer between PureData and Hadoop (BigInsights, Hortonworks or Cloudera)
● Move data between PureData and Hadoop in parallel
● Full control over tables and data ranges queried or transferred
● Automatic registration with Hive meta-store
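The combination of parallel movement and control over data ranges in the list above can be pictured with a small sketch. This is not how Fluid Query is implemented internally; it is a generic illustration in which a Python dictionary plays the source table, another plays the target, and the names split_range and copy_chunk are made up for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def split_range(lo, hi, parts):
    """Split the half-open key range [lo, hi) into `parts` contiguous chunks."""
    step, extra = divmod(hi - lo, parts)
    chunks, start = [], lo
    for i in range(parts):
        end = start + step + (1 if i < extra else 0)
        chunks.append((start, end))
        start = end
    return chunks


# Invented source table: keys 0..99 mapped to a dummy payload.
source = {key: f"row-{key}" for key in range(100)}


def copy_chunk(bounds):
    """Read one key range from the source and return its rows."""
    lo, hi = bounds
    return {key: source[key] for key in range(lo, hi) if key in source}


# Transfer only keys 20..79, using four workers in parallel.
chunks = split_range(20, 80, 4)
target = {}
with ThreadPoolExecutor(max_workers=4) as pool:
    for rows in pool.map(copy_chunk, chunks):
        target.update(rows)   # merged in the main thread, one chunk at a time

print(f"moved {len(target)} rows in {len(chunks)} chunks")
```

Each worker handles its own slice of the key range, and rows outside the requested range never leave the source.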

How to Get Started

Customers have been able to download, install, configure and test Fluid Query in less than 30 minutes. This is a perfect lunch hour activity for inquiring minds. Just be sure that your Hadoop and PureData environments have the needed prerequisites in place. Fluid Query runs on PureData System for Analytics N100x, N2001, N2002, and N3001.

IBM Fluid Query Minimum System Requirements

Tools needed for installation:

(1) Supported Hadoop distribution installed, up & running

(2) Active network connection and user access/authentication between PureData and Hadoop

(3) PureData installed with Netezza Analytics

(4) Data available for use

Downloading and installing Fluid Query:

1. Download FLUIDQUERY_1.0 tar package from Fix Central
http://www-933.ibm.com/support/fixcentral/

2. The IBM Fluid Query User Guide can be found here for more details on setup and configuration.

3. Unpack the FluidQuery_1.0 bundle and run the fluidquery_install.pl script.

4. Configure Fluid Query for use, then query and move data to your heart’s content. Configuration consists of a lightweight setup, registration of user-defined table functions, and view creation.

Finally, use your favorite tool to execute your Hadoop query and view results.

In keeping with the simplicity and ease of use of Netezza technology, we have delivered a very lightweight set of capabilities that pack a load of value for your Logical Data Warehouse ecosystem. Whether you are trudging through a data swamp, or swimming in a data lake or reservoir, you can very easily reel in results important to your business.

Go to the IBM Fluid Query Solution Brief to learn more.

Update: Learn about Fluid Query 1.5 announced in July, 2015.

About Doug,
Doug has over 20 years of combined technical and management experience in the software industry, with an emphasis on customer service and, more recently, product management. He is currently part of a highly motivated product management team that is both inspired by and passionate about the IBM PureData System for Analytics product portfolio.