IBM Fluid Query 1.7 is Here!

by Doug Dailey

IBM Fluid Query offers a wide range of capabilities to help your business adapt to a hybrid data architecture and, more importantly, helps you bridge “data silos” for deeper insights that draw on more data. Fluid Query is a standard entitlement included with the Netezza Platform Software suite for PureData for Analytics (formerly Netezza). Fluid Query release 1.7 is now available, and you can learn more about its features below.

Why should you consider Fluid Query?

Fluid Query can help you solve many kinds of business problems. Here are a few ideas:
• Discover and explore “Day Zero” data landing in your Hadoop environment
• Query data from multiple cross-enterprise repositories to understand relationships
• Access structured data from common sources like Oracle, SQL Server, MySQL, and PostgreSQL
• Query historical data on Hadoop via Hive, BigInsights Big SQL or Impala
• Derive relationships between data residing on Hadoop, the cloud and on-premises
• Offload colder data from PureData System for Analytics to Hadoop to free capacity
• Drive business continuity through a low-fidelity disaster recovery solution on Hadoop
• Back up your database or a subset of data to Hadoop in an immutable format
• Incrementally feed analytics side-cars residing on Hadoop with dimensional data

For a data warehouse administrator, by far the most prominent uses for Fluid Query are warehouse augmentation, capacity relief and replication of analytics side-cars for analysts and data scientists.

New: Hadoop connector support for Hadoop file formats to increase flexibility

IBM Fluid Query 1.7 ushers in greater flexibility for Hadoop users with support for popular file formats typically used with HDFS. These include data storage formats like Avro, Parquet, ORC and RC that are often used to manage big data in a Hadoop environment.

Choosing the best format and compression mode can result in drastic differences in performance and storage on disk. A file format that doesn’t support flexible schema evolution can result in a processing penalty when making simple changes to a table. Let’s just say that if you live in the Hadoop domain, you know exactly what I am speaking of. For instance, if you want to use Avro, do your tools have compatible readers and writers? If you are using Impala, do you know that it doesn’t support ORC, or that Hortonworks and Hive-Stinger don’t play well with Parquet? Double-check your needs and tool sets before diving into these popular format types.
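One lightweight way to do that double check is to keep a small table of known caveats and test your planned toolchain against it. The Python sketch below encodes only the two caveats mentioned in this post, as they stood when it was written. The `KNOWN_GAPS` table and the `unsupported_pairs` helper are hypothetical names invented for illustration and are not part of Fluid Query, so confirm the details against your distribution's current documentation.

```python
# Hypothetical helper: flags tool/format pairs called out in this post.
# The entries mirror the caveats stated above; this is not a complete matrix.
KNOWN_GAPS = {
    ("impala", "orc"),
    ("hive-stinger", "parquet"),
}


def unsupported_pairs(tools, file_format):
    """Return the tools in `tools` that have a known caveat for `file_format`."""
    fmt = file_format.lower()
    return sorted(t for t in tools if (t.lower(), fmt) in KNOWN_GAPS)


# Check a planned toolchain before committing to a storage format.
print(unsupported_pairs(["Hive", "Impala"], "ORC"))
```

An empty list means none of the listed caveats apply, not that the combination is certified to work.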

By providing support for these popular formats,  Fluid Query allows you to import, store, and access this data through local tools and utilities on HDFS. But here is where it gets interesting in Fluid Query 1.7: you can also query data in these formats through the Hadoop connector provided with IBM Fluid Query, without any change to your SQL!

New: Robust connector templates

In addition, Fluid Query 1.7 provides a more robust set of connector templates designed to help you jump-start your use of Fluid Query. You may recall that our prior release added a generic connector that allows you to configure and connect to any structured data store via JDBC. With the 1.7 release we are offering pre-defined templates so you can get up and running more quickly. In cases where there are differences in user data type mapping, we also provide mapping files to simplify access. If you have your own favorite database, you can use our generic connector, along with any of the provided templates, as a basis for building a new connector for your specific needs. There are templates for Oracle, Teradata, SQL Server, MySQL, PostgreSQL, Informix, and MapR for Hive.

Again, the primary focus for Fluid Query is to deliver open data access across your ecosystem. Whether the data resides on disk, in-memory, in the Cloud or on Hadoop, we strive to enable your business to be open for data. We recognize that you are up against significant challenges in meeting demands of the business and marketplace, with one of the top priorities around access and federation.

New: Data movement advances

Moving data is not the best choice. Businesses spend quite a bit of effort ingesting data, staging it, and scrubbing, prepping and scoring it for consumption by business users. This is a costly process. As we move closer and closer to virtualization, the goal is to move the smallest amount of data possible, while you access and query only the data you need. So not only is access paramount, but your knowledge of the data in your environment is crucial to using it efficiently.

Fluid Query does offer data movement capability through what we call Fast Data Movement. Focusing on the pipe between PDA and Hadoop, we offer a high-speed transfer tool that allows you to transfer data between these two environments very efficiently and securely. You have control over the security, compression, format and WHERE clause (DB, table, filtered data). A key benefit is our ability to transfer data in our proprietary binary format. This enables orders-of-magnitude better performance than Sqoop when you do have to move data.

Fluid Query 1.7 also offers some additional benefits:
• Kerberos support for our generic database connector
• Support for BigInsights Big SQL during import (automatically synchronizes Hive and Big SQL on import)
• Varchar and String mapping improvements
• Import of nz.fq.table parameter now supports a combination of multiple schemas and tables
• Improved date handling
• Improved validation for NPS and Hadoop environment (connectors and import/export)
• Support for BigInsights 4.1 and Cloudera 5.5.1
• A new Best Practices User Guide, plus two new Tutorials

You can download Fluid Query 1.7 from IBM’s Fix Central, or get the non-warranted version from the Netezza Developer Network for use with the Netezza Emulator.


Take a test drive today!

About Doug,
Doug has over 20 years of combined technical and management experience in the software industry, with an emphasis on customer service and, more recently, product management. He is currently part of a highly motivated product management team that is both inspired by and passionate about the IBM PureData System for Analytics product portfolio.

Virtual Enzee webcast roundup for 2016

By Cindy Russell

The first Virtual Enzee webcast of 2016 is scheduled for January 29th!  I will be updating this blog during 2016 so you have a handy resource to find out what sessions are upcoming and also listen to the replays on demand.

  1. Unifying Data Access across the Logical Data Warehouse with IBM Fluid Query
    IBM Fluid Query helps bring your enterprise into focus and eliminate some of the traditional barriers that exist between disparate data in your enterprise. In this session, we’ll review some common user stories for using Fluid Query through the lens of PureData System for Analytics/Netezza and BigInsights. Register here: http://bit.ly/231tICu

  2. Tame Spatial Queries with Netezza In-Database Analytics
    Attend this Virtual Enzee to learn how Netezza supports spatial data types and queries, how it can shorten complex spatial analytic projects and how it integrates with and complements existing geospatial platforms and solutions. Register: bit.ly/FebEnz

  3. Accelerating Open-Source R with IBM PureData System for Analytics (Netezza), January 29, 2016 at 11AM ET
    R is increasingly becoming the platform and programming language of choice for many data scientists. Learn how you can leverage Open-Source R on your IBM PureData System for Analytics/Netezza appliances! Register here: bit.ly/1ORAkzQ

 

Build skills for 2016 and Beyond: Data Warehousing and Analytics Top 10 Resources

by Cindy Russell, IBM Data Warehouse Marketing

Skills are always an essential consideration in technical careers and it is important for data warehousing professionals to expand their knowledge to handle the proliferation of data types and volumes in 2016 and beyond.

These are my “top 10” resource picks that you may want to explore. I am choosing these because of their popularity and also because they represent new technologies you may face in 2016 as you modernize your data warehouse and extend it beyond its traditional realm to meet new analytics needs.

  1. Gartner Magic Quadrant for Data Warehouse and Data Management Solutions for Analytics – I am recommending this report because it provides an overview of the trends, issues and marketplace leaders in data warehousing. It calls out the need for the Logical Data Warehouse, which is a key element of a modernization strategy. I believe the Logical Data Warehouse will be of increasing importance to your operations in the coming months. Read a summary of the report.
  2. Logical Data Warehouse – Due to the massive and rapid growth of data volumes and types, a single centralized data warehouse cannot meet all of the new needs for analytics by itself. The data warehouse now becomes part of a Logical Data Warehouse in which a set of “fit for purpose” stores are used to house a range of data. This blog by Wendy Lucas was published in 2014, but is still a good primer on the concept if you need one.
  3. IBM Fluid Query information and entitlement for PureData clients – In 2015, we released a series of “agile” announcements of IBM Fluid Query. This is a tool that PureData System for Analytics clients can use to query more data sources for deeper insights. This tool is a key element when you have a Logical Data Warehouse where data stores include Hadoop, databases, other data warehouses and more. PureData clients can take advantage of this technology as part of the entitlements. Start learning with our blog series and webcast.
  4. dashDB, data warehousing on the cloud – dashDB was launched in 2014 as the IBM fully managed data warehouse in the cloud. Some initial use cases could be: setting up self-service data science sandboxes, establishing test environments or cost-effectively housing data that is already external, such as social media feeds. dashDB is based on the Netezza and BLU Acceleration in-memory computing technologies. If you have workloads you want to place on the cloud, dashDB is a good solution. This webcast and a TDWI Checklist for cloud can get you started.
  5. Hadoop and Big SQL – Hadoop is a scalable, cost-effective, open source platform for distributed storage and processing that can hold a range of structured or unstructured data as part of a Logical Data Warehouse. It can also be used to help you manage capacity on the data warehouse, for example as a queryable historical archive. Read this blog by our expert to learn the basics. IBM provides a free open source distribution, IBM Open Platform with Apache Hadoop. For those looking to augment the IBM Open Platform, IBM BigInsights adds enterprise-grade features including visualization, exploration and advanced analytics. Within the family is an implementation that includes Big SQL, enabling you to use familiar SQL skills to query data in Hadoop. Explore the above content options, then get started with a no charge trial.
  6. Apache Spark – IBM announced a major commitment to Apache Spark in June 2015 and has already made available a series of Spark-based products and cloud services. You will be seeing more of Spark across the IBM Analytics portfolio, so it is a good technology to learn. Apache Spark is an open source processing engine built around speed, ease of use, and analytics. If you have large amounts of data that require low-latency processing that a typical MapReduce program cannot provide, Spark is an alternative. It can run up to 100 times faster than MapReduce for iterative algorithms or interactive data mining. Spark provides in-memory cluster computing for speed, and supports Java, Scala, and Python APIs for ease of development. I recommend this no charge Big Data University course on Spark fundamentals.
  7. Update to IBM Netezza Analytics software – For those of you who are PureData System for Analytics clients, there is an update to the Netezza Analytics software. Doug Dailey is one of our experts in this area, and he created an announcement blog to help you understand what new capabilities you can leverage.
  8. Virtual Enzee on demand webcasts – IBM offers webcasts on topics related to data warehousing and PureData System for Analytics. Browse the “Virtual Enzee” webcast library to stay up to date on PureData through these on demand webcasts.
  9. Learn Cognos Analytics for user self-service applications – Some of our clients use Cognos BI in conjunction with their data warehouses for super-fast reporting. Cognos Analytics was announced at IBM Insight as a guided, self-service capability that provides a personal approach to analytics. As your users are demanding more insights, self-service may be a sound solution to some of their needs. Browse the blog and web site to learn more.
  10. IBMGo on demand keynotes from IBM Insight – If you were unable to attend IBM Insight 2015, IBMGo brings some of the main sessions to you! It is a great way to learn about the bigger IBM Analytics solutions and points of view. Start here.


What’s new: Netezza Platform Software and INZA software for PureData Systems for Analytics

by Doug Dailey

The IBM PureData Systems for Analytics team has just released a new set of enhancements over current software versions of Netezza Platform Software (NPS), INZA and IBM Fluid Query. These include enhanced  integration, security, real-time analytics for z Systems and usability features, all included in our latest software suite that has been posted on Fix Central.

There will be something here for everyone, whether you are looking to increase security, gain more leverage with DB2 Analytics Accelerator for z/OS*, improve your day-to-day experience or integrate PureData System (Netezza technology) into a Logical Data Warehouse. This post covers the new capabilities and enhancements in NPS 7.2.1 and INZA 3.2.1 software. Refer to my IBM Fluid Query 1.6 post for more information on Fluid Query.

Strengthening end-to-end security for PureData and DB2 Analytics Accelerator for z/OS

With the advent of self-encrypting disk drives in our N3001 model, we laid the groundwork for securing data at rest. Not only do you have state-of-the-art disk encryption by Seagate and Hitachi at work at the hardware level, but you also have added peace of mind through a second tier of security that protects host drives and the drives associated with the Snippet Processing Unit (SPU). A local keystore with a flexible CLI on the N3001 system enables you to protect your most valuable assets. This release adds support for KMIP, which now allows third-party and IBM key management software to back up, store and manage host and SPU keys on your system. Additional attention was paid to hardening the host systems for the DB2 Analytics Accelerator powered by PureData.

Speaking of DB2 Analytics Accelerator, this release of NPS provides key functionality recently added in DB2 Analytics Accelerator version 5.1, which incorporates Netezza Analytics as a core component to help accelerate predictive analytics applications (e.g., SPSS) for tasks such as data mining and in-database modeling. By extending INZA software to support the mainframe EBCDIC encoding, along with new sets of procedures, you can run real-time analytics on DB2 Analytics Accelerator and establish work areas for data scientists. In-database transformation supports IBM DataStage balanced optimization and ETL/ELT consolidation processing.

This optimized, integrated appliance has been hardened not only to support the self-encrypting drives available in PureData System for Analytics N3001 systems, but also to protect data in motion by encrypting network traffic with the mainframe, and through FIPS-enabled RHEL, LFTP and secure VPN. Updated performance around continuous load operations better supports enterprise clients running highly concurrent trickle-feed loads under heavy processing of simultaneous mixed workloads, to ensure faster data synchronization and time to value (TTV) for insights. EBCDIC support for Netezza Analytics provides the ability to execute sophisticated in-database algorithms on DB2 Analytics Accelerator that allow micro-analytics across transactional, historical and real-time data. NPS software now supports the following algorithms: Decision Tree, Regression Tree, Naïve Bayes, K-means Clustering and Two-Step Clustering.


Making life easier through an improved User Experience

If these aren’t enough, we also targeted some areas to improve overall user experience by providing tooling and support that will make life easier for DBAs, system administrators and application developers:

  • Improved throughput and consistency for trickle-feed and highly concurrent smaller load operations.
  • nzload enhancements reduce TTV and shorten ETL activities: recordDelimiter, newline, timestamp, merge, datedelim, timedelim, and monitor.
  • New merge capability improves RI and positions Oracle migrations to PureData System.
  • nzSQL for Windows greatly improves usability for managing PureData System from the Windows desktop environment.
  • nzSQL support for external remote tables allows users to run load/unload operations from Linux clients to/from a remote file rather than host-only loads.
  • PureData will natively support Microsoft .NET and open a new range of possibilities for partner solutions.
  • JDBC support for JDK 1.7 in both NPS and INZA software ensures support for the latest Hadoop distributions and also for Fluid Query.
  • New 64-bit BNR connectors are now certified for the latest versions of Tivoli, Netbackup and EMC.
  • PureData improves uptime by reducing requirements to stop and start NPS when user connections are exceeded.
  • ODBC support is now available for comments through DSN, odbc.ini and connection string (single, multi, inline, nested comments), as well as support for the LIMIT clause.

SQL enhancements

We’ve incorporated support for newer Client Kit OS versions and platforms with this release, including Windows 8, Windows 2012 R2, Ubuntu, and a completely new Power PC RHEL client for Little Endian. Support for Power on Little Endian positions PureData Systems for IBM BigInsights and the IBM Open Platform. We have also included additional SQL support for:

  • DROP TABLE IF EXISTS
  • CREATE TABLE IF NOT EXISTS
  • Single slice support for JOINS with multi-column distribution keys
  • SQL push-down of NULL aware
  • New table-based Zone Maps
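
To illustrate what the two IF [NOT] EXISTS forms buy you, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in. SQLite happens to accept the same two statements, but this has not been run against NPS, and the table name `sales_stage` and the helper `rebuild_stage` are made up for the example. The point is simply that a setup script can be rerun without failing because a table already exists or is already gone.

```python
import sqlite3


def rebuild_stage(conn):
    """Idempotent setup: safe to call repeatedly on the same database."""
    cur = conn.cursor()
    # No error if the table is absent (first run) or present (later runs).
    cur.execute("DROP TABLE IF EXISTS sales_stage")
    # No error if another script has already created the table.
    cur.execute("CREATE TABLE IF NOT EXISTS sales_stage (id INTEGER, amount REAL)")
    conn.commit()


conn = sqlite3.connect(":memory:")  # SQLite stand-in, not a Netezza connection
rebuild_stage(conn)
rebuild_stage(conn)  # the second run does not raise

tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
print(tables)
```

Without the IF EXISTS clause, the first DROP TABLE on a fresh database would raise an error and the script would need its own existence check.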

Client download of these new releases

NPS 7.2.1 and INZA 3.2.1 software is available at no charge to existing PureData clients. It can be easily downloaded from IBM Support Fix Central. Note that business partners and prospective clients can download and explore these new releases on Netezza Developer Network (additional information below).


Packaging and distribution

From a packaging perspective, we refreshed IBM Netezza Platform Developer Software to the latest NPS 7.2.1 release so that the software suite available from IBM’s Passport Advantage is current.

Supported Appliances
  • N3001
  • N2002
  • N2001
  • N100x
  • C1000

Supported Software
  • Netezza Platform Software v7.2.1
  • Netezza Client Kits v7.2.1
  • Netezza SQL Extension Toolkit v7.2.1
  • Netezza Analytics v3.2.1
  • IBM Fluid Query v1.6
  • Netezza Performance Portal v2.1.1
  • IBM Netezza Platform Development Software v7.2.1

For the Netezza Developer Network we continue to expand the ability to easily pick up and work with non-warranted products for basic evaluation by refreshing the Netezza Emulator to NPS 7.2.1 with INZA 3.2.1. You will find a refresh of our non-warranted version of Fluid Query 1.6 and the complete set of Client Kits that support NPS 7.2.1.


Feel free to download and play with these as a prelude to PureData Systems for Analytics purchase or as a quick way to validate new software functionality with your application. We maintain our commitment to business partners working with our systems by maintaining the latest systems and software for you to access. Bring your application or solution and work to certify, qualify and validate them.

For additional information on Fluid Query 1.6, refer to my what’s new post.

* DB2 Analytics Accelerator for z/OS is a high-performance appliance that integrates the IBM z Systems infrastructure with IBM PureData™ for Analytics, powered by IBM Netezza technology. The solution transforms your mainframe into a highly efficient transactional and analytics processing environment. This enables clients to exploit z Systems data where it originates.

About Doug,
Doug has over 20 years of combined technical and management experience in the software industry, with an emphasis on customer service and, more recently, product management. He is currently part of a highly motivated product management team that is both inspired by and passionate about the IBM PureData System for Analytics product portfolio.

What’s new: IBM Fluid Query 1.6

by Doug Dailey

Editorial Note: IBM Fluid Query 1.7 became available in May, 2016. You can read about features in release 1.6 here, but we also recommend reading the release 1.7 blog here.

The IBM PureData Systems for Analytics team has assembled a value-add set of enhancements over current software versions of Netezza Platform Software (NPS), INZA software and Fluid Query. We have enhanced  integration, security, real-time analytics for System z and usability features with our latest software suite arriving on Fix Central today.

There will be something here for everyone, whether you are looking to integrate your PureData System (Netezza) into a Logical Data Warehouse, improve security, gain more leverage with DB2 Analytics Accelerator for z/OS, or simply improve your day-to-day experience. This post covers the IBM Fluid Query 1.6 technology. Refer to my NPS and INZA post for more information on the enhancements that are now available in these other areas.

Integrating with the Logical Data Warehouse: Fluid Query overview

Are you struggling with building out your data reservoir, lake or lagoon? Feeling stuck in a swamp? Or, are you surfing effortlessly through an organized Logical Data Warehouse (LDW)?

Fluid Query offers a nice baseline of capability to get your PureData footprint plugged into your broader data environment or tethered directly to your IBM BigInsights Apache Hadoop distribution. Opening access across your broader ecosystem of on-premise, cloud, commodity hardware and Hadoop platforms gets you ever closer to capturing value throughout “systems of engagement” and “systems of record” so you can reveal new insights across the enterprise.

Now is the time to be fluid in your business, whether it is ease of data integration, access to key data for discovery/exploration, monetizing data, or sizing fit-for-purpose stores for different data types.  IBM Fluid Query opens these conversations and offers some valuable flexibility to connect the PureData System with other PureData Systems, Hadoop, DB2, Oracle and virtually any structured data source that supports JDBC drivers.

The ability to tap into new insights from your content is a must-have for competing in any market. Fluid Query allows you to provision data for better use by application developers, data scientists and business users. We provide the tools to build that capability for any user group.


What’s new in Fluid Query 1.6?

Fluid Query was released this year and is in its third “agile” release of the year. As part of NPS software, it is available at no charge to existing PureData clients, and you will find information on how to access Fluid Query 1.6 below.

This capability enables you to query more data for deeper analytics from PureData. For example, you can query data in the PureData System together with:

  • Data in IBM BigInsights or other Hadoop implementations
  • Relational data stores (DB2, 3rd party and open source databases like Postgres, MySQL, etc.)
  • Multiple generations of PureData System for Analytics appliances (“Twin Fin”, “Striper”, “Mako”)

The following is a summary of some new features in the release that all help to support your needs for insights across a range of data types and stores:

  • Generic connector for access to structured data stores that support JDBC
    This generic connector enables you to select the database of your choice. Database servers and engines like Teradata, SQL Server, Informix, MemSQL and MapR can now be tapped for insight. We’ve also provided a capability to handle any data type mismatches between differing source and target systems.
  • Support for compressed read from Big SQL on IBM BigInsights
    Using the Big SQL capability in IBM BigInsights, you can now read compressed data in Hadoop distributions such as BigInsights, Cloudera and Hortonworks. This adds flexibility and efficiency in storage, data protection and access.
  • Ability to import databases to Hadoop and append to tables in Hadoop
    New capabilities now enable you to import databases to Hadoop, as well as append data in existing tables in Hadoop. One use case for this is backing up historical data to a queryable archive to help manage capacity on the data warehouse. This may include incremental backups, for example from a specific date for speed and efficiency.
  • Support for the latest Hadoop distributions
    Fluid Query v. 1.6 now supports the latest Hadoop distributions, including BigInsights 4.1, Hortonworks 2.5 and Cloudera 5.4.5. For Netezza software, support is now available for NPS 7.2.1 and INZA 3.2.1.
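
The incremental archive use case mentioned in the list above (append only the rows from a specific date onward) follows a simple filter-then-append pattern. The sketch below shows that pattern with Python's built-in sqlite3 module, using two in-memory tables as stand-ins for the warehouse table and the Hadoop-side archive. The table names, columns and the `append_since` helper are all hypothetical. Fluid Query's own import tooling is configured differently, so treat this purely as an illustration of the idea.

```python
import sqlite3


def append_since(conn, cutoff):
    """Append rows dated on or after `cutoff` (ISO date string) to the archive.

    Returns the total number of rows in the archive afterwards.
    """
    conn.execute(
        "INSERT INTO orders_archive "
        "SELECT id, order_date FROM orders WHERE order_date >= ?",
        (cutoff,),
    )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM orders_archive").fetchone()[0]


conn = sqlite3.connect(":memory:")  # stand-in for both systems
conn.execute("CREATE TABLE orders (id INTEGER, order_date TEXT)")
conn.execute("CREATE TABLE orders_archive (id INTEGER, order_date TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "2015-01-15"), (2, "2015-06-30"), (3, "2015-11-02")],
)

# Only rows from June 1, 2015 onward are copied; the January row stays behind.
archived_total = append_since(conn, "2015-06-01")
print(archived_total)
```

ISO-formatted date strings compare correctly as text, which is why a plain `>=` works here. A real incremental job would also record the last cutoff so that a rerun does not append the same rows twice.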

Fluid Query 1.6 can be easily downloaded from IBM Support Fix Central. I encourage you to refer to my “Getting Started” post that was written for Fluid Query 1.5 for additional tips and instructions. Note that this link is for existing PureData clients. Refer to the section below if you are not a current client.


Packaging and distribution

From a packaging perspective, we refreshed IBM Netezza Platform Developer Software to the latest NPS 7.2.1 release so that the software suite available from IBM’s Passport Advantage is current.

Supported Appliances
  • N3001
  • N2002
  • N2001
  • N100x
  • C1000

Supported Software
  • Netezza Platform Software v7.2.1
  • Netezza Client Kits v7.2.1
  • Netezza SQL Extension Toolkit v7.2.1
  • Netezza Analytics v3.2.1
  • IBM Fluid Query v1.6
  • Netezza Performance Portal v2.1.1
  • IBM Netezza Platform Development Software v7.2.1

For the Netezza Developer Network we continue to expand the ability to easily pick up and work with non-warranted products for basic evaluation by refreshing the Netezza Emulator to NPS 7.2.1 with INZA 3.2.1. You will find a refresh of our non-warranted version of Fluid Query 1.6 and the complete set of Client Kits that support NPS 7.2.1.


Feel free to download and play with these as a prelude to PureData Systems for Analytics purchase or as a quick way to validate new software functionality with your application. We maintain our commitment to helping our partners working with our systems by maintaining the latest systems and software for you to access. Bring your application or solution and work to certify, qualify and validate them.

For more information on NPS 7.2.1 and INZA 3.2.1 software, refer to my post.

About Doug,
Doug has over 20 years of combined technical and management experience in the software industry, with an emphasis on customer service and, more recently, product management. He is currently part of a highly motivated product management team that is both inspired by and passionate about the IBM PureData System for Analytics product portfolio.

Get smart on IBM Data Warehousing at IBM Insight 2015

A quick reference guide to IBM Data Warehousing sessions for BLU Acceleration in-memory database, PureData System for Analytics and new IBM Fluid Query

by Cindy Russell, IBM Data Warehouse Marketing

IBM Insight is always educational and fun, and this year is no exception. Many IBM technical experts and IBM clients will be presenting on a range of topics. This is an excellent opportunity to learn more about IBM products you already use, as well as products and technologies that you don’t. Here is a summary view of some keynotes, breakout sessions and events to consider as you plan your schedule.

I have included my “editor’s pick” sessions in boldface type. You can use the Insight session tool to find more detail on the sessions that interest you. And for those of you who have already registered for Insight, build your agenda now by logging into ibmeventconnect.com/insight.  Please note that session schedules are subject to change.

General sessions and Data Warehousing overview sessions

  • Data Management Keynote is Monday October 26, from 1 – 2 PM in the Mandalay Ballroom

Monday breakout sessions

  • DDW-3353: The Evolution of Data Warehousing is “Logical”
  • DDW-3361: Shifts in Data Warehousing and Enabling Self Service to Drive More Agile Analytics
  • DDW-2267: Gartner Perspective on Big Data and Enterprise Analytics

Tuesday breakout sessions

  • DDW-3983: Ford is Changing the Way the World Moves With IBM Big Data and Analytics

Thursday breakout sessions

  • DDW-2675: Which Analytic Model is Right for My Data? A Comparison of Modern Warehouse Architectures
  • DDW-2659: The Data Reservoir: More Than Storage, Optimizing Your Data for Insight
  • DDW-2739: Operational Analytics at the Speed of Thought: The Modern Enterprise Data Warehouse
  • DDW-1951: Model Driven Approaches to Consistently Managing and Governing the Logical Data Warehouse

Expo Hall events and demos

In addition to breakout sessions, there will be some informal talks and opportunities to connect with our experts in the Expo Hall. Here are the sessions that apply to IBM Data Warehousing products. For Expo Hall hours click here.

Monday events and information

  • Expo Hall booth number: 860
  • VAL-4125: AMA: How IBM Fluid Query Solves Your Complex Big Data and Analytics Problems
  • VAL-4126: 20m Talk: How IBM Fluid Query Solves Your Complex Big Data and Analytics Problems
  • Demo room: FE-06 Fluid Query, DCM-15 IBM PureData for Analytics (INZA), DCM-17 IBM Industry Data Model, DCM-19 IBM DB2 with BLU Acceleration

Tuesday events and information

  • Expo Hall booth number: 860
  • DDW-4031: Meet the Experts: IBM DB2 BLU and dashDB
  • Demo room: FE-06 Fluid Query, DCM-15 IBM PureData for Analytics (INZA), DCM-17 IBM Industry Data Model, DCM-19 IBM DB2 with BLU Acceleration

Wednesday events and information

  • Expo Hall booth number: 860
  • DDW-4013: IBM Fluid Query – Unifying Data Access Across the Logical Data Warehouse
  • DDW-4079: Meet the Experts: IBM PureData System for Analytics (Netezza)
  • Demo room: FE-06 Fluid Query, DCM-15 IBM PureData for Analytics (INZA), DCM-17 IBM Industry Data Model, DCM-19 IBM DB2 with BLU Acceleration

DB2 with BLU Acceleration in-memory database

BLU Acceleration is in-memory computing technology in DB2 for Linux, UNIX and Windows. If you are experiencing slow reporting on data in structured databases, then BLU Acceleration can help you deliver results much more quickly. Clients report that queries that used to take hours now process in seconds using BLU Acceleration technology!

Here are some sessions to consider:

Monday breakout sessions

  • DDW-2619: What’s New in BLU Acceleration Tips and Insights on the Latest In Memory Columnar Technologies

Tuesday breakout sessions

  • DDW-1202: Implementing a Data Warehouse and BI Solution with DB2 BLU Acceleration, InfoSphere and Cognos
  • DDW-3916: Revitalize your Data Warehouse: Taking Advantage of the Latest Technologies (client presentation from Blue Cross and Blue Shield of Tennessee)
  • DDB-3593: Scaling Up BLU Acceleration with Consistent Performance in a High
  • DDB-2815: Advances in Analytics Using DB2 with BLU Acceleration on Intel Architecture

Wednesday breakout sessions

  • DDW-1647: A Comparison Between DB2 with BLU Acceleration and Other In Memory Databases
  • DDW-2436: POWER Systems Running DB2 with BLU Acceleration: Delivering Top Performance

Thursday breakout sessions

  • DDW-3665: Wall Street Success Stories of DB2 with BLU Acceleration
  • DDW-2469: How DB2 with BLU Acceleration Helps a Bank Make Money: A Real World Data Analytics Case Study
  • DDW-2972: Apache Spark and DB2 with BLU Acceleration: Making ‘People Flow’ in Cities Measurable and Analyzable

PureData System for Analytics and IBM Fluid Query

PureData System for Analytics is a data warehousing appliance that delivers data services to today’s demanding analytic applications. It offers built-in expertise, as well as integrated hardware, software and storage capabilities designed specifically for high-performance data workloads. It simplifies procurement, installation and management so you can focus on other high-value projects. IBM Fluid Query is a new addition to PureData that lets you analyze more data sources, such as Hadoop and many others, for deeper insights. You can also download the Enzee Conference Guide for a list of ALL sessions with PureData/Netezza content here: ibm.biz/enzeeguideinsight

Monday breakout sessions

  • DDW-3366: PureData for Analytics/Netezza Data Warehouse Appliance – Overview and Update
  • DDW-2663: IBM Fluid Query The “Power” Behind the Data Reservoir/Logical Data Warehouse
  • DDW-3500: How a Digital Media Firm Uses PureData System for Analytics, Cognos, SPSS to Hone Creative Marketing

Tuesday breakout sessions

  • DDW-1909: One Query Drives It All Fluid Hadoop in the Unified Data Warehouse
  • DDW-1150: Performance Optimization With IBM PureData System for Analytics, powered by Netezza
  • DDW-1216: Mattel’s Big Data Ecosystem Journey: Beginning The Integration of Unstructured Data
  • DDW-3588: BB&T and Netezza: Practical and Best Practices for Building an Analytics Platform
  • DDW-3094: N3001 001 Mini Appliance The Most Affordable PureData System for Analytics

Wednesday breakout sessions

  • DDW-1164: Business Outcomes and Implementation Strategy for Enterprise Data Warehouse in Healthcare
  • DDW-2150: Integrating BigInsights and PureData System for Analytics With Query Federation and Data Movement
  • DDW-1213: IBM PureData System for Analytics Successfully Changed How the Blackhawk Network Leverages Data
  • DDW-3073: Werner Implements Netezza and Information Server to Enable Smarter Decision Making
  • DDW-1723: Improving PureData System for Analytics Performance at Kimberly Clark
  • DDW-2145: Experian Case Study Conversion From SQL Server to Netezza
  • DDW-2109: How IBM Fluid Query Solves Your Complex Big Data and Analytics Problems

Thursday breakout sessions

  • DDW-3094: N3001 001 Mini Appliance The Most Affordable PureData System for Analytics
  • DDW-3369: Insight Into Your PureData System for Analytics Appliance Using IBM Netezza Performance Portal Tool
  • DDW-2515: Realizing Solutions with IBM PureData System for Analytics
  • DDW-1708: PureData System for Analytics for regulatory reports, what you need on the top of the top technology

Enzee Universe

Don’t miss Enzee Universe on Sunday, October 25th. Enzee Universe is a conference within the Insight conference dedicated to a full day of PureData – Netezza technology. This event is free for all registered Insight attendees. Just add sessions 3967 and 3968 to your agendas!

  • DDW-3967: Enzee Universe Part 1 Technical Sessions and Best Practices
  • DDW-3968: Enzee Universe Part 2 Business Update and Product Strategy

dashDB

IBM dashDB is a fully managed cloud data warehouse service. It offers massive scalability and performance through its MPP architecture, and is compatible with a wide range of business intelligence toolsets and analytics. dashDB’s integrated, in-database analytics let you quickly realize more value from your data. dashDB includes aspects of the Netezza and BLU Acceleration technologies.

Use the session tool to search on the dashDB keyword.

IBM DB2 Analytics Accelerator for z/OS

IBM DB2 Analytics Accelerator for z/OS is a high-performance appliance that integrates the IBM z Systems infrastructure with IBM PureData System for Analytics, powered by IBM Netezza technology. The solution transforms your mainframe into a highly-efficient transactional and analytics processing environment.

Learn more about the sessions for this product here.

 

Performance – Getting There and Staying There with PureData System for Analytics

by David Birmingham, Brightlight Business Analytics, a division of Sirius Computer Solutions, and IBM Champion

Many years ago in a cartoon dialogue, Dilbert’s boss expressed concern for the theft of their desktop computers, but Dilbert assured him, to his boss’ satisfaction, that if he loaded them with data they would be too heavy to move. Hold that thought.

Co-location: Getting durable performance from queries

Many shops will migrate to a new PureData System for Analytics appliance, Powered by Netezza Technology, simply by copying old data structures into the new data warehouse appliance. They then point their BI tools at it and voila, a 10x performance boost just for moving the data. Life is good.

The shop moves on by hooking up the ETL tools, backups and other infrastructure, not noticing that queries that ran in 5 seconds the week before, now run in 5.1 seconds. As the weeks wear on, 5.1 seconds become 6, then 7, then 10 seconds. Nobody is really watching, because 10 seconds is a phenomenal turnaround compared to their prior system’s 10-minute turnaround.

But six months to a year down the line, when the query takes 30 seconds or longer to run, someone may raise a flag of concern. By this time, we’ve built many new applications on these data structures, and far more data has been added to storage. In true Dilbert-esque terms, loading more data makes the system go slower.

The good news about a PureData machine is that it has the power to address this if we adhere to a few simple rules. When simply migrating point-to-point onto a PureData appliance, we’re likely not taking advantage of the core power-centers in Netezza technology. The point-to-point migration starts out in first gear and never shifts up to access more power. That is, PureData has many layers of high-performance hardware, each one more powerful than the one above it. Leveraging these layers consistently over time helps maintain durable performance. The system may eventually need an upgrade for storage reasons, but not for performance reasons.

PureData is a physical machine with data stored on its physical “real estate”, but unlike buying a house with “location-location-location!” we want “co-location-co-location-co-location!” Two flavors of data co-location exist: zone maps and data distribution. The use of these (or lack thereof) either enables or constrains performance. These factors are physical, because performance is in the physics. It’s not enough to migrate or maintain a logical representation of the data. Physical trumps logical.

Zone maps, a powerful form of co-location in PureData

The most powerful form of co-location is zone maps, optimized through the Organize-On and Groom functions. Think of transaction_date as an Organize-On optimization key. The objective is to regroup the physical records so that those with like-valued keys are co-located on as few disk pages as possible. Groom will do this for us. Now when a query is issued against the table, filtering transaction_date on a date value or date range, the filter is applied to the zone maps to derive the known physical disk locations and exclude all others. This is Netezza’s principle of using the query to tell it “where-not-to-look”.

The additional caveat is that the physical co-location of records by Organize-On keys is only valuable if they are actually used in the query. They radically reduce data reads, for example from 5 thousand pages down to 5 pages to get the same information. That’s a 1000x boost! The zone maps, enabled by Organize-On and Groom, are what achieve these dramatic performance boosts. If we do not use them, then queries will initiate a full table-scan which naturally takes more time.
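To make the “where-not-to-look” idea concrete, here is a minimal sketch in Python. It is a toy model, not how the appliance is implemented: the page size, the sample dates and the helper names (`Page`, `build_pages`, `pages_to_read`) are all assumptions made for illustration, and sorting the rows simply stands in for what Organize-On and Groom accomplish. Each page remembers the lowest and highest transaction_date it holds, and a query skips any page whose range cannot contain the filter value.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Page:
    """A toy disk page holding transaction_date values."""
    rows: list

    @property
    def low(self):
        return min(self.rows)

    @property
    def high(self):
        return max(self.rows)


def build_pages(dates, rows_per_page):
    """Split rows into fixed-size pages in the order they are given."""
    return [Page(dates[i:i + rows_per_page])
            for i in range(0, len(dates), rows_per_page)]


def pages_to_read(pages, start, end):
    """Keep only pages whose [low, high] range overlaps the filter range."""
    return [p for p in pages if p.high >= start and p.low <= end]


# 1000 rows covering 100 days, arriving in interleaved order so that
# every page contains a little bit of every day.
base = date(2015, 1, 1)
unorganized = [base + timedelta(days=i % 100) for i in range(1000)]

# Sorting stands in for Organize-On + Groom: like-valued keys end up
# on as few pages as possible.
organized = sorted(unorganized)

query_day = date(2015, 1, 15)
scan_all = pages_to_read(build_pages(unorganized, 100), query_day, query_day)
scan_few = pages_to_read(build_pages(organized, 100), query_day, query_day)

print(len(scan_all), "pages read before organizing")
print(len(scan_few), "pages read after organizing")
```

In this toy data set the same filter touches every page before the rows are regrouped and a single page afterward, which is the effect the zone maps are described as delivering at much larger scale.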

The reason this is so important is that disk-read is the number one penalty of the query, with no close second. A PureData System N200x or N3001 can read over 1100 pages per second on a given data slice. So if the query scans 5000 pages on each data slice, it’s easily a 4-second query. But it won’t stay a 4-second query. As the data grows from 5000 pages to 10,000 pages, it will approach a 10-second query. If the query leverages the zone maps and reduces it consistently to, say, 100 pages per query, the query will achieve a sub-second duration and remain there for the life of the solution.
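The arithmetic behind those durations is simple enough to sketch. The snippet below uses only the 1100 pages-per-second figure quoted above and assumes, for illustration, that disk read is the only cost; the function name `scan_seconds` is hypothetical.

```python
# Read rate per data slice, taken from the figure quoted in the text.
PAGES_PER_SECOND = 1100


def scan_seconds(pages, pages_per_second=PAGES_PER_SECOND):
    """Estimate scan time when disk read is the only cost considered."""
    return pages / pages_per_second


for pages in (5000, 10000, 100):
    print(f"{pages:>6} pages -> {scan_seconds(pages):.2f} s")
```

At that rate, 5000 pages take about 4.5 seconds, 10,000 pages about 9 seconds, and 100 pages under a tenth of a second, so the scan time tracks table growth unless the zone maps keep the page count flat.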

Does this sound like too much physical detail to know for certain what to do? That’s why the Organize-On and Groom functions make it easy. Just use the Query History’s column access statistics, locate the largest tables and find the most-often-accessed columns in where-clause filters (just don’t Organize-On join-only columns or distribution keys!). Add them to the Organize-On, Groom the table and watch this single action boost the most common queries into the stratosphere.

Data Distribution, co-location through “data slices”

Data distribution is another form of co-location. On a PureData system, every table is automatically divided across disks, each representing a “data slice”. Basically when a distribution key (e.g. Customer_ID) is used, the machine will hash the key values to guarantee that records with the same key value will always arrive on the same data slice. If several tables are distributed on the same key value, their like-keyed records will also be co-located on the same data slice. This means joining on those keys will initiate a parallel join, or what is called a co-located read.
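A small simulation can show why a shared distribution key leads to a co-located join. This is a sketch under stated assumptions: the slice count, the sample tables and the helper names are invented for illustration, and `zlib.crc32` is only a stand-in for whatever hash the appliance actually uses.

```python
import zlib
from collections import defaultdict

NUM_SLICES = 4  # hypothetical number of data slices


def slice_for(key, num_slices=NUM_SLICES):
    """Map a key to a data slice. Illustrative hash only, not Netezza's."""
    return zlib.crc32(str(key).encode("utf-8")) % num_slices


def distribute(rows, key_field):
    """Place each row on the slice chosen by hashing its distribution key."""
    slices = defaultdict(list)
    for row in rows:
        slices[slice_for(row[key_field])].append(row)
    return slices


def colocated_join(left_slices, right_slices, key_field):
    """Join slice by slice; no row ever leaves the slice it lives on."""
    joined = []
    for slice_id, left_rows in left_slices.items():
        by_key = defaultdict(list)
        for right in right_slices.get(slice_id, []):
            by_key[right[key_field]].append(right)
        for left in left_rows:
            for right in by_key[left[key_field]]:
                joined.append({**left, **right})
    return joined


# Two hypothetical tables distributed on the same key, customer_id.
customers = [{"customer_id": i, "name": f"cust{i}"} for i in range(1, 9)]
orders = [{"order_id": 100 + n, "customer_id": (n % 8) + 1} for n in range(20)]

cust_slices = distribute(customers, "customer_id")
order_slices = distribute(orders, "customer_id")

result = colocated_join(cust_slices, order_slices, "customer_id")
print(len(result), "joined rows, none moved between slices")
```

Because both tables hash the same key, every order already sits on the slice that holds its customer, so each slice can complete its share of the join independently and in parallel.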

Another of the most powerful aspects of Netezza technology is the ability to process data in parallel. Using the same distribution key to make an intermediate table, an insert-select styled query will perform a co-located read and a co-located write, effectively performing the operation in massively parallel form and at very fast speeds. Netezza technology can eclipse a mainframe in both its processing speed and ability to move and position large quantities of data for immediate consumption.

The caveat of data distribution is that a good distribution model can preserve capacity for the long term. A distribution model that does not leverage co-located joining will chew up the machine’s more limited resources, such as memory and the inter-process network fabric. If we have enough of these queries running simultaneously, the degradation becomes extremely pronounced. A few tweaks to tables and queries, however, can yield a 100x or 1000x boost; without them, the solution is using 10x or 100x more machine capacity than necessary. This is why some machines appear very stressed even though they are doing and storing so little.

Accessing the machine’s “deep metal”

Back to the notion of a “simple migration”. Does it sound like a simple point-to-point migration will leverage the power of the machine? Do the legacy queries use where-clause filters that can consistently invoke the zone maps? Are the tables configured to be heavily dependent upon indexes to support performance? If so, then the initial solution will be in first-gear.

But wait, maybe the migration happened a year or so ago and now the machine is “under stress” for no apparent reason. Where did all the capacity go? It’s still waiting to be used, in the deep-metal of the machine, the metal that the migrated solution doesn’t regard. It’s easy to fix that and voila, all this “extra” capacity seemingly appears from nowhere, like magic! It was always there. The solution was ignoring it and grinding the engines in first gear.

Enable business users to explore deep data detail

When Steven Spielberg made Jurassic Park, he mentioned that the first dinosaur scene with the giant Brachiosaurus required over a hundred hours of film and CGI crunched into fifteen seconds of movie magic.

This represents a typical analytic flow model, where tons of data points are summarized into smaller form for fast consumption by business analysts. PureData System changes this because it is fast and easy to expose deep detail to users. Business analysts like to have access to the detail of deep data because summary structures will throw away useful details in an effort to boost performance on other systems.

Architects and developers alike can see how “Co-location, co-location, co-location!” is easy to configure and maintain, offering a durable performance experience that is also adaptable as business needs change over time. Getting there and staying there doesn’t require a high wall of engineering activities or a gang of administrators on roller-skates to keep it running smoothly. The performance is built into the machine. It’s an appliance after all.

About David

David is a Senior Solutions Architect with Brightlight Consulting, a division of Sirius Computer Solutions, and an IBM Champion since 2011. He has over 30 years of extensive experience across the entire BI/DW lifecycle. David is one of the world’s top experts in PureData for Analytics (Netezza) and is the author of Netezza Underground and Netezza Transformation (both on Amazon.com) and various essays on IBM developerWorks’ Netezza Underground Blog. He is also a five-year IBM Champion, a designation that recognizes the contributions of IBM customers and partners. Catch David each year at the Sunday IBM Insight Enzee Universe for new insights on best practices and solutions with the machine.