Three session guides get you started with data warehousing at IBM Insight at World of Watson

Join us October 24 to 27, 2016 in Las Vegas!

by Cindy Russell, IBM Data Warehouse marketing

IBM Insight has been the premier data management and analytics event for IBM analytics technologies, and 2016 is no exception. This year, IBM Insight is being hosted along with World of Watson and runs from October 24 to 27, 2016 at the Mandalay Bay in Las Vegas, Nevada. It includes 1,500 sessions across a range of technologies and features keynotes by IBM President and CEO, Ginni Rometty; Senior Vice President of IBM Analytics, Bob Picciano; and other IBM Analytics and industry leaders. Every year, we include a little fun as well, and this year the band is Imagine Dragons.

IBM data warehousing sessions will be available across the event as well as in the PureData System for Analytics Enzee Universe (Sunday, October 23).  Below are product-specific quick reference guides that enable you to see at a glance key sessions and activities, then plan your schedule.  Print these guides and take them with you or put the links to them on your phone for reference during the conference.

This year, the Expo floor is called the Cognitive Concourse, and we are located in the Monetizing Data section, Cognitive Cuisine experience area.  We’ll take you on a tour across our data warehousing products and will have some fun as we do it, so please stop by.  There is also a demo room where you can see live demos and engage with our technical experts, as well as a series of hands-on labs that let you experience our products directly.

The IBM Insight at World of Watson main web page is located here.  You can register and then use the agenda builder to create your personalized schedule.

IBM PureData System for Analytics session reference guide

Please find the session quick reference guide for PureData System for Analytics here: ibm.biz/wow_enzee

Enzee Universe is a full day of dedicated PureData System for Analytics / Netezza sessions that is held on Sunday, October 23, 2016.  To register for Enzee Universe, select sessions 3459 and 3461 in the agenda builder tool.  This event is open to any full conference pass holder.

During the regular conference, there are also more than 35 PureData, Netezza, and IBM DB2 Analytics Accelerator for z/OS (IDAA) technical sessions across all the conference tracks, as well as hands-on labs. There are several sessions being presented by IBM clients so you can see how they put PureData System for Analytics to use. Click the link above to see the details.

IBM dashDB Family session reference guide

Please find the session quick reference guide for the dashDB family here: ibm.biz/wow_dashDB

There are more than 40 sessions for dashDB, including a “Meet the Family” session that will help you become familiar with new products in this family of modern data management and data warehousing tools. There is also a “Birds of a Feather” panel discussion on Hybrid Data Warehousing, and one that describes some key use cases for dashDB. You can also see a demo, take in a short theatre session or try out a hands-on lab.

IBM BigInsights, Hadoop and Spark session reference guide

Please find the session quick reference guide for BigInsights, Hadoop and Spark topics here: ibm.biz/wow_biginsights

There are more than 65 sessions related to IBM BigInsights, Hadoop and Spark, with several hands-on labs and theatre sessions. There is everything from an Introduction to Data Science to Using Spark for Customer Intelligence Analytics to hybrid cloud data lakes to client stories of how they use these technologies.

Overall, it is an exciting time to be in the data warehousing and analytics space.  This conference represents a great opportunity to build depth on IBM products you already use, learn new data warehousing products, and look across IBM to learn completely new ways to employ analytics—from Watson to Internet of Things and much more.  I hope to see you there.

IBM Fluid Query 1.7 is Here!

by Doug Dailey

IBM Fluid Query offers a wide range of capabilities to help your business adapt to a hybrid data architecture and, more importantly, it helps you bridge across “data silos” for deeper insights that leverage more data. Fluid Query is a standard entitlement included with the Netezza Platform Software suite for PureData System for Analytics (formerly Netezza). Fluid Query release 1.7 is now available, and you can learn more about its features below.

Why should you consider Fluid Query?

It offers many possible uses for solving business problems. Here are a few ideas:
• Discover and explore “Day Zero” data landing in your Hadoop environment
• Query data from multiple cross-enterprise repositories to understand relationships
• Access structured data from common sources like Oracle, SQL Server, MySQL, and PostgreSQL
• Query historical data on Hadoop via Hive, BigInsights Big SQL or Impala
• Derive relationships between data residing on Hadoop, the cloud and on-premises
• Offload colder data from PureData System for Analytics to Hadoop to free capacity
• Drive business continuity through a low-fidelity disaster recovery solution on Hadoop
• Back up your database or a subset of data to Hadoop in an immutable format
• Incrementally feed analytics side-cars residing on Hadoop with dimensional data

By far, the most prominent use for Fluid Query for a data warehouse administrator is that of warehouse augmentation, capacity relief and replicating analytics side-cars for analysts and scientists.

New: Hadoop connector support for Hadoop file formats to increase flexibility

IBM Fluid Query 1.7 ushers in greater flexibility for Hadoop users with support for popular file formats typically used with HDFS. These include data storage formats like AVRO, Parquet, ORC and RC that are often used to manage big data in a Hadoop environment.

Choosing the best format and compression mode can result in drastic differences in performance and storage on disk. A file format that doesn’t support flexible schema evolution can result in a processing penalty when making simple changes to a table. Let’s just  say that if you live in the Hadoop domain, you know exactly what I am speaking of. For instance, if you want to use AVRO, do your tools have readers and writers that are compatible? If you are using IMPALA, do you know that it doesn’t support ORC, or that Hortonworks and Hive-Stinger don’t play well with Parquet? Double check your needs and tool sets before diving into these popular format types.

By providing support for these popular formats,  Fluid Query allows you to import, store, and access this data through local tools and utilities on HDFS. But here is where it gets interesting in Fluid Query 1.7: you can also query data in these formats through the Hadoop connector provided with IBM Fluid Query, without any change to your SQL!

New: Robust connector templates

In addition, Fluid Query 1.7 now makes available a more robust set of connector templates that are designed to help you jump start use of Fluid Query. You may recall we provided support for a generic connector in our prior release that allows you to configure and connect to any structured data store via JDBC. We are offering pre-defined templates with the 1.7 release so you can get up and running more quickly. In cases where there are differences in user data type mapping, we also provide mapping files to simplify access.  If you have your own favorite database, you can use our generic connector, along with any of the provided templates as a basis for building a new connector for your specific needs. There are templates for Oracle, Teradata, SQL Server, MySQL, PostgreSQL, Informix, and MapR for Hive.

Again, the primary focus for Fluid Query is to deliver open data access across your ecosystem. Whether the data resides on disk, in-memory, in the Cloud or on Hadoop, we strive to enable your business to be open for data. We recognize that you are up against significant challenges in meeting demands of the business and marketplace, with one of the top priorities around access and federation.

New: Data movement advances

Moving data is not the best choice. Businesses spend quite a bit of effort ingesting data, staging the data, scrubbing, prepping and scoring the data for consumption by business users. This is a costly process. As we move closer and closer to virtualization, the goal is to move the smallest amount of data possible, while you access and query only the data you need. So not only is access paramount, but your knowledge of the data in your environment is crucial to efficiently using it.

Fluid Query does offer data movement capability through what we call Fast Data Movement. Focusing on the pipe between PDA and Hadoop, we offer a high speed transfer tool that allows you to transfer data between these two environments very efficiently and securely. You have control over the security, compression, format and where clause (DB, table, filtered data). A key benefit is our ability to transfer data in our proprietary binary format. This enables orders of magnitude performance over Sqoop, when you do have to move data.

Fluid Query 1.7 also offers some additional benefits:
• Kerberos support for our generic database connector
• Support for BigInsights Big SQL during import (automatically synchronizes Hive and Big SQL on import)
• Varchar and String mapping improvements
• Import of nz.fq.table parameter now supports a combination of multiple schemas and tables
• Improved date handling
• Improved validation for NPS and Hadoop environment (connectors and import/export)
• Support for BigInsights 4.1 and Cloudera 5.5.1
• A new Best Practices User Guide, plus two new Tutorials

You can download this from IBM’s Fix Central or the Netezza Developer’s Network for use with the Netezza Emulator through our non-warranted software.

Take a test drive today!

About Doug,
Doug has over 20 years combined technical & management experience in the software industry with emphasis in customer service and more recently product management. He is currently part of a highly motivated product management team that is both inspired by and passionate about the IBM PureData System for Analytics product portfolio.

Virtual Enzee webcast roundup for 2016

By Cindy Russell

The first Virtual Enzee webcast of 2016 is scheduled for January 29th!  I will be updating this blog during 2016 so you have a handy resource to find out what sessions are upcoming and also listen to the replays on demand.

  1. Unifying Data Access across the Logical Data Warehouse with IBM Fluid Query
    IBM Fluid Query helps bring your enterprise into focus and eliminate some of the traditional barriers that exist between disparate data in your enterprise. In this session, we’ll review some common user stories for using Fluid Query through the lens of PureData System for Analytics/Netezza and BigInsights. Register here: http://bit.ly/231tICu

  2. Tame Spatial Queries with Netezza In-Database Analytics
    Attend this Virtual Enzee to learn how Netezza supports spatial data types and queries, how it can shorten complex spatial analytic projects and how it integrates with and complements existing geospatial platforms and solutions. Register: bit.ly/FebEnz

  3. Accelerating Open-Source R with IBM PureData System for Analytics (Netezza), January 29, 2016 at 11AM ET
    R is increasingly becoming the platform and programming language of choice for many data scientists. Learn how you can leverage Open-Source R on your IBM PureData System for Analytics/Netezza appliances! Register here: bit.ly/1ORAkzQ

 

What’s new: Netezza Platform Software and INZA software for PureData Systems for Analytics

by Doug Dailey

The IBM PureData Systems for Analytics team has just released a new set of enhancements over current software versions of Netezza Platform Software (NPS), INZA and IBM Fluid Query. These include enhanced  integration, security, real-time analytics for z Systems and usability features, all included in our latest software suite that has been posted on Fix Central.

There will be something here for everyone, whether you are looking to increase security, gain more leverage with DB2 Analytics Accelerator for z/OS*, improve your day-to-day experience or integrate PureData System (Netezza technology) into a Logical Data Warehouse. This post covers the new capabilities and enhancements in NPS 7.2.1 and INZA 3.2.1 software.  Refer to my IBM Fluid Query 1.6 post  for more information.

Strengthening end-to-end security for PureData and DB2 Analytics Accelerator for z/OS

With the advent of self-encrypted disk drives in our N3001 model, we laid the groundwork for securing data at rest. Not only do you have state of the art disk encryption keys by Seagate and Hitachi at work from a hardware standpoint, but you also have added peace of mind through a second tier of security that protects host drives and those drives associated with the Snippet Processing Unit. A local keystore with flexible CLI on the N3001 system enabled you to protect your most valuable assets. This release adds support for KMIP, which now allows 3rd party and IBM targeted key management software to backup, store and manage host and SPU keys on your system. Additional attention was paid to hardening the host systems for the DB2 Analytics Accelerator powered by PureData.

Speaking of DB2 Analytics Accelerator, this release of NPS provides key functionality recently added to DB2 Analytics Accelerator in version 5.1 which incorporates Netezza Analytics as a core component to help accelerate the use of predictive analytics applications (e.g., SPSS) such as data mining and in-database modeling. By extending support for the mainframe EBCDIC code to INZA software with support for new sets of procedures, you can run real-time analytics on DB2 Analytics Accelerator and establish work areas for data scientists. In-database transformation supports IBM DataStage balanced optimization and ETL/ELT consolidation processing.

This optimized, integrated appliance has been hardened to not only support self-encrypting drives available through PureData Systems for Analytics N3001 systems, but it now accounts for encryption of data-in-motion by encrypting network traffic with the mainframe, FIPS-enabled RHEL, LFTP and secure VPN. Updated performance around continuous load operations better supports enterprise clients running highly concurrent trickle-feed loads under heavy processing of simultaneous mixed workloads to ensure faster data synchronization and time to value (TTV) for insights. EBCDIC support for Netezza Analytics provides the ability to execute sophisticated in-database algorithms on DB2 Analytics Accelerator that allow micro-analytics across transactional, historical and real-time data. NPS software now supports the following algorithms: Decision Tree, Regression Tree, Naïve Bayes, K-means Clustering and Two-Step Clustering.

Making life easier through an improved User Experience

If these aren’t enough, we also targeted some areas to improve overall user experience by providing tooling and support that will make life easier for DBAs, system administrators and application developers:

  • Improved throughput and consistency for trickle-feed and highly concurrent smaller load operations.
  • nzload enhancements reduce TTV and shorten ETL activities; recordDelimiter, newline, timestamp, merge, datedelim, timedelim, and monitor.
  • New merge capability improves RI and positions Oracle migrations to PureData System.
  • nzSQL for Windows greatly improves usability for managing PureData System from the Windows desktop environment.
  • nzSQL support for external remote tables allows users to run load/unload operations from Linux clients to/from a remote file rather than host-only loads.
  • PureData will natively support Microsoft .NET and open a new range of possibilities for partner solutions.
  • JDBC support for JDK 1.7 in both NPS and INZA software ensures support for latest Hadoop distributions and also for Fluid Query.
  • New 64-bit BNR connectors are now certified for the latest versions of Tivoli, Netbackup and EMC.
  • PureData improves uptime by reducing requirements to stop and start NPS when user connections are exceeded.
  • ODBC support is now available for comments through DSN, odbc.ini and connection string (single, multi, inline, nested comments), as well as support for the LIMIT clause.

SQL enhancements

We’ve incorporated support for newer Client Kit OS versions and platforms with this release. This includes support for Windows 8, Windows 2012 R2, Ubuntu, and a completely new Power PC RHEL client for Little Endian. Support for Power on Little Endian positions PureData Systems for IBM BigInsights and the IBM Open Platform. We have also included additional SQL support for:

  • DROP TABLE IF EXISTS
  • CREATE TABLE IF NOT EXISTS
  • Single slice support for JOINS with multi-column distribution keys
  • SQL push-down of NULL aware
  • New table-based Zone Maps
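
To illustrate what the two IF EXISTS / IF NOT EXISTS forms buy you in a rerunnable script, here is a minimal sketch. It assumes SQLite (through Python's built-in sqlite3 module) as a stand-in engine, because it accepts the same two statements. It is not Netezza, and the table name sales_stage is made up for the example.

```python
import sqlite3

# SQLite is only a convenient stand-in here; it is not Netezza.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()


def table_names(cursor):
    """Return the names of the user tables currently defined."""
    rows = cursor.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


# Dropping a table that does not exist is a no-op instead of an error.
cur.execute("DROP TABLE IF EXISTS sales_stage")

# Creating twice is safe: the second statement leaves the table and its rows alone.
cur.execute("CREATE TABLE IF NOT EXISTS sales_stage (id INTEGER, amount REAL)")
cur.execute("INSERT INTO sales_stage VALUES (1, 9.99)")
cur.execute("CREATE TABLE IF NOT EXISTS sales_stage (id INTEGER, amount REAL)")

tables_after_create = table_names(cur)
row_count = cur.execute("SELECT COUNT(*) FROM sales_stage").fetchone()[0]

# Cleanup can also be run repeatedly without failing.
cur.execute("DROP TABLE IF EXISTS sales_stage")
cur.execute("DROP TABLE IF EXISTS sales_stage")
tables_after_drop = table_names(cur)
```

The practical benefit is that setup and teardown scripts no longer need a separate catalog check before each DDL statement.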

Client download of these new releases

NPS 7.2.1 and INZA 3.2.1 software is available at no charge to existing PureData clients. It can be easily downloaded from IBM Support Fix Central. Note that business partners and prospective clients can download and explore these new releases on Netezza Developer Network (additional information below).

Packaging and distribution

From a packaging perspective we refreshed IBM Netezza Platform Developer Software to this latest NPS 7.2.1 release to ensure the software suite is current from IBM’s Passport Advantage.

Supported Appliances
  • N3001
  • N2002
  • N2001
  • N100x
  • C1000

Supported Software
  • Netezza Platform Software v7.2.1
  • Netezza Client Kits v7.2.1
  • Netezza SQL Extension Toolkit v7.2.1
  • Netezza Analytics v3.2.1
  • IBM Fluid Query v1.6
  • Netezza Performance Portal v2.1.1
  • IBM Netezza Platform Development Software v7.2.1

For the Netezza Developer Network we continue to expand the ability to easily pick up and work with non-warranted products for basic evaluation by refreshing the Netezza Emulator to NPS 7.2.1 with INZA 3.2.1. You will find a refresh of our non-warranted version of Fluid Query 1.6 and the complete set of Client Kits that support NPS 7.2.1.

Feel free to download and play with these as a prelude to PureData Systems for Analytics purchase or as a quick way to validate new software functionality with your application. We maintain our commitment to business partners working with our systems by maintaining the latest systems and software for you to access. Bring your application or solution and work to certify, qualify and validate them.

For additional information on Fluid Query 1.6, refer to my what’s new post.

* DB2 Analytics Accelerator for z/OS is a high-performance appliance that integrates the IBM z Systems infrastructure with IBM PureData™ for Analytics, powered by IBM Netezza technology. The solution transforms your mainframe into a highly-efficient transactional and analytics processing environment. This enables clients to exploit z Systems data where it originates.

About Doug,
Doug has over 20 years combined technical & management experience in the software industry with emphasis in customer service and more recently product management. He is currently part of a highly motivated product management team that is both inspired by and passionate about the IBM PureData System for Analytics product portfolio.

What’s new: IBM Fluid Query 1.6

by Doug Dailey

Editorial Note: IBM Fluid Query 1.7 became available in May, 2016. You can read about features in release 1.6 here, but we also recommend reading the release 1.7 blog here.

The IBM PureData Systems for Analytics team has assembled a value-add set of enhancements over current software versions of Netezza Platform Software (NPS), INZA software and Fluid Query. We have enhanced  integration, security, real-time analytics for System z and usability features with our latest software suite arriving on Fix Central today.

There will be something here for everyone, whether you are looking to integrate your PureData System (Netezza) into a Logical Data Warehouse, improve security, gain more leverage with DB2 Analytics Accelerator for z/OS, or simply improve your day-to-day experience. This post covers the IBM Fluid Query 1.6 technology. Refer to my NPS and INZA post for more information on the enhancements that are now available in these other areas.

Integrating with the Logical Data Warehouse: Fluid Query overview

Are you struggling with building out your data reservoir, lake or lagoon? Feeling stuck in a swamp? Or, are you surfing effortlessly through an organized Logical Data Warehouse (LDW)?

Fluid Query offers a nice baseline of capability to get your PureData footprint plugged into your broader data environment or tethered directly to your IBM BigInsights Apache Hadoop distribution. Opening access across your broader ecosystem of on-premise, cloud, commodity hardware and Hadoop platforms gets you ever closer to capturing value throughout “systems of engagement” and “systems of record” so you can reveal new insights across the enterprise.

Now is the time to be fluid in your business, whether it is ease of data integration, access to key data for discovery/exploration, monetizing data, or sizing fit-for-purpose stores for different data types.  IBM Fluid Query opens these conversations and offers some valuable flexibility to connect the PureData System with other PureData Systems, Hadoop, DB2, Oracle and virtually any structured data source that supports JDBC drivers.

The value of content and the ability to tap into new insights is a must have to compete in any market. Fluid Query allows you to provision data for better use by application developers, data scientists and business users. We provide the tools to build the capability to enable any user group.

What’s new in Fluid Query 1.6?

Fluid Query was released this year and is in its third “agile” release of the year. As part of NPS software, it is available at no charge to existing PureData clients, and you will find information on how to access Fluid Query 1.6 below.

This capability enables you to query more data for deeper analytics from PureData. For example, you can query data in the PureData System together with:

  • Data in IBM BigInsights or other Hadoop implementations
  • Relational data stores (DB2, 3rd party and open source databases like Postgres, MySQL, etc.)
  • Multi-generational PureData Systems for Analytics systems (“Twin Fin”, “Striper”, “Mako”)

The following is a summary of some new features in the release that all help to support your needs for insights across a range of data types and stores:

  • Generic connector for access to structured data stores that support JDBC
    This generic connector enables you to select the database of choice. Database servers and engines like Teradata, SQL Server, Informix, MemSQL and MapR can now be tapped for insight. We’ve also provided a capability to handle any data type mismatches between differing source/target systems.
  • Support for compressed read from Big SQL on IBM BigInsights
    Now using the Big SQL capability in IBM BigInsights, you are able to read compressed data in Hadoop file systems such as BigInsights, Cloudera and Hortonworks. This adds increased flexibility and efficiency in storage, data protection and access.
  • Ability to import databases to Hadoop and append to tables in Hadoop
    New capabilities now enable you to import databases to Hadoop, as well as append data in existing tables in Hadoop. One use case for this is backing up historical data to a queryable archive to help manage capacity on the data warehouse. This may include incremental backups, for example from a specific date for speed and efficiency.
  • Support for the latest Hadoop distributions
    Fluid Query v. 1.6 now supports the latest Hadoop distributions, including BigInsights 4.1, Hortonworks 2.5 and Cloudera 5.4.5. For Netezza software, support is now available for NPS 7.2.1 and INZA 3.2.1.

Fluid Query 1.6 can be easily downloaded from IBM Support Fix Central. I encourage you to refer to my “Getting Started” post that was written for Fluid Query 1.5 for additional tips and instructions. Note that this link is for existing PureData clients. Refer to the section below if you are not a current client.

Packaging and distribution

From a packaging perspective we refreshed IBM Netezza Platform Developer Software to this latest NPS 7.2.1 release to ensure the software suite is current from IBM’s Passport Advantage.

Supported Appliances
  • N3001
  • N2002
  • N2001
  • N100x
  • C1000

Supported Software
  • Netezza Platform Software v7.2.1
  • Netezza Client Kits v7.2.1
  • Netezza SQL Extension Toolkit v7.2.1
  • Netezza Analytics v3.2.1
  • IBM Fluid Query v1.6
  • Netezza Performance Portal v2.1.1
  • IBM Netezza Platform Development Software v7.2.1

For the Netezza Developer Network we continue to expand the ability to easily pick up and work with non-warranted products for basic evaluation by refreshing the Netezza Emulator to NPS 7.2.1 with INZA 3.2.1. You will find a refresh of our non-warranted version of Fluid Query 1.6 and the complete set of Client Kits that support NPS 7.2.1.

Feel free to download and play with these as a prelude to PureData Systems for Analytics purchase or as a quick way to validate new software functionality with your application. We maintain our commitment to helping our partners working with our systems by maintaining the latest systems and software for you to access. Bring your application or solution and work to certify, qualify and validate them.

For more information on NPS 7.2.1 and INZA 3.2.1 software, refer to my post.

About Doug,
Doug has over 20 years combined technical & management experience in the software industry with emphasis in customer service and more recently product management. He is currently part of a highly motivated product management team that is both inspired by and passionate about the IBM PureData System for Analytics product portfolio.

Performance – Getting There and Staying There with PureData System for Analytics

by David Birmingham, Brightlight Business Analytics, A division of Sirius Computer Solutions and IBM Champion

Many years ago in a cartoon dialogue, Dilbert’s boss expressed concern for the theft of their desktop computers, but Dilbert assured him, to his boss’ satisfaction, that if he loaded them with data they would be too heavy to move. Hold that thought.

Co-location: Getting durable performance from queries

Many shops will migrate to a new PureData System for Analytics appliance, Powered by Netezza Technology, simply by copying old data structures into the new data warehouse appliance. They then point their BI tools at it and voila, a 10x performance boost just for moving the data. Life is good.

The shop moves on by hooking up the ETL tools, backups and other infrastructure, not noticing that queries that ran in 5 seconds the week before, now run in 5.1 seconds. As the weeks wear on, 5.1 seconds become 6, then 7, then 10 seconds. Nobody is really watching, because 10 seconds is a phenomenal turnaround compared to their prior system’s 10-minute turnaround.

But six months to a year down the line, when the query takes 30 seconds or longer to run, someone may raise a flag of concern. By this time, we’ve built many new applications on these data structures. Far-and-away more data has been added to its storage. In true Dilbert-esque terms, loading more data makes the system go slower.

The better part about a PureData machine is that it has the power to address this by adhering to a few simple rules. When simply migrating point-to-point onto a PureData appliance, we’re likely not taking advantage of the core power-centers in Netezza technology. The point-to-point migration starts out in first-gear and never shifts up to access more power. That is, PureData has many layers of high-performance hardware, each one more powerful than the one above it. Adhering to this leverage over time helps maintain durable performance. The system may eventually need an upgrade for storage reasons, but not for performance reasons.

PureData is a physical machine with data stored on its physical “real estate”, but unlike buying a house with “location-location-location!” we want “co-location-co-location-co-location!” Two flavors of data co-location exist: zone maps and data distribution. The use of these (or lack thereof) either enables or constrains performance. These factors are physical, because performance is in the physics. It’s not enough to migrate or maintain a logical representation of the data. Physical trumps logical.

Zone maps, a powerful form of co-location in PureData

The most powerful form of co-location is zone maps, optimized through the Organize-On and Groom functions. Think of transaction_date as an Organize-On optimization key. The objective is to regroup the physical records so that those with like-valued keys are co-located on as few disk pages as possible. Groom will do this for us. Now when a query is issued against the table, filtering the transaction_date on a date value or date range filter, this query will be applied to the zone maps to derive the known physical disk locations and exclude all others. This is Netezza’s principle of using the query to tell it “where-not-to-look”.

The additional caveat is that the physical co-location of records by Organize-On keys is only valuable if they are actually used in the query. They radically reduce data reads, for example from 5 thousand pages down to 5 pages to get the same information. That’s a 1000x boost! The zone maps, enabled by Organize-On and Groom, are what achieve these dramatic performance boosts. If we do not use them, then queries will initiate a full table-scan which naturally takes more time.
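
The “where-not-to-look” idea can be sketched in a few lines of Python. This is a toy model, not Netezza internals: the Page class, the 100-rows-per-page figure and the random transaction dates are all assumptions made for illustration. Each page remembers the minimum and maximum transaction_date it holds, and a filter only reads pages whose range overlaps the requested dates. Sorting the rows before paging stands in for what Organize-On and Groom accomplish.

```python
import random
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Page:
    """A toy disk page holding a list of transaction dates."""
    rows: list

    @property
    def zone(self):
        # The "zone map" entry for this page: (min, max) of the key column.
        return min(self.rows), max(self.rows)


def build_pages(dates, rows_per_page):
    """Cut a sequence of dates into fixed-size pages, preserving order."""
    return [Page(dates[i:i + rows_per_page])
            for i in range(0, len(dates), rows_per_page)]


def pages_to_read(pages, lo, hi):
    """Count pages whose [min, max] range overlaps the filter [lo, hi]."""
    return sum(1 for p in pages if p.zone[0] <= hi and p.zone[1] >= lo)


random.seed(7)
start = date(2015, 1, 1)
dates = [start + timedelta(days=random.randrange(365)) for _ in range(100_000)]

unorganized = build_pages(dates, 100)        # rows in arrival order
organized = build_pages(sorted(dates), 100)  # same rows, regrouped by date

target = date(2015, 6, 15)
scan_unorganized = pages_to_read(unorganized, target, target)
scan_organized = pages_to_read(organized, target, target)

print(f"pages read without regrouping: {scan_unorganized} of {len(unorganized)}")
print(f"pages read after regrouping:   {scan_organized} of {len(organized)}")
```

In the unorganized layout nearly every page spans most of the year, so the min/max ranges exclude almost nothing. Once like-valued keys are co-located, only a handful of pages can possibly contain the requested date.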

The reason why this is so important is that disk-read is the number one penalty of the query, with no close second. A PureData System N200x or N3001 can read over 1100 pages per second on a given data slice. So if the query scans 5000 pages on each data slice, it’s easily a 4-second query. But it won’t stay a 4-second query. As the data grows from 5000 pages to 10,000 pages, it will become a 9- to 10-second query. If the query leverages the zone maps and reduces the scan consistently to, say, 100 pages per query, the query will achieve a sub-second duration and remain there for the life of the solution.
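
The arithmetic behind those durations is simple enough to check. The sketch below assumes scan time is purely pages divided by the quoted read rate of 1100 pages per second per data slice, ignoring every other cost in the query; the function name scan_seconds is made up for the example.

```python
PAGES_PER_SECOND = 1100  # per data slice, the read rate quoted above


def scan_seconds(pages, rate=PAGES_PER_SECOND):
    """Estimate scan time for one data slice, counting disk reads only."""
    return pages / rate


for pages in (5000, 10_000, 100):
    print(f"{pages:>6} pages -> {scan_seconds(pages):.2f} s")
```

At that rate, 5000 pages take about 4.5 seconds, 10,000 pages about 9.1 seconds, and 100 pages about 0.09 seconds, which is why a query that stays on a fixed, small number of pages keeps its sub-second duration as the table grows.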

Does this sound like too much physical detail to know for certain what to do? That’s why the Organize-On and Groom functions make it easy. Just use the Query History’s column access statistics, locate the largest tables and find the most-often-accessed columns in where-clause filters (just don’t Organize-On join-only columns or distribution keys!). Add them to the Organize-On, Groom the table and watch this single action boost the most common queries into the stratosphere.

Data Distribution, co-location through “data slices”

Data distribution is another form of co-location. On a PureData system, every table is automatically divided across disks, each representing a “data slice”. Basically when a distribution key (e.g. Customer_ID) is used, the machine will hash the key values to guarantee that records with the same key value will always arrive on the same data slice. If several tables are distributed on the same key value, their like-keyed records will also be co-located on the same data slice. This means joining on those keys will initiate a parallel join, or what is called a co-located read.
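
A minimal sketch of that hashing behavior follows. It is a simplification under stated assumptions: CRC32 stands in for whatever hash function the appliance actually uses, the slice count of 8 is arbitrary, and the customers and orders tables are invented. The point it shows is that when two tables are distributed on the same key, each slice can join its own rows without asking any other slice for data.

```python
import zlib
from collections import defaultdict

NUM_SLICES = 8  # hypothetical number of data slices


def slice_for(key, num_slices=NUM_SLICES):
    """Map a distribution key to a data slice with a deterministic hash.

    CRC32 is only a stand-in for the appliance's real hash function.
    """
    return zlib.crc32(str(key).encode("utf-8")) % num_slices


def distribute(rows, key_index=0):
    """Place each row on the slice chosen by hashing its distribution key."""
    slices = defaultdict(list)
    for row in rows:
        slices[slice_for(row[key_index])].append(row)
    return slices


# Two tables distributed on the same key, Customer_ID.
customers = [(cid, f"customer-{cid}") for cid in range(1, 21)]
orders = [(cid, order_no) for cid in range(1, 21) for order_no in range(3)]

customer_slices = distribute(customers)
order_slices = distribute(orders)

# Join slice by slice: a slice only ever looks at rows it already holds.
joined = []
for s in range(NUM_SLICES):
    names = dict(customer_slices.get(s, []))
    for cid, order_no in order_slices.get(s, []):
        joined.append((cid, names[cid], order_no))

print(f"{len(joined)} joined rows produced without moving rows between slices")
```

If orders were distributed on a different column, the lookup names[cid] would fail on most slices, which is the toy equivalent of the machine having to redistribute rows across its network fabric before it can join them.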

Another of the most powerful aspects of Netezza technology is the ability to process data in parallel. Using the same distribution key to make an intermediate table, an insert-select styled query will perform a co-located read and a co-located write, effectively performing the operation in massively parallel form and at very fast speeds. Netezza technology can eclipse a mainframe in both its processing speed and ability to move and position large quantities of data for immediate consumption.

The caveat of data distribution is that a good distribution model can preserve capacity for the long term. A distribution model that does not leverage co-located joining will chew up the machine’s more limited resources, such as memory and the inter-process network fabric. If we have enough of these queries running simultaneously, the degradation becomes extremely pronounced. A few tweaks to tables and queries, however, can yield a 100x or 1000x boost; without them, the solution is using 10x or 100x more machine capacity than necessary. This is why some machines appear very stressed even though they are doing and storing so little.
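To see why a mismatched distribution key costs so much, the sketch below counts how many rows would have to cross between slices before a join on Customer_ID could run. It is a toy model under stated assumptions: the slice count, the table contents and the SHA-256 based hash are invented for illustration.

```python
import hashlib

NUM_SLICES = 8  # illustrative slice count


def slice_for(key, num_slices=NUM_SLICES):
    """Deterministic stand-in for the machine's distribution hash."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_slices


def rows_to_move(rows, stored_on, join_on):
    """Count rows whose storage slice differs from the slice the join needs."""
    return sum(1 for r in rows if slice_for(r[stored_on]) != slice_for(r[join_on]))


orders = [{"order_id": 1000 + n, "customer_id": (n % 500) + 1} for n in range(10_000)]

print("distributed on customer_id:",
      rows_to_move(orders, stored_on="customer_id", join_on="customer_id"))
print("distributed on order_id:   ",
      rows_to_move(orders, stored_on="order_id", join_on="customer_id"))
```

With the table distributed on the join key, no rows move at all. Distributed on an unrelated key, roughly seven in eight rows in this model land on the wrong slice and have to travel over the fabric, and that traffic is repeated every time the query runs.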

Accessing the machine’s “deep metal”

Back to the notion of a “simple migration”. Does it sound like a simple point-to-point migration will leverage the power of the machine? Do the legacy queries use where-clause filters that can consistently invoke the zone maps? Are the tables configured to be heavily dependent upon indexes to support performance? If so, then the initial solution will be in first gear.

But wait, maybe the migration happened a year or so ago and now the machine is “under stress” for no apparent reason. Where did all the capacity go? It’s still waiting to be used, in the deep-metal of the machine, the metal that the migrated solution doesn’t regard. It’s easy to fix that and voila, all this “extra” capacity seemingly appears from nowhere, like magic! It was always there. The solution was ignoring it and grinding the engines in first gear.

Enable business users to explore deep data detail

When Steven Spielberg made Jurassic Park, he mentioned that the first dinosaur scene with the giant Brachiosaurus required over a hundred hours of film and CGI crunched into fifteen seconds of movie magic.

This represents a typical analytic flow model, where tons of data points are summarized into smaller form for fast consumption by business analysts. PureData System changes this because it is fast and easy to expose deep detail to users. Business analysts like to have access to the detail of deep data because summary structures will throw away useful details in an effort to boost performance on other systems.

Architects and developers alike can see how “Co-location, co-location, co-location!” is easy to configure and maintain, offering a durable performance experience that is also adaptable as business needs change over time. Getting there and staying there doesn’t require a high wall of engineering activities or a gang of administrators on roller skates to keep it running smoothly. The performance is built into the machine. It’s an appliance after all.

About David,

David Birmingham, Brightlight, Sirius Computer Solutions

David is a Senior Solutions Architect with Brightlight Consulting, a division of Sirius Computer Solutions, and an IBM Champion since 2011, a designation that recognizes the contributions of IBM customers and partners. He has over 30 years of extensive experience across the entire BI/DW lifecycle. David is one of the world’s top experts in PureData for Analytics (Netezza) and is the author of Netezza Underground and Netezza Transformation (both on Amazon.com) and various essays on IBM developerWorks’ Netezza Underground Blog. Catch David each year at the Sunday IBM Insight Enzee Universe for new insights on best practices and solutions with the machine.

Join the Live Chat August 4th: Gain deeper insights using more data with IBM Fluid Query 1.5

by Cindy Russell

Update: This live chat is now available as a transcript for you to browse.  Click here to read. 

Join us online for a live chat with the experts on the new Fluid Query 1.5 functionality. This new technology is part of the Netezza Platform Software (NPS), and it lets you query PureData System for Analytics data together with Hadoop, Spark, and relational stores such as DB2, dashDB, Oracle and even other PureData warehouses to enable deeper insights. It is designed to bring the query to the data instead of moving around massive volumes of data just to be analyzed.

The chat will be held from 11:00 AM to 12:00 PM ET on August 4th. A live chat is a great way to put questions to the technical and marketing teams in an informal setting.

Even though it is informal, we have organized some discussion questions for efficiency, and we will cover as many of these questions as we can within the time frame.

  • What is Fluid Query?
  • What are the design points behind Fluid Query?
  • What’s new for Fluid Query version 1.5?
  • What data warehousing and analytics challenges does the new Fluid Query for PureData address?
  • What is the data movement capability in Fluid Query?
  • How does Fluid Query fit within the logical data warehouse concept?
  • How are customers using it?
  • What is the IBM portfolio of products for querying data across different stores?
  • Open: what questions do you have for our experts?
  • Where can you learn more and how can you stay in touch with this topic?

How to join

Joining the chat is simple. Go now to this link: bit.ly/FQ_chat. Click the add to calendar button. On August 4th at 11AM ET, sign onto your Twitter account, then enter this link in your browser: https://www.crowdchat.net/fluidquery. Click sign in and select Twitter. That’s it! Talk to you then.

In the meantime, you can learn more about the new Fluid Query 1.5 in Rich Hughes’ blog.

Learn more resources

These resources will be mentioned in the chat and can be used as a handy “quick reference” guide to learn more about Fluid Query:

About Cindy,

Cindy is the marketing programs and social media manager for IBM Data Warehousing products, including the BLU Acceleration in-memory computing component of DB2 software and PureData System for Analytics. Cindy’s key focus has been on community engagement programs such as social media, collaboration hubs, Tech Talks, workshops and similar programs. Prior to data warehousing, Cindy worked on marketing programs for IBM Rational development tools, technology services, and other technical products.