Three session guides get you started with data warehousing at IBM Insight at World of Watson

Join us October 24 to 27, 2016 in Las Vegas!

by Cindy Russell, IBM Data Warehouse marketing

IBM Insight has been the premiere data management and analytics event for IBM analytics technologies, and 2016 is no exception.  This year, IBM Insight is being hosted along with World of Watson and runs from October 24 to 27, 2016 at the Mandalay Bay in Las Vegas, Nevada.  It includes 1,500 sessions across a range of technologies and features keynotes by IBM President and CEO, Ginni Rometty; Senior Vice President of IBM Analytics, Bob Picciano; and other IBM Analytics and industry leaders.  Every year, we include a little fun as well, and this year the band is Imagine Dragons.

IBM data warehousing sessions will be available across the event as well as in the PureData System for Analytics Enzee Universe (Sunday, October 23).  Below are product-specific quick reference guides that enable you to see at a glance key sessions and activities, then plan your schedule.  Print these guides and take them with you or put the links to them on your phone for reference during the conference.

This year, the Expo floor is called the Cognitive Concourse, and we are located in the Monetizing Data section, Cognitive Cuisine experience area.  We’ll take you on a tour across our data warehousing products and will have some fun as we do it, so please stop by.  There is also a demo room where you can see live demos and engage with our technical experts, as well as a series of hands-on labs that let you experience our products directly.

The IBM Insight at World of Watson main web page is located here.  You can register and then use the agenda builder to create your personalized schedule.

IBM PureData System for Analytics session reference guide

Please find the session quick reference guide for PureData System for Analytics here: ibm.biz/wow_enzee

Enzee Universe is a full day of dedicated PureData System for Analytics / Netezza sessions that is held on Sunday, October 23, 2016.  To register for Enzee Universe, select sessions 3459 and 3461 in the agenda builder tool.  This event is open to any full conference pass holder.

During the regular conference, there are also more than 35 PureData, Netezza, IBM DB2 Analytics Accelerator for z/OS (IDAA) technical sessions across all the conference tracks, as well as hands on labs.  There are several session being presented by IBM clients so you can see how they put PureData System for Analytics to use.  Click the link above to see the details.

IBM dashDB Family session reference guide

Please find the session quick reference guide for the dashDB family here: ibm.biz/wow_dashDB

There are a more than 40 sessions for dashDB, including a “Meet the Family” session that will help you become familiar with new products in this family of modern data management and data warehousing tools.  There is also a “Birds of a Feather” panel discussion on Hybrid Data Warehousing, and one that describes some key use cases for dashDB.  And, you can also see a demo, take in a short theatre session or try out a hands-on lab.

IBM BigInsights, Hadoop and Spark session reference guide

Please find the session quick reference guide for BigInsights, Hadoop and Spark topics here: ibm.biz/wow_biginsights

There are more than 65 sessions related to IBM BigInsights, Hadoop and Spark, with several hands on labs and theatre sessions. There is everything from an Introduction to Data Science to Using Spark for Customer Intelligence Analytics to hybrid cloud data lakes to client stories of how they use these technologies.

Overall, it is an exciting time to be in the data warehousing and analytics space.  This conference represents a great opportunity to build depth on IBM products you already use, learn new data warehousing products, and look across IBM to learn completely new ways to employ analytics—from Watson to Internet of Things and much more.  I hope to see you there.

IBM dashDB Local opens its preview for data warehousing on private clouds and more!

by Mitesh Shah

Just like in the story of Goldilocks … you may be looking for modern data warehousing that is “just right.”  Your IT strategy may include cloud and you may like the simplicity and scalability benefits of cloud … yet some data and applications may need to stay on-premises for a variety of reasons.  Traditional data warehouses provide essential analytics, yet they may not be right for new types of analytics, data born on the cloud, or simply cannot contain a growing workload of new requests.

IBM dashDB Local is an open preview technology that is designed to give you “just right” cloud-like simplicity and flexibility.  It delivers a configured data warehouse in a Docker container that you can deploy wherever you need it as long Docker is supported on that infrastructure. Often, this is a private cloud, virtual private cloud (AWS/Azure), or other software-defined infrastructure. You gain management simplicity and have an environment that you can control more directly.

DownloadFromDocker
Download and install dashDB Local quickly and simply via Docker container technology.

dashDB Local may be the right choice when you have complex applications that must be readied for cloud, have SLAs or regulations that require data or applications to stay on premises, or you need to address new analytics requests very quickly with easy scale in and out capabilities.

dashDB Local complements the dashDB data warehouse as a service offering that is delivered via IBM Bluemix. Because both products are based on a common database technology, you can move workloads across these editions without costly and complex application change!   This is one example of how we define a hybrid data warehouse and how it can help improve your flexibility over time as your needs evolve.

Since dashDB Local began its closed preview in February of 2016, the team has rallied to bring in a comprehensive set of data warehousing features to this edition of dashDB. We have been listening to the encouraging feedback from our initial preview participants, and as a result, we now have a solution that is open for you to test!

So what are you waiting for?

It’s become commonplace for us to hear feedback that participants can deploy a full MPP data warehouse offering with in-memory processing and columnar capabilities, on the infrastructure of our choice, within 15-20 minutes.

Ours early adopters have been fascinated by the power and ease of deployment for the Docker container.  It’s become commonplace for us to hear feedback that participants can deploy a full MPP data warehouse offering with in-memory processing and columnar capabilities, on the infrastructure of our choice, within 15-20 minutes. One client said that dashDB Local is as easy to deploy and manage as a mobile app! We are thrilled by this type of feedback!

Workload monitoring in dashDB Local delivers elasticity to scale out or in.
Workload monitoring in dashDB Local delivers elasticity to scale out or in.

The open preview (v. 0.5.0) offers extreme scale out and scale in capabilities. Yes, you heard me right. Scale-in provides the elasticity to not tie up your valuable resources beyond the peak workloads. This maximizes return on investment for your variable reporting and analytics solutions.  The open preview will help you test drive the Netezza compatibility (IBM PureData System for Analytics) within dashDB technology, as well as analytics support using RStudio. Automated High Availability is another attractive feature that is provided out-of-the box for you to see and test.

Preview participants have been eager to test drive query performance. One participant says, “We are very impressed with the performance, and within no time we have grown our dataset of 40 million to 200 million records (a few TBs) and the analytics test queries run effortless.” Our participants are leveraging their data center infrastructure whether it’s bare metal or virtualized (VMs) to get started and some have installed it on their laptops to quickly gain an understanding of this preview.

Register for dashDB Local previewFind out how it can be “just right” for you!  Go here to give it a try and get ready to be wowed.  We value and need your feedback to help us prioritize features that are important to your business.  All the best and don’t hesitate to drop me a line to let me know what you think!


About Mitesh,

MiteshMitesh Shah is the product manager for the new dashDB Local data warehousing solution as a software-defined environment (SDE) that can be used on private clouds and platforms that support Docker container technology. He has broad experience around various facets of software development revolving around relational databases and data warehousing technologies.  Throughout his career, Mitesh has enjoyed a focus on helping clients address their data management and solution architecture needs.

IBM Fluid Query 1.7 is Here!

by Doug Dailey

IBM Fluid Query offers a wide range of capabilities to help your business adapt to a hybrid data architecture and more importantly it helps you bridge across “data silos” for deeper insights that leverage more data.   Fluid Query is a standard entitlement included with the Netezza Platform Software suite for PureData for Analytics (formerly Netezza). Fluid Query release 1.7 is now available, and you can learn more about its features below.

Why should you consider Fluid Query?

It offers many possible uses for solving business problems in your business. Here are a few ideas:
• Discover and explore “Day Zero” data landing in your Hadoop environment
• Query data from multiple cross-enterprise repositories to understand relationships
• Access structured data from common sources like Oracle, SQL Server, MySQL, and PostgreSQL
• Query historical data on Hadoop via Hive, BigInsights Big SQL or Impala
• Derive relationships between data residing on Hadoop, the cloud and on-premises
• Offload colder data from PureData System for Analytics to Hadoop to free capacity
• Drive business continuity through low fidelity disaster recovery solution on Hadoop
• Backup your database or a subset of data to Hadoop in an immutable format
• Incrementally feed analytics side-cars residing on Hadoop with dimensional data

By far, the most prominent use for Fluid Query for a data warehouse administrator is that of warehouse augmentation, capacity relief and replicating analytics side-cars for analysts and scientists.

New: Hadoop connector support for Hadoop file formats to increase flexibility

IBM Fluid Query 1.7 ushers in greater flexibility for Hadoop users with support for popular file formats typically used with HDFS.Fluid query 1.7 connector picture These include popular data storage formats like AVRO, Parquet, ORC and RC that are often used to manage bigdata in a Hadoop environment.

Choosing the best format and compression mode can result in drastic differences in performance and storage on disk. A file format that doesn’t support flexible schema evolution can result in a processing penalty when making simple changes to a table. Let’s just  say that if you live in the Hadoop domain, you know exactly what I am speaking of. For instance, if you want to use AVRO, do your tools have readers and writers that are compatible? If you are using IMPALA, do you know that it doesn’t support ORC, or that Hortonworks and Hive-Stinger don’t play well with Parquet? Double check your needs and tool sets before diving into these popular format types.

By providing support for these popular formats,  Fluid Query allows you to import, store, and access this data through local tools and utilities on HDFS. But here is where it gets interesting in Fluid Query 1.7: you can also query data in these formats through the Hadoop connector provided with IBM Fluid Query, without any change to your SQL!

New: Robust connector templates

In addition, Fluid Query 1.7 now makes available a more robust set of connector templates that are designed to help you jump start use of Fluid Query. You may recall we provided support for a generic connector in our prior release that allows you to configure and connect to any structured data store via JDBC. We are offering pre-defined templates with the 1.7 release so you can get up and running more quickly. In cases where there are differences in user data type mapping, we also provide mapping files to simplify access.  If you have your own favorite database, you can use our generic connector, along with any of the provided templates as a basis for building a new connector for your specific needs. There are templates for Oracle, Teradata, SQL Server, MySQL, PostgreSQL, Informix, and MapR for Hive.

Again, the primary focus for Fluid Query is to deliver open data access across your ecosystem. Whether the data resides on disk, in-memory, in the Cloud or on Hadoop, we strive to enable your business to be open for data. We recognize that you are up against significant challenges in meeting demands of the business and marketplace, with one of the top priorities around access and federation.

New: Data movement advances

Moving data is not the best choice. Businesses spend quite a bit of effort ingesting data, staging the data, scrubbing, prepping and scoring the data for consumption for business users. This is costly process. As we move closer and closer to virtualization, the goal is to move the smallest amount of data possible, while you access and query only the data you need. So not only is access paramount, but your knowledge of the data in your environment is crucial to efficiently using it.

Fluid Query does offer data movement capability through what we call Fast Data Movement. Focusing on the pipe between PDA and Hadoop, we offer a high speed transfer tool that allows you to transfer data between these two environments very efficiently and securely. You have control over the security, compression, format and where clause (DB, table, filtered data). A key benefit is our ability to transfer data in our proprietary binary format. This enables orders of magnitude performance over Sqoop, when you do have to move data.

Fluid Query 1.7 also offers some additional benefits:
• Kerberos support for our generic database connector
• Support for BigInsights Big SQL during import (automatically synchronizes Hive and Big SQL on import)
• Varchar and String mapping improvements
• Import of nz.fq.table parameter now supports a combination of multiple schemas and tables
• Improved date handling
• Improved validation for NPS and Hadoop environment (connectors and import/export)
• Support for BigInsights 4.1 and Cloudera 5.5.1
• A new Best Practices User Guide, plus two new Tutorials

You can download this from IBM’s Fix Central or the Netezza Developer’s Network for use with the Netezza Emulator through our non-warranted software.

Picture1

Take a test drive today!

About Doug,
Doug Daily
Doug has over 20 years combined technical & management experience in the software industry with emphasis in customer service and more recently product management.He is currently part of a highly motivated product management team that is both inspired by and passionate about the IBM PureData System for Analytics product portfolio.

What’s new: Netezza Platform Software and INZA software for PureData Systems for Analytics

by Doug Dailey

The IBM PureData Systems for Analytics team has just released a new set of enhancements over current software versions of Netezza Platform Software (NPS), INZA and IBM Fluid Query. These include enhanced  integration, security, real-time analytics for z Systems and usability features, all included in our latest software suite that has been posted on Fix Central.

There will be something here for everyone, whether you are looking to increase security, gain more leverage with DB2 Analytics Accelerator for z/OS*, improve your day-to-day experience or integrate PureData System (Netezza technology) into a Logical Data Warehouse. This post covers the new capabilities and enhancements in NPS 7.2.1 and INZA 3.2.1 software.  Refer to my IBM Fluid Query 1.6 post  for more information.

Strengthening end-to-end security for PureData and DB2 Analytics Accelerator for z/OS

With the advent of self-encrypted disk drives in our N3001 model, we laid the groundwork for securing data at rest. Not only do you have state of the art disk encryption keys by Seagate and Hitachi at work from a hardware standpoint, but you also have added peace of mind through a second tier of security that protects host drives and those drives associated with the Snippet Processing Unit. A local keystore with flexible CLI on the N3001 system enabled you to protect your most valuable assets. This release adds support for KMIP, which now allows 3rd party and IBM targeted key management software to backup, store and manage host and SPU keys on your system. Additional attention was paid to hardening the host systems for the DB2 Analytics Accelerator powered by PureData.
security

Speaking of DB2 Analytics Accelerator, this release of NPS provides key functionality recently added to DB2 Analytics Accelerator in version 5.1 which incorporates Netezza Analytics as a core component to help accelerate the use of predictive analytics applications (e.g., SPSS) such as data mining and in-database modeling. By extending support for the mainframe EBCDIC code to INZA software with support for new sets of procedures, you can run real-time analytics on DB2 Analytics Accelerator and establish work areas for data scientists. In-database transformation supports IBM DataStage balanced optimization and ETL/ELT consolidation processing.

This optimized, integrated appliance has been hardened to not only support self-encrypting drives available through PureData Systems for Analytics N3001 systems, but it now accounts for encryption of data-in-motion by encrypting network with the mainframe, FIPS-enabled RHEL, LFTP and secure VPN. Updated performance around continuous load operations better supports enterprise clients running highly concurrent trickle-feed loads under heavy processing of simultaneous mixed workloads to ensure faster data synchronization and TTV for insights. EBCDIC support for Netezza Analytics provides the ability to execute sophisticated in-database algorithms on DB2 Analytics Accelerator that allow micro-analytics across transactional, historical and real-time data.  NPS software now supports the following algorithms: Decision Tree, Regression Tree, Naïve Bayes, K-means Clustering and Two-Step Clustering.

PureData IDAA images

Making life easier through an improved User Experience

If these aren’t enough, we also targeted some areas to improve overall user experience by providing tooling and support that will make life easier for DBAs, system administrators and application developers:

  • Improved throughput and consistency for trickle-feed and highly concurrent smaller load operations.
  • nzload enhancements reduce TTV and shorten ETL activities; recordDelimiter, newline, timestamp, merge, datedelim, timedelim, and monitor.
  • New merge capability improves RI and positions Oracle migrations to PureData System.
  • nzSQL for Windows greatly improves usability for managing PureData System from the Windows desktop environment.
  • nzSQL support for external remote tables allows users to run load/unload operations from Linux clients to/from a remote file rather than host-only loads.
  • PureData will natively support Microsoft .NET and open a new range of possibilities for partner solutions.
  • JDBC support for JDK 1.7. in both NPS and INZA software ensures support for latest Hadoop distributions and also for Fluid Query.
  • New 64-bit BNR connectors are now certified for the latest versions of Tivoli, Netbackup and EMC.
  • PureData improves uptime by reducing requirements to stop and start NPS when user connections are exceeded.
  • ODBC support is now available for comments through DSN, odbc.ini and connection string (single, multi, inline, nested comments), as well as support for the LIMIT clause.

SQL enhancements

We’ve incorporated support for newer Client Kit OS versions and platforms with this release. Support for Windows 8, Windows 2012 R2, Ubuntu, and a completely new Power PC RHEL client for Little Endian. Support for Power on Little Endian positions PureData Systems for IBM BigInsights and the IBM Open Platform. We have also included additional SQL support for:

  • Support for DROP TABLE IF EXISTS
  • CREATE TABLE IF NOT EXISTS
  • Single slice support for JOINS with multi-column distribution keys
  • SQL push-down of NULL aware
  • New table-based Zone Maps

Client download of these new releases

NPS 7.2.1 and INXA 3.2.1 software is available at no charge to existing PureData clients. It can be easily downloaded from IBM Support Fix Central. Note that business partners and prospective clients can download and explore these new releases on Netezza Developer Network (additional information below).

fluid query download from fix central

Packaging and distribution

From a packaging perspective we refreshed IBM Netezza Platform Developer Software to this latest NPS 7.2.1 release to ensure the software suite is current from IBM’s Passport Advantage.

Supported Appliances Supported Software
  • N3001
  • N2002
  • N2001
  • N100x
  • C1000
  • Netezza Platform Software v7.2.1
  • Netezza Client Kits v7.2.1
  • Netezza SQL Extension Toolkit v7.2.1
  • Netezza Analytics v3.2.1
  • IBM Fluid Query v1.6
  • Netezza Performance Portal v2.1.1
  • IBM Netezza Platform Development Software v7.2.1

For the Netezza Developer Network we continue to expand the ability to easily pick up and work with non-warranted products for basic evaluation by refreshing the Netezza Emulator to NPS 7.2.1 with INZA 3.2.1. You will find a refresh of our non-warranted version of Fluid Query 1.6 and the complete set of Client Kits that support NPS 7.2.1.

NDN download button

Feel free to download and play with these as a prelude to PureData Systems for Analytics purchase or as a quick way to validate new software functionality with your application. We maintain our commitment to business partners working with our systems by maintaining the latest systems and software for you to access. Bring your application or solution and work to certify, qualify and validate them.

For additional information on Fluid Query 1.6, refer to my what’s new post.

* DB2 Analytics Accelerator for z/OS is a high-performance appliance that integrates the IBM z Systems infrastructure with IBM PureData™ for Analytics, powered IBM Netezza technology. The solution transforms your mainframe into a highly-efficient transactional and analytics processing environment. This enables clients to exploit z Systems data where it originates.

Doug Daily About Doug,
Doug has over 20 years combined technical & management experience in the software industry with emphasis in customer service and more recently product management.He is currently part of a highly motivated product management team that is both inspired by and passionate about the IBM PureData System for Analytics product portfolio.