Three session guides get you started with data warehousing at IBM Insight at World of Watson

Join us October 24 to 27, 2016 in Las Vegas!

by Cindy Russell, IBM Data Warehouse marketing

IBM Insight has been the premier data management and analytics event for IBM analytics technologies, and 2016 is no exception.  This year, IBM Insight is being hosted along with World of Watson and runs from October 24 to 27, 2016 at the Mandalay Bay in Las Vegas, Nevada.  It includes 1,500 sessions across a range of technologies and features keynotes by IBM President and CEO, Ginni Rometty; Senior Vice President of IBM Analytics, Bob Picciano; and other IBM Analytics and industry leaders.  Every year, we include a little fun as well, and this year the band is Imagine Dragons.

IBM data warehousing sessions will be available across the event as well as in the PureData System for Analytics Enzee Universe (Sunday, October 23).  Below are product-specific quick reference guides that enable you to see at a glance key sessions and activities, then plan your schedule.  Print these guides and take them with you or put the links to them on your phone for reference during the conference.

This year, the Expo floor is called the Cognitive Concourse, and we are located in the Monetizing Data section, Cognitive Cuisine experience area.  We’ll take you on a tour across our data warehousing products and will have some fun as we do it, so please stop by.  There is also a demo room where you can see live demos and engage with our technical experts, as well as a series of hands-on labs that let you experience our products directly.

The IBM Insight at World of Watson main web page is located here.  You can register and then use the agenda builder to create your personalized schedule.

IBM PureData System for Analytics session reference guide

Please find the session quick reference guide for PureData System for Analytics here:

Enzee Universe is a full day of dedicated PureData System for Analytics / Netezza sessions that is held on Sunday, October 23, 2016.  To register for Enzee Universe, select sessions 3459 and 3461 in the agenda builder tool.  This event is open to any full conference pass holder.

During the regular conference, there are also more than 35 PureData, Netezza, and IBM DB2 Analytics Accelerator for z/OS (IDAA) technical sessions across all the conference tracks, as well as hands-on labs.  Several sessions are being presented by IBM clients so you can see how they put PureData System for Analytics to use.  Click the link above to see the details.

IBM dashDB Family session reference guide

Please find the session quick reference guide for the dashDB family here:

There are more than 40 sessions for dashDB, including a “Meet the Family” session that will help you become familiar with new products in this family of modern data management and data warehousing tools.  There is also a “Birds of a Feather” panel discussion on Hybrid Data Warehousing, and one that describes some key use cases for dashDB.  You can also see a demo, take in a short theatre session or try out a hands-on lab.

IBM BigInsights, Hadoop and Spark session reference guide

Please find the session quick reference guide for BigInsights, Hadoop and Spark topics here:

There are more than 65 sessions related to IBM BigInsights, Hadoop and Spark, with several hands-on labs and theatre sessions. Topics range from an Introduction to Data Science, to Using Spark for Customer Intelligence Analytics, to hybrid cloud data lakes, to client stories of how they use these technologies.

Overall, it is an exciting time to be in the data warehousing and analytics space.  This conference represents a great opportunity to build depth on IBM products you already use, learn new data warehousing products, and look across IBM to learn completely new ways to employ analytics—from Watson to Internet of Things and much more.  I hope to see you there.


IBM BigInsights version 4.2 is here!

Brings Hadoop, Spark and SQL into one flexible, open analytics platform

by Andrea Braida

Today, we are pleased to announce that IBM BigInsights® 4.2 is generally available. BigInsights 4.2 is built on IBM Open Platform (IOP), IBM’s big data platform with Apache Spark and Apache Hadoop. IOP offers the ideal combination of Apache components to support big data applications. The BigInsights 4.2 release puts the full range of analytics for Hadoop, Spark and SQL into the hands of advanced analytics and data science teams on a single platform.

IBM has deep Hadoop expertise and, in the last year, has moved into a very strong Apache Spark leadership position as well. IBM is integrating and embedding Spark across its analytics portfolio, which means that customers get Spark in any way they want it. No one else in the market is doing this today. (BigInsights 4.2 also includes comprehensive machine learning support: Spark, SystemML and integration with H2O.)

If a recommended Hadoop distribution is something you’re interested in, the most significant release features, including Spark integration, are summarized for you below.

What’s new in BigInsights 4.2?

BigInsights 4.2 introduces a range of new capabilities that make it more open, flexible and powerful:

Integration with Apache Spark 1.6.1

Access the processing and analytics power of Spark. This includes dramatically faster batch and ETL processing with Spark Core, near real-time analytics with Spark Streaming, built-in and highly extensible machine learning libraries with Spark MLlib, data querying and more value from free-form text analytics with Spark SQL, and graph computation and analytics with Spark GraphX.

IBM Big SQL enhancements for RDBMS offload and consolidation
Big SQL now understands SQL dialects from other vendors and products, such as Oracle, IBM DB2® and IBM Netezza®, making it the ultimate platform for RDBMS offload and consolidation. It is faster and easier to offload old data from existing enterprise data warehouses or data marts to free up capacity while preserving most of the familiar SQL from those platforms. Big SQL is also the only SQL engine for Hadoop that exploits Hive, HBase, and Spark concurrently for best-in-class analytic capabilities.

New Apache components and currency updates to existing components
BigInsights 4.2 now includes Apache Ranger, Apache Phoenix and Apache Titan. BigInsights is currently the only Hadoop distribution with a graph database. Notable currency updates include updates to Ambari, Kafka, and Solr.

ODPi Runtime Certification
With V4.2, IOP is among the first Hadoop platforms to comply with the Open Data Platform (ODPi) Runtime Certification. This means it is easier for independent software vendors to adopt IOP as a platform, and ensures platform openness for customers.

Introducing IBM Big Replicate
IBM Big Replicate provides continuous availability and data consistency via a patented active-transactional replication technology which also provides streaming backup, hybrid cloud, and burst-to-cloud. This is an optimized data replication capability for uninterrupted migration between different distributions to IBM, cloud to on-prem, and vice versa.

Why should you consider BigInsights 4.2?

Some key standout features for BigInsights 4.2 are Big SQL performance improvements, deeper analytics with Spark and Graph Database, and a more open and secure platform.

Big SQL performance improvements

Big SQL is the SQL query engine in BigInsights.  New performance improvements make it super fast, and super easy to install and manage. These enhancements to Big SQL 4.2 result in significant performance improvements:

  • Built-in components improve performance with less tuning (auto-analyze)
  • Improved memory management and operational stability
  • High performance transactional support is now included
  • Apache Phoenix provides easier access to HBase with a SQL interface
  • In Technology Preview, in-memory technology (BLU Acceleration) on Big SQL head nodes is now available for faster processing

These enhancements make BigInsights an ideal platform for RDBMS off-load and consolidation, as well as a hybrid engine that can help you exploit fit-for-purpose Hadoop subsystems.

Deeper and improved analytics with Spark and Graph Database

  • Easier and richer text analytics
      • New AQL Editor makes it easier to migrate existing AQL to V4.2
      • Web-based, drag-and-drop development
      • Powerful, expressive, AQL language to get more done, with less work
      • New run-on-cluster with Spark
      • Pre-built extractors: Named Entity, Financial, Sentiment, Machine Data
  • Graph Database – Titan
      • IOP is the first Hadoop distribution to include a graph database in its distribution

More open and more secure

  • For security, BigInsights 4.2 is compliant with industry standards, and includes Apache Ranger which provides centralized security management and auditing of users and the REST interface. It supports HDFS, YARN, Hive, HBase, and Kafka, allowing users to focus more time on analyzing data versus worrying about security.
  • BigInsights now enables easy product integration with ODPi Runtime Certification. With V4.2, IOP is among the first Hadoop platforms to comply with the Open Data Platform (ODPi) Runtime Certification. This means it is easier for independent software vendors to adopt IOP as a platform, and it ensures platform openness for clients.

The BigInsights core, IBM Open Platform (IOP), was designed with a focus on analytics, operational excellence, and security empowerment, and is certified by the Open Data Platform Initiative (ODPi).

Get started free

BigInsights is available on-premises, on-cloud, and is integrated with other systems in use today, with enterprise-class support available. (Please note that BigQuality, BigIntegrate, Phoenix, Ranger, Solr, and Titan are available on BigInsights on-premises only, and are planned for the on-cloud offering.*)

BigInsights is also integrated with a broad and open ecosystem of data and analytics tools, allowing for a true hybrid architecture. BigInsights on Cloud was recently ranked as a leader in the Hadoop Cloud services market by Forrester, which I’ll share more about in my next blog.



Get started with a free version of the BigInsights core, IBM Open Platform (IOP). Click here.

And for more information about the 4.2 release, please visit our release overview or refer to the Big Replicate overview.  Or visit the Hadoop solutions page.

About Andrea

Andrea Braida is a Portfolio Marketing Manager at IBM for Big Data Analytics and Data Science offerings. A former start-up founder, she has extensive product management, product marketing, and data science marketing experience within both global technology giants and start-ups. Andrea is based in Seattle, Washington.

* The information contained in this presentation is provided for informational purposes only.

While efforts were made to verify the completeness and accuracy of the information contained in this presentation, it is provided “as is”, without warranty of any kind, express or implied. In addition, this information is based on IBM’s current product plans and strategy, which are subject to change by IBM without notice. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, this presentation or any other documentation. Nothing contained in this presentation is intended to, or shall have the effect of: 1) Creating any warranty or representation from IBM (or its affiliates or its or their suppliers and/or licensors); or 2) altering the terms and conditions of the applicable license agreement governing the use of IBM software.
Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment.  The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multi-programming in the user’s job stream, the I/O configuration, the storage configuration, and the workload processed.  Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.

The Logical Data Warehouse : Two Easy Pieces (DW+Hadoop)

By Dennis Duckworth

In some of our recent blogs, we have described our Data Warehouse Point of View and our Zone Architecture for Big Data. We developed these from our experiences with our customers, seeing what worked (and what didn’t). Our aim is to encourage those who are just starting out on their analytics journeys, or who are disappointed by the performance or rigidity of their existing data warehouse environments, to at least consider the advantages of separating data (and corresponding analytics) into different zones based on the characteristics of both. We have been using the term Data Warehouse Modernization to describe the renovation of old traditional monolithic data warehouses (along with other data silos) into hybrid, integrated, or logical data warehouse models.

In a sort of modernization of our own, we have reexamined how we go to market with our data warehouse and data management products to see how we might make it easier for our customers to implement the best practices that we actively promote. With the recent release of our latest data warehouse appliance, the PureData System for Analytics (PDA) N3001 (codename Mako), we had the chance to make some changes. Now, for example, every PDA appliance we ship (every configuration, from the smallest, the “Mako-mini” 2 server rack-mountable appliance, all the way up to our largest, our 8-rack system) includes license entitlements for other IBM software products that we firmly believe can help our customers create a modern, flexible, high performance logical data warehouse environment. One of those entitlements is for IBM InfoSphere BigInsights for Hadoop.

Studies are proving out our opinion that the logical data warehouse is a critical contributor to analytic success for enterprises. In the recently released 2014 IBM Institute for Business Value analytics study, companies were analyzed and categorized by the extent and the effectiveness of analytics in them. Those in the top category, the “front runners”, use data to the highest benefit. They have been successful in “blending” their traditional business intelligence infrastructures with big data technologies to create agility and flexibility in the way they ingest, manage and use data. Quite interestingly, and consistent with our guidance in these blogs, almost all of the front runners (92 percent) have an integrated (or hybrid) data warehouse and, as part of that, they are 10 times more likely than other organizations to have a big data landing platform. In practice, they have implemented what we have called zone architecture to allow them to collect and analyze a wider variety of data, empowering their employees to make full use of their traditional data and new types of data together.

Our customers are also providing proof that data warehouse modernization works. How are these customers using BigInsights and these big data landing platforms? Many are creating what we have been calling data reservoirs. As you may recall from our blogs here and from the hundreds/thousands of other posts on the topic, Hadoop is finding a home in the enterprise as the preferred technology for data reservoirs. These are landing areas for all the data you think may be useful in your company, whether it is structured, unstructured, or semi-structured. Some more specific examples: One of our customers is using BigInsights in combination with the PureData System for Analytics to help it convert users of its free cloud service to customers for its paid service, using predictive analytics on user behavior (structured and unstructured data) to target them more accurately with offers. Another, a telco, is using BigInsights with PDA along with InfoSphere Streams to get a 360° view of its customers and to enable it to react in real time to customer satisfaction issues. (The InfoSphere Streams entitlement with PDA will be the topic of a future blog.)

The BigInsights entitlement that comes with the N3001 PureData System for Analytics is for 5 virtual nodes which, by our calculations, gives you the ability to manage about 100TB of data. So this is not a useless little demo version – this license gives you the ability to create and use a full-blown Hadoop cluster with all of the advantages that BigInsights has to offer, things like Big SQL for SQL access to the data in BigInsights, BigSheets (which enables Excel-like spreadsheet exploration of the data), a text analytics accelerator, Big R (which allows you to explore, visualize, transform, and model big data using familiar R syntax), and a long list of other features and capabilities. You get all of this (and much more) with every N3001 PureData System for Analytics. With software entitlements like this, we allow you to practice what we preach: modernize your data management environment by putting data and the corresponding analytics on the proper platform.
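
As a rough back-of-the-envelope check on that figure, the sketch below spreads the roughly 100TB quoted above evenly across the 5 entitled virtual nodes. The replication factor of 3 is an assumption (it is the stock HDFS default, not something stated for this entitlement), and this post does not say whether 100TB means raw or usable capacity, so treat the second number as illustrative only.

```python
# Back-of-the-envelope sizing for the BigInsights entitlement described above.
# NODES and MANAGED_TB come from the post; REPLICATION is an assumption
# (the stock HDFS default), not a figure from IBM.
NODES = 5
MANAGED_TB = 100
REPLICATION = 3  # assumed HDFS replication factor


def per_node_tb(total_tb: float, nodes: int) -> float:
    """Data managed per node if it is spread evenly across the cluster."""
    return total_tb / nodes


def raw_tb_needed(total_tb: float, replication: int) -> float:
    """Raw disk needed if total_tb is logical data stored `replication` times."""
    return total_tb * replication


if __name__ == "__main__":
    print(f"{per_node_tb(MANAGED_TB, NODES):.0f} TB managed per virtual node")
    print(f"{raw_tb_needed(MANAGED_TB, REPLICATION):.0f} TB raw disk at {REPLICATION}x replication")
```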

About Dennis Duckworth

Dennis Duckworth, Program Director of Product Marketing for Data Management & Data Warehousing, has been in the data game for quite a while, doing everything from Lisp programming in artificial intelligence to managing a sales territory for a database company. He has a passion for helping companies and people get real value out of cool technology. Dennis came to IBM through its acquisition of Netezza, where he was Director of Competitive and Market Intelligence. He holds a degree in Electrical Engineering from Stanford University but has spent most of his life on the East Coast. When not working, Dennis enjoys sailing off his backyard on Buzzards Bay and he is relentless in his pursuit of wine enlightenment. You can follow Dennis on Twitter.

Big SQL in Big Data is a Big Deal

By Dennis Duckworth

I’ve been doing some work in the area of data warehouse modernization (DWM) recently. You may have seen my previous blog about our new DWM infographic and our view of the data warehouse becoming a more active component in a company’s analytics process.

One of the drivers for DWM is DWA — data warehouse augmentation — adding components around the data warehouse to address new capabilities like exploration of unstructured data. Similarly, there is a lot of talk these days about data lakes, data reservoirs, data refineries, etc. One of the questions that comes up when discussing putting data in any new place outside of the data warehouse is, “How do I access the data and analyze it there?”

Business analysts are used to doing analytics on data in the data warehouse – they have been using SQL for a long time and it is comfortable for them. But they are wary (and maybe even a little weary) of all the talk about NoSQL and Hadoop. They might see incredible value in including unstructured/semi-structured data in their analyses but they aren’t quite sure how they would do that. They probably aren’t going to learn Java so they can use MapReduce on their company’s new Hadoop clusters and by the time the IT guys get around to doing ETL on that data and pulling it into the data warehouse, it has lost some relevance and, therefore, value.
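
To make the analyst's position concrete, here is a minimal sketch of the kind of query they would like to keep writing against semi-structured data. It uses Python's built-in sqlite3 module purely as a stand-in for a SQL engine; it is not Big SQL, Hive or Impala, and the click-log records, table name and column names are all hypothetical.

```python
import json
import sqlite3

# Hypothetical semi-structured click log, one JSON record per line,
# of the kind that might land in a Hadoop cluster.
RAW_LOG = """
{"user_id": "u1", "plan": "free", "page": "/pricing"}
{"user_id": "u2", "plan": "free", "page": "/docs"}
{"user_id": "u1", "plan": "free", "page": "/upgrade"}
{"user_id": "u3", "plan": "paid", "page": "/docs"}
"""


def load_events(conn: sqlite3.Connection, raw: str) -> None:
    """Parse the JSON lines and expose them as an ordinary SQL table."""
    conn.execute("CREATE TABLE events (user_id TEXT, plan TEXT, page TEXT)")
    records = (json.loads(line) for line in raw.splitlines() if line.strip())
    rows = [(rec["user_id"], rec["plan"], rec["page"]) for rec in records]
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)


def page_views_by_free_users(conn: sqlite3.Connection) -> list:
    """The familiar part: plain SQL with a filter, a GROUP BY and an ORDER BY."""
    query = """
        SELECT user_id, COUNT(*) AS views
        FROM events
        WHERE plan = 'free'
        GROUP BY user_id
        ORDER BY views DESC, user_id
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_events(conn, RAW_LOG)
    for user_id, views in page_views_by_free_users(conn):
        print(user_id, views)
```

The query text is the part the analyst already knows; which engine runs it, and where, is the question the rest of this post takes up.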

Nowadays, SQL access seems to be a priority for some of the NoSQL vendors, looking to give those business analysts the ability to use their beloved SQL (or some reasonable facsimile thereof) to do their queries against the new NoSQL data stores. So we saw Cloudera come out with Impala and then Hortonworks do significant work to improve the performance of Hive through their Stinger initiative.

Business users are speaking up, saying they want their familiar SQL access to data regardless of where it is, and the vendors are listening, which is a good thing. But as some of the large database/data warehouse companies started jumping on the SQL-on-Hadoop bandwagon, I noticed something a bit nonsensical, at least from a Hadoop perspective. Those large vendors created “solutions” that were based on using their database/data warehouse products. So whereas the Hadoop vendors were building SQL query capabilities directly into their Hadoop offerings, the db/dw folks were building SQL-on-Hadoop into their mainstream RDBMS/SQL engines. That means to get the “benefit” of SQL access to Hadoop, you need to use their RDBMS product.

One of the key goals of data warehouse modernization (and augmentation) is to *not* put additional load on the RDBMS/data warehouse, especially load that doesn’t belong there. Why should you need to use an Oracle Exadata or a Teradata 6750 if you are trying to run a SQL query against just your Hadoop cluster? Well, I guess Oracle and Teradata would answer “To keep people using our expensive products” – but isn’t cost reduction one of the reasons your company wants to do more in Hadoop in the first place?

IBM created Big SQL, its SQL-on-Hadoop solution, to work completely within our Hadoop distribution (built on Apache-standard Hadoop), IBM InfoSphere BigInsights for Hadoop. You don’t need to have a separate PureData System for Analytics data warehouse appliance or a separate machine running IBM DB2 – everything you need to run SQL on Hadoop comes as part of BigInsights and it runs entirely in the Hadoop cluster. In that way, IBM is more like Cloudera and Hortonworks than like Oracle and Teradata – we see Hadoop as a first class citizen in the overall data and analytics framework rather than as an accessory to (and life support for) our RDBMS.

IBM Big SQL v3.0 is in Technology Preview right now. You can learn more about it here or you can try it out here.

About Dennis Duckworth

Dennis Duckworth, Program Director of Product Marketing for Data Management & Data Warehousing, has been in the data game for quite a while, doing everything from Lisp programming in artificial intelligence to managing a sales territory for a database company. He has a passion for helping companies and people get real value out of cool technology. Dennis came to IBM through its acquisition of Netezza, where he was Director of Competitive and Market Intelligence. He holds a degree in Electrical Engineering from Stanford University but has spent most of his life on the East Coast. When not working, Dennis enjoys sailing off his backyard on Buzzards Bay and he is relentless in his pursuit of wine enlightenment. You can follow Dennis on Twitter.