Three session guides get you started with data warehousing at IBM Insight at World of Watson

Join us October 24 to 27, 2016 in Las Vegas!

by Cindy Russell, IBM Data Warehouse marketing

IBM Insight has long been the premier data management and analytics event for IBM analytics technologies, and 2016 is no exception.  This year, IBM Insight is being hosted along with World of Watson and runs from October 24 to 27, 2016 at the Mandalay Bay in Las Vegas, Nevada.  It includes 1,500 sessions across a range of technologies and features keynotes by IBM President and CEO Ginni Rometty; Senior Vice President of IBM Analytics Bob Picciano; and other IBM Analytics and industry leaders.  Every year we include a little fun as well, and this year the band is Imagine Dragons.

IBM data warehousing sessions will be available across the event as well as in the PureData System for Analytics Enzee Universe (Sunday, October 23).  Below are product-specific quick reference guides that enable you to see at a glance key sessions and activities, then plan your schedule.  Print these guides and take them with you or put the links to them on your phone for reference during the conference.

This year, the Expo floor is called the Cognitive Concourse, and we are located in the Monetizing Data section, Cognitive Cuisine experience area.  We’ll take you on a tour across our data warehousing products and will have some fun as we do it, so please stop by.  There is also a demo room where you can see live demos and engage with our technical experts, as well as a series of hands-on labs that let you experience our products directly.

The IBM Insight at World of Watson main web page is located here.  You can register and then use the agenda builder to create your personalized schedule.

IBM PureData System for Analytics session reference guide

Please find the session quick reference guide for PureData System for Analytics here: ibm.biz/wow_enzee

Enzee Universe is a full day of dedicated PureData System for Analytics / Netezza sessions that is held on Sunday, October 23, 2016.  To register for Enzee Universe, select sessions 3459 and 3461 in the agenda builder tool.  This event is open to any full conference pass holder.

During the regular conference, there are also more than 35 PureData, Netezza, and IBM DB2 Analytics Accelerator for z/OS (IDAA) technical sessions across all the conference tracks, as well as hands-on labs.  Several sessions are being presented by IBM clients, so you can see how they put PureData System for Analytics to use.  Click the link above to see the details.

IBM dashDB Family session reference guide

Please find the session quick reference guide for the dashDB family here: ibm.biz/wow_dashDB

There are more than 40 sessions for dashDB, including a “Meet the Family” session that will help you become familiar with the new products in this family of modern data management and data warehousing tools.  There is also a “Birds of a Feather” panel discussion on hybrid data warehousing, and another that describes some key use cases for dashDB.  And, you can also see a demo, take in a short theatre session or try out a hands-on lab.

IBM BigInsights, Hadoop and Spark session reference guide

Please find the session quick reference guide for BigInsights, Hadoop and Spark topics here: ibm.biz/wow_biginsights

There are more than 65 sessions related to IBM BigInsights, Hadoop and Spark, with several hands-on labs and theatre sessions. Topics range from an introduction to data science, to using Spark for customer intelligence analytics, to hybrid cloud data lakes, to client stories of how they use these technologies.

Overall, it is an exciting time to be in the data warehousing and analytics space.  This conference represents a great opportunity to build depth on IBM products you already use, learn new data warehousing products, and look across IBM to learn completely new ways to employ analytics—from Watson to Internet of Things and much more.  I hope to see you there.

What’s new: IBM Fluid Query 1.6

by Doug Dailey

Editorial Note: IBM Fluid Query 1.7 became available in May, 2016. You can read about features in release 1.6 here, but we also recommend reading the release 1.7 blog here.

The IBM PureData System for Analytics team has assembled a value-added set of enhancements to the current software versions of Netezza Platform Software (NPS), INZA and Fluid Query. We have enhanced integration, security, real-time analytics for System z, and usability features with our latest software suite, arriving on Fix Central today.

There will be something here for everyone, whether you are looking to integrate your PureData System (Netezza) into a Logical Data Warehouse, improve security, gain more leverage with DB2 Analytics Accelerator for z/OS, or simply improve your day-to-day experience. This post covers the IBM Fluid Query 1.6 technology.  Refer to my NPS and INZA post (link) for more information on the enhancements that are now available in these other areas.

Integrating with the Logical Data Warehouse: Fluid Query overview

Are you struggling with building out your data reservoir, lake or lagoon? Feeling stuck in a swamp? Or, are you surfing effortlessly through an organized Logical Data Warehouse (LDW)?

Fluid Query offers a nice baseline of capability to get your PureData footprint plugged into your broader data environment or tethered directly to your IBM BigInsights Apache Hadoop distribution. Opening access across your broader ecosystem of on-premise, cloud, commodity hardware and Hadoop platforms gets you ever closer to capturing value throughout “systems of engagement” and “systems of record” so you can reveal new insights across the enterprise.

Now is the time to be fluid in your business, whether it is ease of data integration, access to key data for discovery/exploration, monetizing data, or sizing fit-for-purpose stores for different data types.  IBM Fluid Query opens these conversations and offers some valuable flexibility to connect the PureData System with other PureData Systems, Hadoop, DB2, Oracle and virtually any structured data source that supports JDBC drivers.

The value of content and the ability to tap into new insights are must-haves for competing in any market. Fluid Query allows you to provision data for better use by application developers, data scientists and business users. We provide the tools to build the capability to enable any user group.

[Image: Fluid Query connectors]

What’s new in Fluid Query 1.6?

Fluid Query was first released earlier this year and is now in its third “agile” release. As part of NPS software, it is available at no charge to existing PureData clients; you will find information on how to access Fluid Query 1.6 below.

This capability enables you to query more data for deeper analytics from PureData. For example, you can query data in the PureData System together with:

  • Data in IBM BigInsights or other Hadoop implementations
  • Relational data stores (DB2, 3rd party and open source databases like Postgres, MySQL, etc.)
  • Multi-generational PureData Systems for Analytics systems (“Twin Fin”, “Striper”, “Mako”)
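The idea of querying across stores can be illustrated in miniature. The sketch below uses two in-memory SQLite databases as stand-ins for a PureData warehouse and an external source; the table names and data are invented for the example, and a real federated layer would push each sub-query to the system where the data lives:

```python
import sqlite3

# Two in-memory SQLite databases stand in for a PureData warehouse
# and an external data source; schemas and rows are illustrative.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales (customer_id INTEGER, amount REAL)")
warehouse.executemany("INSERT INTO sales VALUES (?, ?)",
                      [(1, 100.0), (2, 250.0)])

external = sqlite3.connect(":memory:")
external.execute("CREATE TABLE clicks (customer_id INTEGER, page TEXT)")
external.executemany("INSERT INTO clicks VALUES (?, ?)",
                     [(1, "home"), (1, "cart"), (2, "home")])

# Run each sub-query against its own store, then merge the results --
# here a simple client-side join on customer_id.
sales = dict(warehouse.execute("SELECT customer_id, amount FROM sales"))
merged = [(cid, page, sales.get(cid))
          for cid, page in external.execute("SELECT customer_id, page FROM clicks")]
print(merged)  # [(1, 'home', 100.0), (1, 'cart', 100.0), (2, 'home', 250.0)]
```

The point is only the shape of the operation: rows from two independently managed stores are combined into one analytic result.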

The following is a summary of some new features in the release that all help to support your needs for insights across a range of data types and stores:

  • Generic connector for access to structured data stores that support JDBC
    This generic connector enables you to select the database of your choice. Database servers and engines like Teradata, SQL Server, Informix, MemSQL and MapR can now be tapped for insight. We’ve also provided a capability to handle data type mismatches between differing source and target systems.
  • Support for compressed read from Big SQL on IBM BigInsights
    Using the Big SQL capability in IBM BigInsights, you can now read compressed data in Hadoop distributions such as BigInsights, Cloudera and Hortonworks. This adds flexibility and efficiency in storage, data protection and access.
  • Ability to import databases to Hadoop and append to tables in Hadoop
    New capabilities now enable you to import databases to Hadoop, as well as append data in existing tables in Hadoop. One use case for this is backing up historical data to a queryable archive to help manage capacity on the data warehouse. This may include incremental backups, for example from a specific date for speed and efficiency.
  • Support for the latest Hadoop distributions
    Fluid Query v. 1.6 now supports the latest Hadoop distributions, including BigInsights 4.1, Hortonworks 2.5 and Cloudera 5.4.5. For Netezza software, support is now available for NPS 7.2.1 and INZA 3.2.1.
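The queryable-archive use case above can be sketched in a few lines: copy only rows on or after a cutoff date from a “warehouse” store into an “archive” store. SQLite stands in for both systems here, and the schema, dates and cutoff are invented for the example:

```python
import sqlite3

# A warehouse table with dated rows; a second database stands in
# for the Hadoop-based queryable archive.
wh = sqlite3.connect(":memory:")
wh.execute("CREATE TABLE orders (id INTEGER, order_date TEXT)")
wh.executemany("INSERT INTO orders VALUES (?, ?)",
               [(1, "2015-01-10"), (2, "2015-06-01"), (3, "2016-02-15")])

archive = sqlite3.connect(":memory:")
archive.execute("CREATE TABLE orders (id INTEGER, order_date TEXT)")

# Incremental step: append only rows on or after the cutoff date,
# mirroring the "from a specific date" backups described above.
cutoff = "2015-06-01"
rows = wh.execute("SELECT id, order_date FROM orders WHERE order_date >= ?",
                  (cutoff,)).fetchall()
archive.executemany("INSERT INTO orders VALUES (?, ?)", rows)
print(archive.execute("SELECT COUNT(*) FROM orders").fetchone()[0])  # 2
```

Subsequent runs with a later cutoff would append only the newer rows, which is what keeps the incremental approach fast.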

Fluid Query 1.6 can be easily downloaded from IBM Support Fix Central. I encourage you to refer to my “Getting Started” post that was written for Fluid Query 1.5 for additional tips and instructions. Note that this link is for existing PureData clients. Refer to the section below if you are not a current client.

[Image: Fluid Query download from Fix Central]

Packaging and distribution

From a packaging perspective, we refreshed IBM Netezza Platform Developer Software to the latest NPS 7.2.1 release to ensure that the software suite available from IBM’s Passport Advantage is current.

Supported appliances:

  • N3001
  • N2002
  • N2001
  • N100x
  • C1000

Supported software:

  • Netezza Platform Software v7.2.1
  • Netezza Client Kits v7.2.1
  • Netezza SQL Extension Toolkit v7.2.1
  • Netezza Analytics v3.2.1
  • IBM Fluid Query v1.6
  • Netezza Performance Portal v2.1.1
  • IBM Netezza Platform Development Software v7.2.1

For the Netezza Developer Network we continue to expand the ability to easily pick up and work with non-warranted products for basic evaluation by refreshing the Netezza Emulator to NPS 7.2.1 with INZA 3.2.1. You will find a refresh of our non-warranted version of Fluid Query 1.6 and the complete set of Client Kits that support NPS 7.2.1.

[Image: Netezza Developer Network download button]

Feel free to download and experiment with these as a prelude to a PureData System for Analytics purchase, or as a quick way to validate new software functionality with your application. We remain committed to helping partners who work with our systems by keeping the latest systems and software available for you to access. Bring your application or solution and work to certify, qualify and validate it.

For more information on NPS 7.2.1 and INZA 3.2.1 software, refer to my post.

About Doug,
Doug has over 20 years of combined technical and management experience in the software industry, with an emphasis on customer service and, more recently, product management. He is currently part of a highly motivated product management team that is both inspired by and passionate about the IBM PureData System for Analytics product portfolio.

IBM Fluid Query 1.0: Efficiently Connecting Users to Data

by Rich Hughes

Launched on March 27th, IBM Fluid Query 1.0 opens doors of “insight opportunity” for IBM PureData System for Analytics clients. In the evolving data ecosystem, users want and need accessibility to a variety of data stores in different locations. This only makes sense, as newer technologies like Apache Hadoop have broadened analytic possibilities to include unstructured data. Hadoop is the data source that accounts for most of the increase in data volume.  By observation, the world’s data is doubling about every 18 months, with some estimates putting the 2020 data volume at 40 zettabytes, or 40 × 10²¹ bytes. This increase by decade’s end would represent a more than 20-fold growth over the 2011 world data total of 1.8 × 10²¹ bytes.¹ IT professionals as well as the general public can intuitively feel the weight and rapidity of data’s prominence in our daily lives. But how can we cope with, and not be overrun by, relentless data growth? The answer lies, in part, with better data access paths.



IBM Fluid Query 1.0 – What is it?

IBM Fluid Query 1.0 is a specific software feature in PureData that provides access to data in Hadoop from PureData appliances. Fluid Query also promotes the fast movement of data between Big Data ecosystems and PureData warehouses.  Enabling query and data movement, this new technology connects PureData appliances with common Hadoop systems: IBM BigInsights, Cloudera, and Hortonworks. Fluid Query allows results from PureData database tables and Hadoop data sources to be merged, thus creating powerful analytic combinations.



IBM® Fluid Query Benefits

Fluid Query makes practical use of existing SQL developer skills. Workbench tools yield productivity gains because SQL remains the query language of choice when PureData and Hadoop schemas logically merge. Fluid Query is the physical bridge whereby a query is pushed efficiently to where the data resides, whether it is in your data warehouse or in your Hadoop environment. Other benefits made possible by Fluid Query include:

  • better exploitation of Hadoop as a “Day 0” archive that is queryable with conventional SQL;
  • combining hot data from PureData with colder data from Hadoop; and
  • archiving colder data from PureData to Hadoop to relieve resources on the data warehouse.

Managing your share of Big Data Growth

Fluid Query provides data access between Hadoop and PureData appliances. Your current data warehouse, the PureData System for Analytics, can be extended in several important ways over this bridge to additional Hadoop capabilities. The coexistence of PureData appliances alongside Hadoop’s beneficial features is a best-of-breed approach where tasks are performed on the platform best suited for that workload. Use the PureData warehouse for production quality analytics where performance is critical to the success of your business, while simultaneously using Hadoop to discover the inherent value of full-volume data sources.

How does Fluid Query differ from IBM Big SQL technology?

Just as IBM PureData System for Analytics innovated by moving analytics to the data, IBM Big SQL moves queries to the correct data store. IBM Big SQL supports query federation to many data sources, including (but not limited to) IBM PureData System for Analytics; DB2 for Linux, UNIX and Windows database software; IBM PureData System for Operational Analytics; dashDB, Teradata, and Oracle. This allows users to send distributed requests to multiple data sources within a single SQL statement. IBM Big SQL is a feature included with IBM BigInsights for Apache Hadoop which is an included software entitlement with IBM PureData System for Analytics. By contrast, many Hadoop and database vendors rely on significant data movement just to resolve query requests—a practice that can be time consuming and inefficient.

Learn more

Since March 27, 2015, IBM® Fluid Query 1.0 has been generally available as a software addition for PureData System for Analytics customers. If you want to understand how to take advantage of IBM® Fluid Query 1.0, check out these two sources: the on-demand webcast, Virtual Enzee – The Logical Data Warehouse, Hadoop and PureData System for Analytics, and the IBM Fluid Query solution brief. Update: Learn about Fluid Query 1.5, announced July 2015.

About Rich,

Rich Hughes is an IBM Marketing Program Manager for Data Warehousing.  Hughes has worked in a variety of Information Technology, Data Warehousing, and Big Data jobs, and has been with IBM since 2004.  Hughes earned a Bachelor’s degree from Kansas University, and a Master’s degree in Computer Science from Kansas State University.  Writing about the original Dream Team, Hughes authored a book on the 1936 US Olympic basketball team, a squad composed of oil refinery laborers and film industry stage hands. You can follow him on Twitter: @rhughes134

Footnote:
1 “How Much Data is Out There” by Webopedia Staff, Webopedia.com, March 3, 2014.

Fluid doesn’t just describe your coffee anymore … Introducing IBM Fluid Query 1.0

by Wendy Lucas

Having grown up in the world of data and analytics, I long for the days when our goal was to create a single version of the truth. Remember when data architecture diagrams showed source systems flowing through ETL, into a centralized data warehouse and then out to business intelligence applications? Wow, that was nice and simple, right – at least conceptually? As a consultant, I can still remember advising clients and helping them to pictorially represent this reference architecture. It was a pretty simple picture, but that was also a long time ago.

While IT organizations struggled with data integration, enterprise data models and producing the single source of the truth, the lines of business grew impatient and would build their own data marts (or data silos).  We can think of this as the first signs of the requirement for user self-service. The goal behind building the consolidated, enterprise, single version of the truth never went away. Sure, we still want the ability to drive more accurate decision-making, deliver consistent reporting, meet regulatory requirements, etc. However, the ability to achieve this goal became very difficult as requirements for user self-service, increased agility, new data types, lower cost solutions, better business insight and faster time to value became more important.

Recognizing the Logical Data Warehouse

Enterprises have developed collections of data assets that each provide value for specific workloads and purposes. This includes data warehouses, data marts, operational data stores and Hadoop data stores to name a few. It is really this collection of data assets that now serves as the foundation for driving analytics, fulfilling the purpose of the data warehouse within the architecture. The Logical Data Warehouse or LDW is a term we use to describe the collection of data assets that make up the data warehouse environment, recognizing that the data warehouse is no longer just a single entity. Each data store within the Logical Data Warehouse can be built on a different platform, fit for the purpose of the workload and analytic requirements it serves.



But doesn’t this go against the single version of the truth? The LDW will still struggle to deliver on the goal behind the single version of the truth, if it doesn’t have information governance, common metadata and data integration practices in place. This is a key concept. If you’re interested in more on this topic, check out a recent webcast by some of my colleagues on the “Five Pitfalls to Avoid in Your Data Warehouse Modernization Project: Making Data Work for You.”

Unifying data across the Logical Data Warehouse

Logically grouping separate data stores into the LDW does not necessarily make our lives easier. Assuming you have followed good information governance practices, you still have data stores in different places, perhaps on different platforms. Haven’t you just made life infinitely more difficult for your application developers and users, who want self-service? Users need the ability to leverage data across these various data stores without having to worry about the complexity of where to find it, or about re-writing their applications. And let’s not forget the needs of IT. DBAs struggle to manage capacity and performance on data warehouses while listening to Hadoop administrators brag about the seemingly endless, lower-cost storage and the ability to manage new data types that they can provide. What if we could have the best of all worlds: seamless access to data across a variety of stores, formats and platforms, and the capability for IT to manage Hadoop and data warehouses alongside each other in a way that leverages the strengths of both?

Introducing IBM Fluid Query

IBM Fluid Query is the capability to unify data across the Logical Data Warehouse, providing the ability to seamlessly access data in its various forms and locations. No matter where a user connects within the logical data warehouse, users have access to all data through the same standard API/SQL/analytics access. IBM Fluid Query powers the Logical Data Warehouse, giving users the ability to combine numerous types of data from various sources in a fast and agile manner to drive analytics and deeper insight, without worrying about connecting to multiple data stores, using different syntaxes or APIs, or changing their applications.

In its first release, IBM Fluid Query 1.0 will provide users of the IBM PureData System for Analytics the capability to access Hadoop data from their data warehouse and move data between Hadoop and PureData if needed. High performance is about moving the query to the data, not the data to the query. This provides extreme value to PureData users who want the ability to merge data from their structured data warehouse with Hadoop for powerful analytic combinations, or more in-depth analysis. IBM Fluid Query 1.0 is part of a toolkit within Netezza Platform Software (NPS) on the appliance so it’s free for all PureData System for Analytics customers.



For Hadoop users, IBM also provides IBM Big SQL which delivers Fluid Query capability. Big SQL provides the ability to run queries on a variety of data stores, including PureData System for Analytics, DB2 and many others from your IBM BigInsights Hadoop environment. Big SQL has the ability to push the query to the data store and return the result to Hadoop without moving all the data across the network. Other Hadoop vendors provide the ability to write queries like this but they move all the data back to Hadoop before filtering, applying predicates, joining, etc. In the world of big data, can you really afford to move lots of data around to meet the queries that need it?
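The difference is easy to see in a toy model: count how many rows cross the wire when the predicate is applied at the destination versus at the source. The data and selectivity below are invented purely to make the contrast visible:

```python
# Toy comparison of "push the query to the data" versus "move the
# data to the query". Row counts are illustrative, not measured.
source = [{"id": i, "region": "EU" if i % 10 == 0 else "US"}
          for i in range(1000)]

# Naive federation: ship every row, then filter at the destination.
shipped_naive = list(source)
result_naive = [r for r in shipped_naive if r["region"] == "EU"]

# Pushdown: apply the predicate at the source, ship only matches.
shipped_pushdown = [r for r in source if r["region"] == "EU"]
result_pushdown = shipped_pushdown

print(len(shipped_naive), len(shipped_pushdown))  # 1000 100
```

Both approaches produce the same answer, but the pushdown path moves a tenth of the rows in this example; with real big-data volumes and more selective predicates, the gap is what makes query pushdown attractive.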

IBM Fluid Query 1.0 is generally available as of March 27 as a software addition for PureData System for Analytics customers. If you are an existing customer and want to understand how to take advantage of IBM Fluid Query 1.0, or if you would just like more information, I encourage you to listen to this on-demand webcast: Virtual Enzee – The Logical Data Warehouse, Hadoop and PureData System for Analytics, and check out the solution brief. Or, if you are an existing PureData System for Analytics customer, download this software. Update: Learn about Fluid Query 1.5, announced July 2015.

About Wendy,

Wendy Lucas is a Program Director for IBM Data Warehouse Marketing. Wendy has over 20 years of experience in data warehousing and business intelligence solutions, including 12 years at IBM. She has helped clients in a variety of roles, including application development, management consulting, project management, technical sales management and marketing. Wendy holds a Bachelor of Science in Computer Science from Capital University and you can follow her on Twitter at @wlucas001

Is the Data Warehouse Dead? Is Hadoop trying to kill it?

By Dennis Duckworth

I attended the Strata + Hadoop World Conference in San Jose a few weeks ago, which I enjoyed immensely. I found that this conference had a slightly different “feel” than previous Hadoop conferences in terms of how Hadoop was being positioned. Since I am from the data warehouse world, I have been sensitive to Hadoop being promoted as a replacement for the data warehouse.

In previous conferences, sponsors and presenters seemed almost giddy in their prognostication that Hadoop would become the main data storage and analytics platform in the enterprise, taking more and more load from the data warehouse and eventually replacing it completely. This year, there didn’t seem to be much negative talk about data warehouses. Cloudera, for example, clearly showed its Hadoop-based “Enterprise Data Hub” as being complementary to the Enterprise Data Warehouse rather than as a replacement, reiterating the clarification of their positioning and strategy that they made last year. Maybe this was an indication that the Hadoop market was maturing even more, with companies having more Hadoop projects in production and, thus, having more real experience with what Hadoop did well and, as importantly, what it didn’t do well. Perhaps, too, the data warehouse escaped being the villain (or victim) because the “us against them” camp was distracted by the emergence and perceived threat of some other technologies like Spark and Mesos.

The conference was just another data point supporting my hypothesis that Hadoop and other Big Data technologies are complementing existing data warehouses in enterprises rather than replacing them. Another data point (actually a collection of many data points) can be seen in the survey results of The Information Difference Company as reported in the paper “Is the Data Warehouse Dead?”, sponsored by IBM. You can download a copy here.

Reading through this report, I found myself recalling many of the conversations I have had with customers and prospects over the last few years. If you have read some of my previous blogs, you will know that IBM is a big believer in the power of Big Data. We have solutions that help enterprises deal with the new challenges they are facing with the increasing size, speed and diversity of data. But we continue to offer and recommend relational database and data warehouse solutions because they are essential for deriving business value from data – they have done so in the past, and they continue to do so today.

We believe that they will continue doing so going forward. Structured data doesn’t go away, nor does the need for doing analytics (descriptive, predictive, or prescriptive) on the data. An analytics engine that was created and tuned for structured data will continue to be the best place to do such analytics. Sure, you can do some really neat data exploration and visualizations on all sorts of data in Hadoop, but you still need your daily/weekly/monthly reports and your executive dashboards, all needing to be produced within shrinking time windows, that are all fueled by structured data.

About Dennis Duckworth

Dennis Duckworth, Program Director of Product Marketing for Data Management & Data Warehousing, has been in the data game for quite a while, doing everything from Lisp programming in artificial intelligence to managing a sales territory for a database company. He has a passion for helping companies and people get real value out of cool technology. Dennis came to IBM through its acquisition of Netezza, where he was Director of Competitive and Market Intelligence. He holds a degree in Electrical Engineering from Stanford University but has spent most of his life on the East Coast. When not working, Dennis enjoys sailing off his backyard on Buzzards Bay and he is relentless in his pursuit of wine enlightenment.

See also: New Fluid Query for PureData and Hadoop by Wendy Lucas

How To Make Good Decisions in Deploying the Logical Data Warehouse

By Rich Hughes,

A recent article addresses the challenges facing businesses trying to improve their results by analyzing data. As Hadoop’s ability to process large data volumes continues to gain acceptance, Dwaine Snow provides a reasonable method for examining when, and under what circumstances, to deploy Hadoop alongside your PureData System for Analytics (PDA).  Snow makes the case that traditional data warehouses, like PDA, are not going away because of the continued value they provide. Additionally, Hadoop distributions are also playing a valuable role in meeting some of the challenges of this evolving data ecosystem.

The valuable synergy between Hadoop and PDA is illustrated conceptually as the logical data warehouse in Snow’s December 2014 paper (Link to Snow’s Paper).

The logical data warehouse diagrams the enterprise body of data stores, the connective tissue such as APIs, and cognitive features such as analytical functions.  The logical data warehouse encompasses the traditional data warehouse, which began around 1990, and its use of structured databases.  Pushed by the widespread use of the Internet and its unstructured data exhaust, the Apache Hadoop community was founded as a means to store, evaluate, and make sense of unstructured data.  Hadoop thus imitated the traditional data warehouse in extracting value from the available data, then retaining the most valuable data sources from that investigation.  As well, the discovery, analytics, and trusted data zone architecture of today’s logical data warehouse resembles the layered architecture of yesterday’s data warehouse.

Since its advent some 10 years ago, Hadoop has branched out to servicing SQL statements against structured data types, which brings us back to the business challenge: where can we most effectively deploy our data assets and analytic capabilities?  In answering this question, Snow discusses fit-for-purpose repositories which, to be successful, require interoperability across the various zones and data stores.  Each data zone is evaluated for cost, value gained, and required performance against service level agreements.

By looking at this problem as a manufacturing sequence, the raw material (data) is first acquired, then manipulated into a higher-valued product, with the value assessed by the business consumer based on insights gained and speed of delivery.  Hadoop’s distributed file environment shows its worth in storing relatively large data volumes and accessing both structured and unstructured data.  Traditional data warehouses like IBM’s PureData System for Analytics display their value as the system of record where advanced analytics are delivered in a timely fashion.

In an elegant cost-benefit analysis, Snow provides the tools necessary to weigh where best to deploy these different, but complementary, data insight technologies.  A listing of the Total Cost of Ownership (TCO) for Hadoop includes four line items:

  1. Initial system cost (hardware and software)
  2. Annual system maintenance cost
  3. Setup costs to get the system ‘up and running’
  4. Costs for humans managing the ongoing system administration

Looking at just the first cost item, which is sometimes reduced to a per-terabyte price like $1,000 per TB, tells only part of the story.  The article documents the other unavoidable tasks of deploying and maintaining a Hadoop cluster.  Yes, $200,000 might be the price of the hardware and software for a 200 TB system, but over a five-year ownership, cited industry studies ascribe the other significant budget expenses.  Adding up the total costs, the conclusion is that the final amount could very well be in excess of $2,000,000.

The accurate TCO number is then subtracted from the business benefits of using the system, which determines the net value gained.  And business benefits accrue, Snow notes, from query activity.  Only 1% of the queries in today’s data analytic systems require all of the data, which makes that activity a perfect fit for the lower-cost, lower-performance Hadoop model.  Conversely, 90% of current queries require only 20% of the data, which matches well with the characteristics of the PureData System for Analytics: reliability with faster analytic performance.  What Snow has shown is the best-of-breed nature of the Logical Data Warehouse and, as the old slogan suggests, how to get more “bang for the buck”.
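The arithmetic can be sketched directly. The $1,000/TB figure, the 200 TB system size, the five-year horizon, and the roughly $2,000,000 total come from the article; the maintenance, setup, and administration line items below are placeholders chosen only to show how the total can climb that high:

```python
# Back-of-envelope five-year TCO sketch for a hypothetical 200 TB
# Hadoop cluster. Only the $1,000/TB initial cost and the ~$2M
# five-year total come from the article; other figures are assumed.
tb = 200
initial = 1_000 * tb           # hardware + software: $200,000
setup = 100_000                # assumed 'up and running' cost
annual_maintenance = 40_000    # assumed annual system maintenance
annual_admin = 300_000         # assumed ongoing administration staffing
years = 5

total = initial + setup + years * (annual_maintenance + annual_admin)
print(total)  # 2000000
```

Whatever the exact line items, the pattern holds: the headline hardware price is a small fraction of the five-year cost, which is the article's point.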

About Rich Hughes,

Rich Hughes is an IBM Marketing Program Manager for Data Warehousing.  Hughes has worked in a variety of Information Technology, Data Warehousing, and Big Data jobs, and has been with IBM since 2004.  Hughes earned a Bachelor’s degree from Kansas University, and a Master’s degree in Computer Science from Kansas State University.  Writing about the original Dream Team, Hughes authored a book on the 1936 US Olympic basketball team, a squad composed of oil refinery laborers and film industry stage hands. You can follow him on @rhughes134

The Logical Data Warehouse: Two Easy Pieces (DW+Hadoop)

By Dennis Duckworth

In some of our recent blogs, we have described our Data Warehouse Point of View and our Zone Architecture for Big Data.  We developed these from our experiences with our customers, seeing what worked (and what didn’t).  Our aim is to encourage those who are just starting out on their analytics journeys, or those disappointed by the performance or rigidity of their existing data warehouse environments, to at least consider the advantages of separating data (and the corresponding analytics) into different zones based on the characteristics of both.  We have been using the term Data Warehouse Modernization to describe the renovation of old, monolithic, traditional data warehouses (along with other data silos) into hybrid, integrated, or logical data warehouse models.

In a sort of modernization of our own, we have reexamined how we go to market with our data warehouse and data management products to see how we might make it easier for our customers to implement the best practices we actively promote.  With the recent release of our latest data warehouse appliance, the PureData System for Analytics (PDA) N3001 (codename Mako), we had the chance to make some changes.  Now, for example, every PDA appliance we ship (every configuration, from the smallest, the “Mako-mini” 2-server rack-mountable appliance, all the way up to our largest 8-rack system) includes license entitlements for other IBM software products that we firmly believe can help our customers create a modern, flexible, high-performance logical data warehouse environment.  One of those entitlements is for IBM InfoSphere BigInsights for Hadoop.

Studies are bearing out our view that the logical data warehouse is a critical contributor to analytic success for enterprises.  In the recently released 2014 IBM Institute for Business Value analytics study, companies were analyzed and categorized by the extent and effectiveness of their use of analytics.  Those in the top category, the “front runners”, use data to the greatest benefit.  They have been successful in “blending” their traditional business intelligence infrastructures with big data technologies to create agility and flexibility in the way they ingest, manage, and use data.  Quite interestingly, and consistent with our guidance in these blogs, almost all of the front runners (92 percent) have an integrated (or hybrid) data warehouse and, as part of that, they are 10 times more likely than other organizations to have a big data landing platform.  In practice, they have implemented what we have called zone architecture, allowing them to collect and analyze a wider variety of data and empowering their employees to make full use of their traditional data and new types of data together.


Our customers are also providing proof that data warehouse modernization works.  How are these customers using BigInsights and these big data landing platforms?  Many are creating what we have been calling data reservoirs.  As you may recall from our blogs here, and from the hundreds of other posts on the topic, Hadoop is finding a home in the enterprise as the preferred technology for data reservoirs.  These are landing areas for all the data you think may be useful in your company, whether it is structured, unstructured, or semi-structured.  Some more specific examples: One of our customers is using BigInsights in combination with the PureData System for Analytics to help it convert users of its free cloud service into customers for its paid service, using predictive analytics on user behavior (structured and unstructured data) to target them more accurately with offers.  Another, a telco, is using BigInsights and PDA along with InfoSphere Streams to get a 360° view of its customers and to react in real time to customer satisfaction issues.  (The InfoSphere Streams entitlement with PDA will be the topic of a future blog.)

The BigInsights entitlement that comes with the N3001 PureData System for Analytics is for 5 virtual nodes, which, by our calculations, gives you the ability to manage about 100TB of data.  So this is not a crippled little demo version: this license lets you create and use a full-blown Hadoop cluster with all of the advantages that BigInsights has to offer, such as Big SQL for SQL access to the data in BigInsights, BigSheets (which enables Excel-like spreadsheet exploration of the data), the text analytics accelerator, Big R (which allows you to explore, visualize, transform, and model big data using familiar R syntax), and a long list of other features and capabilities.  You get all of this (and much more) with every N3001 PureData System for Analytics.  With software entitlements like this, we let you practice what we preach: modernize your data management environment by putting data and the corresponding analytics on the proper platform.
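As a rough sketch of the sizing arithmetic: 5 virtual nodes managing about 100TB implies roughly 20TB per virtual node.  That per-node ratio is an inference from the figures above, not an official IBM specification, but it lets you do a quick back-of-the-envelope estimate for a data reservoir of a given size.

```python
import math

# Inferred from the article: 5 virtual BigInsights nodes manage ~100TB.
# The 20TB-per-node ratio is an inference, not an official spec.
TB_PER_VIRTUAL_NODE = 100 / 5

def nodes_needed(data_tb: float) -> int:
    """Estimate the virtual nodes a reservoir of data_tb terabytes
    would need under the inferred 20TB-per-node ratio."""
    return math.ceil(data_tb / TB_PER_VIRTUAL_NODE)

print(nodes_needed(100))  # 5 -- matches the bundled entitlement
print(nodes_needed(250))  # 13 -- would exceed the bundled entitlement
```

In other words, a reservoir larger than about 100TB would grow beyond the bundled 5-node entitlement and call for additional licensing.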

About Dennis Duckworth

Dennis Duckworth, Program Director of Product Marketing for Data Management & Data Warehousing, has been in the data game for quite a while, doing everything from Lisp programming in artificial intelligence to managing a sales territory for a database company.  He has a passion for helping companies and people get real value out of cool technology.  Dennis came to IBM through its acquisition of Netezza, where he was Director of Competitive and Market Intelligence.  He holds a degree in Electrical Engineering from Stanford University but has spent most of his life on the East Coast.  When not working, Dennis enjoys sailing off his backyard on Buzzards Bay, and he is relentless in his pursuit of wine enlightenment.  You can follow Dennis on Twitter.