Three session guides get you started with data warehousing at IBM Insight at World of Watson

Join us October 24 to 27, 2016 in Las Vegas!

by Cindy Russell, IBM Data Warehouse marketing

IBM Insight has long been the premier data management and analytics event for IBM analytics technologies, and 2016 is no exception.  This year, IBM Insight is being hosted along with World of Watson and runs from October 24 to 27, 2016 at the Mandalay Bay in Las Vegas, Nevada.  It includes 1,500 sessions across a range of technologies and features keynotes by IBM President and CEO Ginni Rometty; Senior Vice President of IBM Analytics Bob Picciano; and other IBM Analytics and industry leaders.  Every year, we include a little fun as well, and this year the band is Imagine Dragons.

IBM data warehousing sessions will be available across the event as well as in the PureData System for Analytics Enzee Universe (Sunday, October 23).  Below are product-specific quick reference guides that enable you to see at a glance key sessions and activities, then plan your schedule.  Print these guides and take them with you or put the links to them on your phone for reference during the conference.

This year, the Expo floor is called the Cognitive Concourse, and we are located in the Monetizing Data section, Cognitive Cuisine experience area.  We’ll take you on a tour across our data warehousing products and will have some fun as we do it, so please stop by.  There is also a demo room where you can see live demos and engage with our technical experts, as well as a series of hands-on labs that let you experience our products directly.

The IBM Insight at World of Watson main web page is located here.  You can register and then use the agenda builder to create your personalized schedule.

IBM PureData System for Analytics session reference guide

Please find the session quick reference guide for PureData System for Analytics here: ibm.biz/wow_enzee

Enzee Universe is a full day of dedicated PureData System for Analytics / Netezza sessions that is held on Sunday, October 23, 2016.  To register for Enzee Universe, select sessions 3459 and 3461 in the agenda builder tool.  This event is open to any full conference pass holder.

During the regular conference, there are also more than 35 PureData, Netezza, and IBM DB2 Analytics Accelerator for z/OS (IDAA) technical sessions across all the conference tracks, as well as hands-on labs.  There are several sessions being presented by IBM clients so you can see how they put PureData System for Analytics to use.  Click the link above to see the details.

IBM dashDB Family session reference guide

Please find the session quick reference guide for the dashDB family here: ibm.biz/wow_dashDB

There are more than 40 sessions for dashDB, including a “Meet the Family” session that will help you become familiar with new products in this family of modern data management and data warehousing tools.  There is also a “Birds of a Feather” panel discussion on Hybrid Data Warehousing, and one that describes some key use cases for dashDB.  And you can also see a demo, take in a short theatre session or try out a hands-on lab.

IBM BigInsights, Hadoop and Spark session reference guide

Please find the session quick reference guide for BigInsights, Hadoop and Spark topics here: ibm.biz/wow_biginsights

There are more than 65 sessions related to IBM BigInsights, Hadoop and Spark, with several hands-on labs and theatre sessions. There is everything from an Introduction to Data Science to Using Spark for Customer Intelligence Analytics to hybrid cloud data lakes to client stories of how they use these technologies.

Overall, it is an exciting time to be in the data warehousing and analytics space.  This conference represents a great opportunity to build depth on IBM products you already use, learn new data warehousing products, and look across IBM to learn completely new ways to employ analytics—from Watson to Internet of Things and much more.  I hope to see you there.

Build skills for 2016 and Beyond: Data Warehousing and Analytics Top 10 Resources

by Cindy Russell, IBM Data Warehouse Marketing

Skills are always an essential consideration in technical careers and it is important for data warehousing professionals to expand their knowledge to handle the proliferation of data types and volumes in 2016 and beyond.

These are my “top 10” resource picks that you may want to explore. I am choosing these because of their popularity and also because they represent new technologies you may face in 2016 as you modernize your data warehouse and extend it beyond its traditional realm to meet new analytics needs.

  1. Gartner Magic Quadrant for Data Warehouse and Data Management Solutions for Analytics – I am recommending this report because it provides an overview of the trends, issues and marketplace leaders in data warehousing. It calls out the need for the Logical Data Warehouse, which is a key element of a modernization strategy. I believe the Logical Data Warehouse will be of increasing importance to your operations in the coming months. Read a summary of the report.
  2. Logical Data Warehouse – Due to the massive and rapid growth of data volumes and types, a single centralized data warehouse cannot meet all of the new needs for analytics by itself. The data warehouse now becomes part of a Logical Data Warehouse in which a set of “fit for purpose” stores are used to house a range of data. This blog by Wendy Lucas was published in 2014, but is still a good primer on the concept if you need one.
  3. IBM Fluid Query information and entitlement for PureData clients – In 2015, we released a series of “agile” announcements of IBM Fluid Query. This is a tool that PureData System for Analytics clients can use to query more data sources for deeper insights. This tool is a key element when you have a Logical Data Warehouse where data stores include Hadoop, databases, other data warehouses and more. PureData clients can take advantage of this technology as part of the entitlements. Start learning with our blog series and webcast.
  4. dashDB, data warehousing on the cloud – dashDB was launched in 2014 as the IBM fully managed data warehouse in the cloud. Some initial use cases could be: setting up self-service data science sandboxes, establishing test environments or cost-effectively housing data that is already external, such as social media feeds. dashDB is based on the Netezza and BLU Acceleration in-memory computing technologies. If you have workloads you want to place on the cloud, dashDB is a good solution. This webcast and a TDWI Checklist for cloud get you started.
  5. Hadoop and Big SQL – Hadoop is a scalable, cost-effective, open source file system that can store a range of structured or unstructured data as part of a Logical Data Warehouse. It can also be used to help you manage capacity on the data warehouse, for example as a queryable historical archive. Read this blog by our expert to learn the basics. IBM provides a free open source distribution, IBM Open Platform with Apache Hadoop. For those looking to augment the IBM Open Platform, IBM BigInsights adds enterprise-grade features including visualization, exploration and advanced analytics. Within the family is an implementation that includes Big SQL—enabling you to use familiar SQL skills to query data in Hadoop. Explore the above content options, then get started with a no charge trial.
  6. Apache Spark – IBM announced a major commitment to Apache Spark in June 2015 and has already made available a series of Spark-based products and cloud services. You will be seeing more of Spark across the IBM Analytics portfolio, so it is a good technology to learn. Apache Spark is an open source processing engine built around speed, ease of use, and analytics. If you have large amounts of data that require low-latency processing that a typical MapReduce program cannot provide, Spark is the alternative. It performs at speeds up to 100 times faster than MapReduce for iterative algorithms or interactive data mining. Spark provides in-memory cluster computing for speed, and supports Java, Scala, and Python APIs for ease of development. I recommend this no charge Big Data University course on Spark fundamentals.
  7. Update to IBM Netezza Analytics software – For those of you who are PureData System for Analytics clients, there is an update to the Netezza Analytics software. Doug Dailey is one of our experts in this area, and he created an announcement blog to help you understand what new capabilities you can leverage.
  8. Virtual Enzee on demand webcasts – IBM offers webcasts on topics related to data warehousing and PureData System for Analytics. Browse the “Virtual Enzee” webcast library to stay up to date on PureData through these on demand webcasts.
  9. Learn Cognos Analytics for user self-service applications – Some of our clients use Cognos BI in conjunction with their data warehouses for super-fast reporting. Cognos Analytics was announced at IBM Insight as a guided, self-service capability that provides a personal approach to analytics. As your users are demanding more insights, self-service may be a sound solution to some of their needs. Browse the blog and web site to learn more.
  10. IBMGo on demand keynotes from IBM Insight – If you were unable to attend IBM Insight 2015, IBMGo brings some of the main sessions to you! It is a great way to learn about the bigger IBM Analytics solutions and points of view. Start here.
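Spark's advantage for iterative analytics (item 6 above) comes from keeping a working set in memory between passes instead of re-reading it from storage on every pass, as a typical MapReduce job would. A minimal plain-Python sketch of that idea (no Spark APIs; the file layout and numbers are invented for illustration):

```python
import json, os, tempfile

# Write a small dataset to disk to stand in for files in HDFS.
path = os.path.join(tempfile.mkdtemp(), "data.jsonl")
with open(path, "w") as f:
    for i in range(1000):
        f.write(json.dumps({"value": i}) + "\n")

def total_from_disk(p):
    # MapReduce-style pass: re-read and re-parse the input every time.
    with open(p) as f:
        return sum(json.loads(line)["value"] for line in f)

# Spark-style: load once, keep the parsed working set cached in memory.
with open(path) as f:
    cached = [json.loads(line)["value"] for line in f]

# Ten "iterations" of the same computation.
disk_results = [total_from_disk(path) for _ in range(10)]  # repeated I/O and parsing
mem_results = [sum(cached) for _ in range(10)]             # memory-speed only

assert disk_results == mem_results  # identical answers; only the data's home differs
```

The speed difference Spark advertises comes from eliminating the repeated read-and-parse step, which dominates when the algorithm loops over the same data many times.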


What’s new: IBM Fluid Query 1.6

by Doug Dailey

Editorial Note: IBM Fluid Query 1.7 became available in May, 2016. You can read about features in release 1.6 here, but we also recommend reading the release 1.7 blog here.

The IBM PureData System for Analytics team has assembled a valuable set of enhancements to the current software versions of Netezza Platform Software (NPS), INZA and Fluid Query. We have enhanced integration, security, real-time analytics for System z, and usability in our latest software suite, arriving on Fix Central today.

There will be something here for everyone, whether you are looking to integrate your PureData System (Netezza) into a Logical Data Warehouse, improve security, gain more leverage with DB2 Analytics Accelerator for z/OS, or simply improve your day-to-day experience. This post covers the IBM Fluid Query 1.6 technology.  Refer to my NPS and INZA post (link) for more information on the enhancements that are now available in these other areas.

Integrating with the Logical Data Warehouse: Fluid Query overview

Are you struggling with building out your data reservoir, lake or lagoon? Feeling stuck in a swamp? Or, are you surfing effortlessly through an organized Logical Data Warehouse (LDW)?

Fluid Query offers a nice baseline of capability to get your PureData footprint plugged into your broader data environment or tethered directly to your IBM BigInsights Apache Hadoop distribution. Opening access across your broader ecosystem of on-premises, cloud, commodity hardware and Hadoop platforms gets you ever closer to capturing value throughout “systems of engagement” and “systems of record” so you can reveal new insights across the enterprise.

Now is the time to be fluid in your business, whether it is ease of data integration, access to key data for discovery/exploration, monetizing data, or sizing fit-for-purpose stores for different data types.  IBM Fluid Query opens these conversations and offers some valuable flexibility to connect the PureData System with other PureData Systems, Hadoop, DB2, Oracle and virtually any structured data source that supports JDBC drivers.

The value of content and the ability to tap into new insights are must-haves to compete in any market. Fluid Query allows you to provision data for better use by application developers, data scientists and business users, and we provide the tools to enable each of these user groups.


What’s new in Fluid Query 1.6?

Fluid Query was first released earlier this year and is now in its third “agile” release.  As part of NPS software, it is available at no charge to existing PureData clients; you will find information on how to access Fluid Query 1.6 below.

This capability enables you to query more data for deeper analytics from PureData. For example, you can query data in the PureData System together with:

  • Data in IBM BigInsights or other Hadoop implementations
  • Relational data stores (DB2, 3rd party and open source databases like Postgres, MySQL, etc.)
  • Multi-generational PureData Systems for Analytics systems (“Twin Fin”, “Striper”, “Mako”)

The following is a summary of some new features in the release that all help to support your needs for insights across a range of data types and stores:

  • Generic connector for access to structured data stores that support JDBC
    This generic connector enables you to select the database of your choice. Database servers and engines like Teradata, SQL Server, Informix, MemSQL and MapR can now be tapped for insight. We’ve also provided a capability to handle any data type mismatches between differing source/target systems.
  • Support for compressed read from Big SQL on IBM BigInsights
    Using the Big SQL capability in IBM BigInsights, you are now able to read compressed data in Hadoop distributions such as BigInsights, Cloudera and Hortonworks. This adds flexibility and efficiency in storage, data protection and access.
  • Ability to import databases to Hadoop and append to tables in Hadoop
    New capabilities now enable you to import databases to Hadoop, as well as append data in existing tables in Hadoop. One use case for this is backing up historical data to a queryable archive to help manage capacity on the data warehouse. This may include incremental backups, for example from a specific date for speed and efficiency.
  • Support for the latest Hadoop distributions
    Fluid Query v. 1.6 now supports the latest Hadoop distributions, including BigInsights 4.1, Hortonworks 2.5 and Cloudera 5.4.5. For Netezza software, support is now available for NPS 7.2.1 and INZA 3.2.1.
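Fluid Query's connectors are configured on the appliance itself, but the shape of the result, one SQL statement that joins local warehouse tables with a remote store and returns only the results, can be sketched with two SQLite databases standing in for the warehouse and a remote source (all table and column names here are invented for illustration):

```python
import os, sqlite3, tempfile

tmp = tempfile.mkdtemp()
remote_path = os.path.join(tmp, "remote.db")

# "Remote" store (standing in for Hadoop, Oracle, etc.) with reference data.
remote = sqlite3.connect(remote_path)
remote.execute("CREATE TABLE regions (region_id INTEGER, name TEXT)")
remote.executemany("INSERT INTO regions VALUES (?, ?)", [(1, "EMEA"), (2, "AMER")])
remote.commit()
remote.close()

# Local "warehouse" with fact data.
wh = sqlite3.connect(os.path.join(tmp, "warehouse.db"))
wh.execute("CREATE TABLE sales (region_id INTEGER, amount REAL)")
wh.executemany("INSERT INTO sales VALUES (?, ?)", [(1, 100.0), (2, 250.0), (1, 50.0)])

# One SQL statement spanning both stores -- the shape of a federated query.
wh.execute("ATTACH DATABASE ? AS remote", (remote_path,))
rows = wh.execute("""
    SELECT r.name, SUM(s.amount)
    FROM sales s
    JOIN remote.regions r ON r.region_id = s.region_id
    GROUP BY r.name
    ORDER BY r.name
""").fetchall()
print(rows)  # [('AMER', 250.0), ('EMEA', 150.0)]
```

The point of the pattern is that only the small joined result crosses the boundary between stores, not the full remote table.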

Fluid Query 1.6 can be easily downloaded from IBM Support Fix Central. I encourage you to refer to my “Getting Started” post that was written for Fluid Query 1.5 for additional tips and instructions. Note that this link is for existing PureData clients. Refer to the section below if you are not a current client.


Packaging and distribution

From a packaging perspective, we refreshed IBM Netezza Platform Developer Software to the latest NPS 7.2.1 release to ensure the software suite available from IBM Passport Advantage is current.

Supported Appliances:

  • N3001
  • N2002
  • N2001
  • N100x
  • C1000

Supported Software:

  • Netezza Platform Software v7.2.1
  • Netezza Client Kits v7.2.1
  • Netezza SQL Extension Toolkit v7.2.1
  • Netezza Analytics v3.2.1
  • IBM Fluid Query v1.6
  • Netezza Performance Portal v2.1.1
  • IBM Netezza Platform Development Software v7.2.1

For the Netezza Developer Network, we continue to make it easy to pick up and work with non-warranted products for basic evaluation: we have refreshed the Netezza Emulator to NPS 7.2.1 with INZA 3.2.1, along with our non-warranted version of Fluid Query 1.6 and the complete set of Client Kits that support NPS 7.2.1.


Feel free to download and play with these as a prelude to a PureData System for Analytics purchase or as a quick way to validate new software functionality with your application. We remain committed to helping partners who work with our systems by keeping the latest systems and software available for you to access. Bring your application or solution and work to certify, qualify and validate it.

For more information on NPS 7.2.1 and INZA 3.2.1 software, refer to my post.

About Doug
Doug has over 20 years of combined technical and management experience in the software industry, with an emphasis on customer service and, more recently, product management. He is currently part of a highly motivated product management team that is both inspired by and passionate about the IBM PureData System for Analytics product portfolio.

IBM Fluid Query: Extending Insights Across More Data Stores

by Rich Hughes

Since its announcement in March 2015, IBM Fluid Query has opened the door to better business insights for IBM PureData System for Analytics clients. Our clients have wanted and needed accessibility across a wide variety of data stores, including Apache Hadoop with its unstructured stores, which account for much of the massive growth in data volumes. There is also valuable data in other types of stores, including relational databases that are often “systems of record” and “systems of insight”. Plus, Apache Spark is entering the picture as an up-and-coming engine for real-time analytics and machine learning.

IBM is pleased to announce IBM Fluid Query 1.5 to provide seamless integration with these additional data stores—making it even easier to get deeper insights from even more data.

IBM Fluid Query 1.5 – What is it?

IBM Fluid Query 1.5 provides access to data in other data stores from IBM PureData System for Analytics appliances. Starting with Fluid Query 1.0, users were able to query and quickly move data between Hadoop and IBM PureData System for Analytics appliances. This capability covered IBM BigInsights for Apache Hadoop, Cloudera, and Hortonworks.

Now with Fluid Query 1.5, we add the ability to reach into even more data stores including Spark and such popular relational database management systems as:

  • DB2 for Linux, UNIX and Windows
  • dashDB
  • PureData System for Operational Analytics
  • Oracle Database
  • Other PureData System for Analytics implementations

Fluid Query is able to direct queries from PureData System for Analytics database tables to all of these different data sources and get just the results back—thus creating a powerful analytic capability.

IBM Fluid Query Benefits

IBM Fluid Query offers two key benefits. First, it makes practical use of data stores and lets users access them with their existing SQL skills. Workbench tools yield productivity gains as SQL remains the query language of choice when PureData System for Analytics and Hadoop schemas logically merge. IBM Fluid Query provides the physical bridge over which a query is pushed efficiently to where the data needed for that query resides—whether in the same data warehouse, another data warehouse, a relational or transactional database, Hadoop or Spark.

Second, IBM Fluid Query enables archiving and capacity management on PureData-based data warehouses. With Fluid Query, users gain:

  • better exploitation of Hadoop as a “Day 0” archive that is queryable with conventional SQL;
  • capabilities to make use of data in a Spark in-memory analytics engine;
  • the ability to easily combine hot data from PureData with colder data from Hadoop; and
  • data warehouse resource management, with the ability to archive colder data from PureData to Hadoop to relieve resources on the data warehouse.
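The archiving use case in the last bullet follows a simple pattern: move rows older than a cutoff to the cheaper store, then query across both stores when history is needed. A sketch with SQLite standing in for both the warehouse and the Hadoop archive (the schema, dates and cutoff are invented for illustration):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE warehouse_sales (sale_date TEXT, amount REAL)")
db.execute("CREATE TABLE archive_sales (sale_date TEXT, amount REAL)")  # stand-in for Hadoop
db.executemany("INSERT INTO warehouse_sales VALUES (?, ?)", [
    ("2014-06-01", 10.0), ("2015-01-15", 20.0), ("2015-06-30", 30.0),
])

# Archive step: copy rows older than the cutoff to the cheaper store, then delete them.
cutoff = "2015-01-01"
db.execute("INSERT INTO archive_sales SELECT * FROM warehouse_sales WHERE sale_date < ?", (cutoff,))
db.execute("DELETE FROM warehouse_sales WHERE sale_date < ?", (cutoff,))

# Hot data stays small; full history remains queryable with plain SQL.
hot = db.execute("SELECT COUNT(*) FROM warehouse_sales").fetchone()[0]
total = db.execute("""
    SELECT SUM(amount) FROM (
        SELECT amount FROM warehouse_sales
        UNION ALL
        SELECT amount FROM archive_sales)
""").fetchone()[0]
print(hot, total)  # 2 60.0
```

Running the archive step incrementally (for example, once per month with a moving cutoff) keeps the warehouse lean without ever making the cold data unreachable.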

Managing your share of Big Data Growth

The design point for Fluid Query is that the query is moved to the data instead of bringing massive data volumes to the query. This is a best-of-breed approach where tasks are performed on the platform best suited for that workload.

For example, use the PureData System for Analytics data warehouse for production quality analytics where performance is critical to the success of your business, while simultaneously using Hadoop or Spark to discover the inherent value of those full-volume data sources. Or, create powerful analytic combinations across data in other operational systems or analytics warehouses with PureData stores without having to move and integrate data before analyzing it.

IBM Fluid Query 1.5 is now generally available as a software addition for PureData System for Analytics clients. If you want to understand how to take advantage of IBM Fluid Query 1.5, check out these resources:

About Rich,

Rich Hughes is an IBM Marketing Program Manager for Data Warehousing.  Hughes has worked in a variety of Information Technology, Data Warehousing, and Big Data jobs, and has been with IBM since 2004.  Hughes earned a Bachelor’s degree from Kansas University, and a Master’s degree in Computer Science from Kansas State University.  Writing about the original Dream Team, Hughes authored a book on the 1936 US Olympic basketball team, a squad composed of oil refinery laborers and film industry stage hands. You can follow him on Twitter: @rhughes134

What is the fundamental difference between “ETL” and “ELT” in the world of big data?

By Ralf Goetz

Initially, it seems like just a different sequence of the two characters “T” and “L”. But this difference often separates successful big data projects from failed ones. Why is that? And how can you avoid falling into the most common data management traps around mastering big data? Let’s examine this topic in more detail.

Why are big data projects different from traditional data warehouse projects?

Big data projects are mostly characterized by one or a combination of these four (or five) data requirements:

  • Volume: the volume of (raw) data
  • Variety: the variety (e.g. structured, unstructured, semi-structured) of data
  • Velocity: the speed of data processing, consumption or analysis of data
  • Veracity: the level of trust in the data
  • (Value): the value behind the data

For big data, each of these “V”s is orders of magnitude bigger. For example, a traditional data warehouse usually holds several hundred gigabytes or a few terabytes of data, while big data projects typically handle data volumes of hundreds or even thousands of terabytes. Another example: traditional data warehouse systems manage and process only structured data, whereas typical big data projects need to manage and process both structured and unstructured data.

With this in mind, it is obvious that traditional technologies and methodologies for data warehousing may not be sufficient to handle these big data requirements.

Mastering the data and information supply chain using traditional ETL

This brings us to a widely adopted methodology for data integration called “Extraction, Transformation and Load” (ETL). ETL is a very common methodology in data warehousing and business analytics projects and can be performed by custom programming (e.g. scripts or custom ETL applications) or with the help of state-of-the-art ETL platforms such as IBM InfoSphere Information Server.


The fundamental concept behind most ETL implementations is the restriction of the data in the supply chain. Only data that is presumably important is identified, extracted and loaded into a staging area inside a database and, later, into the data warehouse. “Presumably” is the weakness in this concept. Who really knows which data will be required for which analytic insight and requirement, today or tomorrow? Who knows which legal or regulatory requirements must be followed in the months and years to come?

Each change in the definition and scope of the information and data supply chain requires a considerable amount of effort, time and budget and is a risk for any production system. There must be a resolution for this dilemma – and here it comes.

A new “must follow” paradigm for big data: ELT

Just a little change in the sequence of two letters will mean everything to the success of your big data project: ELT (Extraction, Load and Transform). This change seems small, but the difference lies in the overall concept of data management.  Instead of restricting the data sources to only “presumably” important data (and all the steps this entails), what if we take all available data, and put it into a flexible, powerful big data platform such as the Hadoop-based IBM InfoSphere BigInsights system?
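The difference can be sketched in a few lines of Python, with a list of dictionaries standing in for the source feed (the record shapes are invented for illustration):

```python
# Source system emits a mix of record types and fields.
source = [
    {"type": "order", "id": 1, "amount": 120.0, "note": "rush"},
    {"type": "clickstream", "user": "a17", "page": "/home"},
    {"type": "order", "id": 2, "amount": 80.0},
]

# ETL: decide up front what is "important", transform, then load.
# The clickstream record is discarded and can never be analyzed later.
etl_warehouse = [
    {"id": r["id"], "amount": r["amount"]}
    for r in source if r["type"] == "order"
]

# ELT: load everything raw first; transform only when a question arises.
elt_lake = list(source)  # every record kept; schema applied at query time
orders_view = [
    {"id": r["id"], "amount": r["amount"]}
    for r in elt_lake if r["type"] == "order"
]

assert etl_warehouse == orders_view                       # same answer today...
assert any(r["type"] == "clickstream" for r in elt_lake)  # ...but ELT kept the rest
```

Both pipelines answer today's question identically; the difference is that the ELT lake can still answer tomorrow's question about clickstream data, while the ETL warehouse cannot.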


Data storage in Hadoop is flexible, powerful, almost unlimited, and cost efficient since it can use commodity hardware and scales across many computing nodes and local storage.

Hadoop is a schema-on-read system. It allows the storage of all kinds of data without knowing its format or definition (e.g. JSON, images, movies, text files, spreadsheets, log files and many more). Without the previously discussed limitation in the amount of data which will be extracted in the ETL methodology, we can be sure that we have all data we need today and may need in the future. This also reduces the required effort for the identification of “important” data – this step can literally be skipped: we take all we can get and keep it!
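A minimal sketch of schema-on-read: lines are stored verbatim at load time, and a structure is imposed (and bad rows surface) only when a query runs. The field names are invented for illustration:

```python
import json

# Raw landing file: lines are stored verbatim, no schema enforced on write.
raw_lines = [
    '{"sensor": "t1", "temp_c": 21.5}',
    '{"sensor": "t2", "temp_c": 19.0, "battery": 0.87}',
    'not even valid JSON -- still stored',
]

def read_with_schema(lines):
    """Impose a (sensor, temp_c) schema at read time; skip unparseable rows."""
    for line in lines:
        try:
            rec = json.loads(line)
        except ValueError:
            continue  # schema-on-read: bad rows surface at query time, not load time
        yield rec.get("sensor"), rec.get("temp_c")

rows = list(read_with_schema(raw_lines))
print(rows)  # [('t1', 21.5), ('t2', 19.0)]
```

Contrast this with a schema-on-write database, which would have rejected the malformed third line (and the extra "battery" field) at load time.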


Since Hadoop offers a scalable data storage and processing platform, we can utilize these features as a replacement for the traditional staging area inside a database. From here we can take only the data that is required today and analyze it either directly with a business intelligence platform such as IBM Cognos  or IBM SPSS, or use an intermediate layer with deep and powerful analytic capabilities such as IBM PureData System for Analytics.

Refining raw data and gaining valuable insights

Hadoop is great for storage and processing of raw data, but applying powerful, lightning-fast, complex analytic queries is not its strength, so another analytics layer makes sense.  PureData System for Analytics is the perfect place for the subsequent in-database analytic processing of “valued” data because of its massively parallel processing (MPP) architecture and its rich set of analytic functions. PureData can resolve even the most complex analytic queries in only a fraction of the time compared to traditional relational databases. And it scales – from a big data starter project with only a couple of terabytes of data to a petabyte-sized PureData cluster.


IBM offers everything you need to master your big data challenges. You can start very small and scale with your growing requirements. Big data projects can be fun with the right technology and services!

About Ralf Goetz 
Ralf is an Expert Level Certified IT Specialist in the IBM Software Group. Ralf joined IBM through the Netezza acquisition in early 2011. For several years, he led the Informatica tech-sales team in the DACH region and the Mahindra Satyam BI competency team in Germany. He then became a technical pre-sales representative for Netezza and later for the PureData System for Analytics. Ralf still focuses on PureData System for Analytics but also supports the technical sales of all IBM big data products. Ralf holds a Master’s degree in computer science.

IBM Fluid Query 1.0: Efficiently Connecting Users to Data

by Rich Hughes

Launched on March 27th, IBM Fluid Query 1.0 opens doors of “insight opportunity” for IBM PureData System for Analytics clients. In the evolving data ecosystem, users want and need accessibility to a variety of data stores in different locations. This only makes sense, as newer technologies like Apache Hadoop have broadened analytic possibilities to include unstructured data. Hadoop is the data source that accounts for most of the increase in data volume.  By observation, the world’s data is doubling about every 18 months, with some estimates putting the 2020 data volume at 40 zettabytes, or 40 x 10^21 bytes. This increase by decade’s end would represent a 20-fold growth over the 2011 world data total of 1.8 x 10^21 bytes.1 IT professionals as well as the general public can intuitively feel the weight and rapidity of data’s prominence in our daily lives. But how can we cope with, and not be overrun by, relentless data growth? The answer lies, in part, with better data access paths.



IBM Fluid Query 1.0 – What is it?

IBM Fluid Query 1.0 is a specific software feature in PureData that provides access to data in Hadoop from PureData appliances. Fluid Query also promotes the fast movement of data between Big Data ecosystems and PureData warehouses.  Enabling query and data movement, this new technology connects PureData appliances with common Hadoop systems: IBM BigInsights, Cloudera, and Hortonworks. Fluid Query allows results from PureData database tables and Hadoop data sources to be merged, thus creating powerful analytic combinations.



IBM® Fluid Query Benefits

Fluid Query makes practical use of existing SQL developer skills. Workbench tools yield productivity gains because SQL remains the query language of choice when PureData and Hadoop schemas logically merge. Fluid Query is the physical bridge whereby a query is pushed efficiently to where the data resides, whether it is in your data warehouse or in your Hadoop environment. Other benefits made possible by Fluid Query include:

  • better exploitation of Hadoop as a “Day 0” archive that is queryable with conventional SQL;
  • combining hot data from PureData with colder data from Hadoop; and
  • archiving colder data from PureData to Hadoop to relieve resources on the data warehouse.

Managing your share of Big Data Growth

Fluid Query provides data access between Hadoop and PureData appliances. Your current data warehouse, the PureData System for Analytics, can be extended in several important ways over this bridge to additional Hadoop capabilities. The coexistence of PureData appliances alongside Hadoop’s beneficial features is a best-of-breed approach where tasks are performed on the platform best suited for that workload. Use the PureData warehouse for production quality analytics where performance is critical to the success of your business, while simultaneously using Hadoop to discover the inherent value of full-volume data sources.

How does Fluid Query differ from IBM Big SQL technology?

Just as IBM PureData System for Analytics innovated by moving analytics to the data, IBM Big SQL moves queries to the correct data store. IBM Big SQL supports query federation to many data sources, including (but not limited to) IBM PureData System for Analytics; DB2 for Linux, UNIX and Windows database software; IBM PureData System for Operational Analytics; dashDB; Teradata; and Oracle. This allows users to send distributed requests to multiple data sources within a single SQL statement. IBM Big SQL is a feature of IBM BigInsights for Apache Hadoop, which is a software entitlement included with IBM PureData System for Analytics. By contrast, many Hadoop and database vendors rely on significant data movement just to resolve query requests—a practice that can be time consuming and inefficient.

Learn more

Since March 27, 2015, IBM Fluid Query 1.0 has been generally available as a software addition for PureData System for Analytics clients. If you want to understand how to take advantage of IBM Fluid Query 1.0, check out these two sources: the on-demand webcast, Virtual Enzee – The Logical Data Warehouse, Hadoop and PureData System for Analytics, and the IBM Fluid Query solution brief. Update: Learn about Fluid Query 1.5, announced July 2015.

About Rich

Rich Hughes is an IBM Marketing Program Manager for Data Warehousing.  Hughes has worked in a variety of information technology, data warehousing, and big data jobs, and has been with IBM since 2004.  He earned a Bachelor’s degree from Kansas University and a Master’s degree in Computer Science from Kansas State University.  Writing about the original Dream Team, Hughes authored a book on the 1936 US Olympic basketball team, a squad composed of oil refinery laborers and film industry stage hands.  You can follow him on Twitter: @rhughes134


Fluid doesn’t just describe your coffee anymore … Introducing IBM Fluid Query 1.0

by Wendy Lucas

Having grown up in the world of data and analytics, I long for the days when our goal was to create a single version of the truth. Remember when data architecture diagrams showed source systems flowing through ETL, into a centralized data warehouse, and then out to business intelligence applications? Wow, that was nice and simple, right – at least conceptually? As a consultant, I can still remember advising clients and helping them to pictorially represent this reference architecture. It was a pretty simple picture, but that was also a long time ago.

While IT organizations struggled with data integration, enterprise data models and producing the single source of the truth, the lines of business grew impatient and would build their own data marts (or data silos).  We can think of this as the first signs of the requirement for user self-service. The goal behind building the consolidated, enterprise, single version of the truth never went away. Sure, we still want the ability to drive more accurate decision-making, deliver consistent reporting, meet regulatory requirements, etc. However, the ability to achieve this goal became very difficult as requirements for user self-service, increased agility, new data types, lower cost solutions, better business insight and faster time to value became more important.

Recognizing the Logical Data Warehouse

Enterprises have developed collections of data assets that each provide value for specific workloads and purposes. This includes data warehouses, data marts, operational data stores and Hadoop data stores to name a few. It is really this collection of data assets that now serves as the foundation for driving analytics, fulfilling the purpose of the data warehouse within the architecture. The Logical Data Warehouse or LDW is a term we use to describe the collection of data assets that make up the data warehouse environment, recognizing that the data warehouse is no longer just a single entity. Each data store within the Logical Data Warehouse can be built on a different platform, fit for the purpose of the workload and analytic requirements it serves.


But doesn’t this go against the single version of the truth? The LDW will still struggle to deliver on the goal behind the single version of the truth if it doesn’t have information governance, common metadata and data integration practices in place. This is a key concept. If you’re interested in more on this topic, check out a recent webcast by some of my colleagues on the “Five Pitfalls to Avoid in Your Data Warehouse Modernization Project: Making Data Work for You.”

Unifying data across the Logical Data Warehouse

Logically grouping separate data stores into the LDW does not necessarily make our lives easier. Even assuming you have followed good information governance practices, you still have data stores in different places, perhaps on different platforms. Haven’t you just made life infinitely more difficult for your application developers and users, who want self-service? Users need the ability to leverage data across these various data stores without having to worry about the complexity of where to find it, or about rewriting their applications. And let’s not forget the needs of IT. DBAs struggle to manage capacity and performance on data warehouses while listening to Hadoop administrators brag about the seemingly endless, lower-cost storage and the new data types they can manage. What if we could have the best of all worlds: seamless access to data across a variety of stores, formats, and platforms, plus the capability for IT to manage Hadoop and data warehouses alongside each other in a way that leverages the strengths of both?

Introducing IBM Fluid Query

IBM Fluid Query is the capability to unify data across the Logical Data Warehouse, providing seamless access to data in its various forms and locations. No matter where users connect within the logical data warehouse, they have access to all data through the same standard API/SQL/analytics access. IBM Fluid Query powers the Logical Data Warehouse, giving users the ability to combine numerous types of data from various sources in a fast and agile manner to drive analytics and deeper insight, without worrying about connecting to multiple data stores, using different syntaxes or APIs, or changing their applications.

In its first release, IBM Fluid Query 1.0 will provide users of the IBM PureData System for Analytics the capability to access Hadoop data from their data warehouse and to move data between Hadoop and PureData if needed. High performance is about moving the query to the data, not the data to the query. This is tremendously valuable to PureData users who want to merge data from their structured data warehouse with Hadoop for powerful analytic combinations or more in-depth analysis. IBM Fluid Query 1.0 is part of a toolkit within Netezza Platform Software (NPS) on the appliance, so it’s free for all PureData System for Analytics customers.


For Hadoop users, IBM also provides IBM Big SQL, which delivers Fluid Query capability. Big SQL provides the ability to run queries against a variety of data stores, including PureData System for Analytics, DB2 and many others, from your IBM BigInsights Hadoop environment. Big SQL can push the query to the data store and return only the result to Hadoop, without moving all the data across the network. Other Hadoop vendors let you write similar queries, but they move all the data back to Hadoop before filtering, applying predicates, joining, and so on. In the world of big data, can you really afford to move lots of data around to meet the queries that need it?
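The difference between pushing the query to the data and pulling all the data to the query can be sketched with a toy comparison. The table and predicate below are made up; the point is only the difference in how much data crosses the network for the same answer.

```python
# A 1,000-row "remote" table; 1 row in 10 matches the predicate.
rows = [{"id": i, "region": "EMEA" if i % 10 == 0 else "AMER"}
        for i in range(1000)]

def query_with_pushdown(table, predicate):
    # The predicate runs where the data lives; only matches cross the network.
    shipped = [r for r in table if predicate(r)]
    return shipped, len(shipped)

def query_without_pushdown(table, predicate):
    # The whole table crosses the network first, then is filtered locally.
    shipped = list(table)
    return [r for r in shipped if predicate(r)], len(shipped)

pred = lambda r: r["region"] == "EMEA"
result_a, moved_a = query_with_pushdown(rows, pred)
result_b, moved_b = query_without_pushdown(rows, pred)
# Same answer either way, but pushdown ships 100 rows instead of 1,000.
```

At big-data scale, that 10x gap in rows shipped is the difference the paragraph above is describing.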

IBM Fluid Query 1.0 is generally available on March 27 as a software addition for PureData System for Analytics customers. If you want to understand how to take advantage of IBM Fluid Query 1.0, or if you would just like more information, I encourage you to listen to this on-demand webcast: Virtual Enzee – The Logical Data Warehouse, Hadoop and PureData System for Analytics, and check out the solution brief. Or if you are an existing PureData System for Analytics customer, download the software. Update: Learn about Fluid Query 1.5, announced in July 2015.

About Wendy

Wendy Lucas is a Program Director for IBM Data Warehouse Marketing. Wendy has over 20 years of experience in data warehousing and business intelligence solutions, including 12 years at IBM. She has helped clients in a variety of roles, including application development, management consulting, project management, technical sales management and marketing. Wendy holds a Bachelor of Science in Computer Science from Capital University, and you can follow her on Twitter at @wlucas001.