IBM Fluid Query: Extending Insights Across More Data Stores

by Rich Hughes

Since its announcement in March 2015, IBM Fluid Query has opened the door to better business insights for IBM PureData System for Analytics clients. Our clients have wanted and needed access to a wide variety of data stores, including Apache Hadoop, whose unstructured stores are one of the key reasons for the massive growth in data volumes. There is also valuable data in other types of stores, including relational databases that often serve as “systems of record” and “systems of insight”. Plus, Apache Spark is entering the picture as an up-and-coming engine for real-time analytics and machine learning.

IBM is pleased to announce IBM Fluid Query 1.5 to provide seamless integration with these additional data stores—making it even easier to get deeper insights from even more data.

IBM Fluid Query 1.5 – What is it?

IBM Fluid Query 1.5 provides access to data in other data stores from IBM PureData System for Analytics appliances. Starting with Fluid Query 1.0, users were able to query and quickly move data between Hadoop and IBM PureData System for Analytics appliances. This capability covered IBM BigInsights for Apache Hadoop, Cloudera, and Hortonworks.

Now with Fluid Query 1.5, we add the ability to reach into even more data stores including Spark and such popular relational database management systems as:

  • DB2 for Linux, UNIX and Windows
  • dashDB
  • PureData System for Operational Analytics
  • Oracle Database
  • Other PureData System for Analytics implementations

Fluid Query can direct queries from PureData System for Analytics database tables to any of these data sources and bring back just the results—thus creating a powerful analytic capability.
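
To make that idea concrete, here is a minimal sketch of what such a cross-store query can look like once a Fluid Query connector has been configured and registered in the PureData System for Analytics database. The object names (local_sales, remote_customers) are illustrative placeholders, not the names a real installation would use; the remote data is assumed to be surfaced locally through the connector.

    -- Hypothetical sketch: remote_customers stands for customer data in another
    -- store (for example, DB2 or Oracle) exposed through a registered Fluid
    -- Query connector; only the matching rows come back over the wire.
    SELECT l.order_id, l.order_total, r.segment
    FROM   local_sales l
    JOIN   remote_customers r
           ON l.cust_id = r.cust_id;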

IBM Fluid Query Benefits

IBM Fluid Query offers two key benefits. First, it makes practical use of data stores and lets users access them with their existing SQL skills. Workbench tools yield productivity gains, and SQL remains the query language of choice as PureData System for Analytics and Hadoop schemas logically merge. IBM Fluid Query provides the physical bridge over which a query is pushed efficiently to where the data it needs resides—whether in the same data warehouse, another data warehouse, a relational or transactional database, Hadoop or Spark.

Second, IBM Fluid Query enables archiving and capacity management on PureData-based data warehouses. With Fluid Query, users gain:

  • better exploitation of Hadoop as a “Day 0” archive that is queryable with conventional SQL;
  • capabilities to make use of data in a Spark in-memory analytics engine;
  • the ability to easily combine hot data from PureData with colder data from Hadoop;
  • data warehouse resource management, with the ability to archive colder data from PureData to Hadoop to relieve resources on the data warehouse.

Managing your share of Big Data Growth

The design point for Fluid Query is that the query is moved to the data instead of bringing massive data volumes to the query. This is a best-of-breed approach where tasks are performed on the platform best suited for that workload.

For example, use the PureData System for Analytics data warehouse for production-quality analytics where performance is critical to the success of your business, while simultaneously using Hadoop or Spark to discover the inherent value of those full-volume data sources. Or combine data in other operational systems or analytics warehouses with PureData stores to create powerful analytic combinations, without having to move and integrate the data before analyzing it.
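
As a hedged illustration of the hot-plus-cold pattern described above, the sketch below unions recent rows held in the warehouse with older rows that have been archived to Hadoop and surfaced back to SQL through a Fluid Query connector. The object names (sales_2015, sales_archive_hdp) are placeholders for this example only.

    -- Illustrative only: sales_2015 is a hot PureData table; sales_archive_hdp
    -- stands for colder, archived data on Hadoop reached through Fluid Query.
    SELECT sale_date, SUM(amount) AS total_sales
    FROM (
          SELECT sale_date, amount FROM sales_2015
          UNION ALL
          SELECT sale_date, amount FROM sales_archive_hdp
         ) AS all_sales
    GROUP BY sale_date;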

IBM Fluid Query 1.5 is now generally available to PureData System for Analytics clients as a software addition. If you want to understand how to take advantage of IBM® Fluid Query 1.5, check out these resources:

About Rich,

Rich Hughes is an IBM Marketing Program Manager for Data Warehousing. Hughes has worked in a variety of Information Technology, Data Warehousing, and Big Data jobs, and has been with IBM since 2004. Hughes earned a Bachelor’s degree from Kansas University, and a Master’s degree in Computer Science from Kansas State University. Writing about the original Dream Team, Hughes authored a book on the 1936 US Olympic basketball team, a squad composed of oil refinery laborers and film industry stage hands. You can follow him on Twitter: @rhughes134


IBM DB2 Analytics Accelerator: OLTP and OLAP in the same system at last! (Part two)

In my previous post, I introduced IBM DB2 Analytics Accelerator (Accelerator) and explained how it is capable of serving online analytical processing (OLAP) and online transaction processing (OLTP) queries at the same time. Now it is time to go into more detail and explain how it all works.

The DB2 optimizer is the key

The design concept is quite simple. The DB2 for z/OS optimizer is aware of DB2 Analytics Accelerator’s existence in a given environment and can execute a given query either on the DB2 Analytics Accelerator or by using the already well-known access paths within DB2 for z/OS. The DB2 optimizer decides which queries to direct to the Accelerator for hardware-accelerated parallel query processing, thus the Accelerator is essentially transparent to the applications and reporting tools querying DB2 for z/OS.
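
For readers who want to see what that routing control looks like in practice, the sketch below uses the DB2 for z/OS CURRENT QUERY ACCELERATION special register, which governs whether the optimizer may send eligible queries to the Accelerator. The table and column names are illustrative, and the exact register values available depend on your DB2 and Accelerator versions.

    -- Allow the optimizer to consider the Accelerator for eligible queries;
    -- with FAILBACK, the query runs in DB2 for z/OS if acceleration fails.
    SET CURRENT QUERY ACCELERATION = ENABLE WITH FAILBACK;

    -- An analytical query like this one is a typical candidate for routing
    -- to the Accelerator (CLAIMS and its columns are illustrative names).
    SELECT region, SUM(claim_amount) AS total_claims
    FROM   claims
    GROUP BY region;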

Deep DB2 Integration with z Systems

So the DB2 optimizer determines whether a query is best suited to run using symmetric multiprocessing (SMP) within DB2 for z/OS, or on the hardware-accelerated massively parallel processing (MPP) architecture delivered by PureData System for Analytics, powered by Netezza technology. It chooses the best option transparently. There are essentially no changes to the way you query the DB2 system.

Query Execution Process Flow

After defining tables to be accelerated and then loading data into the Accelerator, you essentially have an environment that provides both an SMP and MPP “personality” to the same table. This allows you to leverage the legendary DB2 for z/OS qualities of service and performance for transactional queries, as well as Netezza performance for complex queries.
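
To make that setup step more tangible, here is a minimal sketch using the Accelerator administration stored procedures SYSPROC.ACCEL_ADD_TABLES and SYSPROC.ACCEL_LOAD_TABLES, which add tables to an accelerator and load their data. The parameter layout shown is a simplified assumption written in application-program style, not the exact documented signatures, and the accelerator name and table specification are placeholders.

    -- Simplified assumption of the call pattern, not the literal signatures:
    -- register a table with accelerator ACCEL1, then load its data.
    CALL SYSPROC.ACCEL_ADD_TABLES ('ACCEL1', :table_spec_xml, :message);
    CALL SYSPROC.ACCEL_LOAD_TABLES('ACCEL1', :lock_mode, :table_spec_xml, :message);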

Complex queries are automatically routed by DB2 for processing by the Asymmetric Massively Parallel Processing (AMPP) engine, an architecture that uses special-purpose hardware accelerators (a multi-engine field-programmable gate array [FPGA]). These accelerators decompress and filter data for relevance to the query before it is loaded into memory and handed to the processor for any necessary aggregation and final processing. This entire process is transparent to the application and user. Users simply notice an improvement in the speed at which queries, particularly analytical ones, are resolved by the system. The system in general also benefits, because the analytical load is transferred out of DB2, leaving more capacity available to serve additional transactional queries. And keep this in mind: once a query is routed to the Accelerator, its execution does not consume MIPS.

High Performance Storage Saver

Once you have this system installed, you may use it for more than accelerating analytical queries. There are some interesting use cases that will help you obtain more benefits from your hybrid system, such as the High Performance Storage Saver.

High Performance Storage Saver: Store your historical data outside DB2 for z/OS, but maintain high-speed analytical access to it.

In this use case, you can reduce total cost of ownership (TCO) by using the Accelerator to store historical data, lowering storage costs on z Systems. This way, you are not consuming storage on your IBM z Systems; instead, you are using the lower-cost storage in the DB2 Analytics Accelerator while still being able to access that historical data and perform analysis on it. And this is the most interesting point: the historical data that you would typically move off your mainframe because of storage costs can now be explored online. You no longer need to retrieve the tapes where historical data is stored in order to access it.
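
As a hedged sketch of what querying that archived data can look like, DB2 for z/OS provides the CURRENT GET_ACCEL_ARCHIVE special register, which tells the optimizer to include partitions that have been moved to the Accelerator by the High Performance Storage Saver. The table name below is illustrative, and the available register values depend on your DB2 and Accelerator versions.

    -- Include archived (Storage Saver) partitions in accelerated queries.
    SET CURRENT QUERY ACCELERATION = ENABLE;
    SET CURRENT GET_ACCEL_ARCHIVE = YES;

    -- Historical analysis spanning data that now lives only on the Accelerator
    -- (ORDERS_HISTORY is an illustrative table name).
    SELECT YEAR(order_date) AS order_year, COUNT(*) AS orders
    FROM   orders_history
    GROUP BY YEAR(order_date);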

Simple is still better

Meanwhile, I encourage you to learn more about DB2 Analytics Accelerator, which brings the “simple is still better” design tenets of PureData System for Analytics to DB2 for z/OS. And of course, I am more than happy to hear your opinions and see you share your experiences in the comments area below, or to join you in a conversation on Twitter.

See additional posts

IBM DB2 Analytics Accelerator: OLTP and OLAP in the same system at last! (Part one)

About Isaac,

Isaac Moreno Navarro works as a data warehouse and Big Data technical pre-sales professional for IBM, covering customers in Spain and Portugal, with a special focus on PureData System for Analytics. He joined IBM in 2011 through the Netezza acquisition. Before that, he held several pre-sales and professional services positions at companies such as Oracle, Sun Microsystems and Netezza, as well as at other Spanish companies. In the years before joining IBM, he acquired diverse experience with different software tools (databases, identity management products, geographical information systems, manufacturing systems and more) across a very diverse set of projects. He also holds a Master of Science degree in Computer Science.

Making faster decisions at the point of engagement with IBM PureData System for Operational Analytics

by Rahul Agarwal

The need for operational analytics
Today, businesses across the world face challenges dealing with the increasing cost and complexity of IT, as they cope with the growing volume, velocity and diversity of information. However, organizations realize that they must capitalize on this information through the smart use of analytics to meet emerging challenges and uncover new business opportunities.


One thing that is increasingly becoming clear is that analytics is most valuable when it empowers individuals throughout the organization. Therefore, analytics needs to change from a predominantly back-office activity for a handful of experts to something that can provide pervasive, predictive, near-real-time information for front-line decision makers.

Low-latency analytics on transactional data, or operational analytics, provides actionable insight at the point of engagement, giving organizations the opportunity to deliver impactful and engaging services faster than their competition. So what should one look for in an operational analytics system?

Technical capabilities
A high percentage of queries to operational analytics systems—often up to 80%—are interactive lookups focused on data about a specific customer, account or patient. To deliver the correct information as rapidly as possible, systems must be optimized for the right balance of analytics performance and operational query throughput.
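
To make that workload mix concrete, the hedged sketch below contrasts the two query shapes such a system has to balance; the table and column names are illustrative only.

    -- Interactive operational lookup: fetch one customer's current position.
    SELECT account_id, balance, last_activity
    FROM   accounts
    WHERE  customer_id = 123456;

    -- Analytical query running against the same data: aggregate across all customers.
    SELECT region, AVG(balance) AS avg_balance
    FROM   accounts
    GROUP BY region;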


IT requirements
To maximize the benefits of operational analytics, one needs a solution that quickly delivers value, better performance, scale and efficiency, while reducing the need for IT experts to design, integrate and maintain IT systems. In addition, one should look for a system that comes with deep levels of optimization to achieve the desired scale, performance and service quality, since assembling the right skills to optimize these systems is a costly and often difficult endeavour.

Flexibility
The ideal system should provide analytic capabilities that deliver rapid and compelling return on investment now, and it must grow to meet new demands so that it remains as relevant and powerful in the future as it is today. In addition, the system should have the flexibility to meet these demands without disrupting the free flow of decision-support intelligence to the individuals and applications driving the business.

IBM PureData System for Operational Analytics
The IBM PureData System for Operational Analytics helps organizations meet these complex requirements with an expert integrated data system that is designed and optimized specifically for the demands of an operational analytics workload.
Built on IBM POWER Systems servers with IBM System Storage and powered by IBM DB2 software, the system is a complete solution for operational analytics that provides both the simplicity of an appliance and the flexibility of a custom solution. The system has recently been refreshed with the latest technology to help customers make faster, fact-based decisions, and now offers:

  • Accelerated performance with the help of new, more powerful servers that leverage POWER8 technology and improved tiered storage, which uses spinning disks for ‘cool’ data and IBM FlashSystem™ storage for the ‘hot’ or frequently accessed data.
  • Enhanced scalability that allows the system to grow to peta-scale capacity. In addition, nodes of the refreshed system can be added to the previous generation of PureData System for Operational Analytics, providing better protection for your technology investment.
  • A reduced data center footprint as a result of increased hardware density.

So explore the benefits and use cases of PureData System for Operational Analytics by visiting our website, ibm.com/software/data/puredata/operationalanalytics, and by connecting with IBM experts.

About Rahul Agarwal

Rahul Agarwal is a member of the worldwide product marketing team at IBM that focuses on data warehouse and database technology. Before joining IBM, Rahul held a variety of business management, product marketing and other roles at companies including HCL Technologies and HP. Rahul studied at the Indian Institute of Management, Kozhikode and holds a bachelor of engineering (electronics) degree from the University of Pune, India. Rahul’s Twitter handle: @rahulag80


 

Leveraging In-Memory Computing For Fast Insights

By Louis T Cherian,

It is common knowledge that an in-memory database is fast, but what if you had an even faster solution?
Think of a next-generation in-memory database that is:

  • Faster, with speed of thought analytics to get insights
  • Simpler, with reduced complexity and improved performance
  • Agile, with multiple deployment options and low risk for migration
  • Competitive, by delivering products to market much faster

We are talking about the combination of innovations that makes IBM BLU Acceleration the next-generation in-memory solution.

So what really goes into making IBM BLU Acceleration the next-generation in-memory solution?

  • In-chip analytics allows data to flow through the CPU very quickly, making it faster than “conventional” in-memory solutions
  • With actionable compression, one can perform a broad range of operations on data while it is still compressed
  • With data skipping, any data that is not needed to answer a query is skipped over, resulting in dramatic performance improvements
  • The ability to run all operational reports on transactional data as it is captured, with the help of shadow tables, arguably the most notable feature in the DB2 10.5 “Cancun Release” (a brief sketch follows this list)
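
As a minimal sketch of how these capabilities surface in DB2 with BLU Acceleration, the statements below create a column-organized table (the storage form BLU operates on) and a shadow table for operational reporting. The shadow-table definition is a simplified assumption of the typical pattern (a column-organized, replication-maintained materialized query table, which also requires replication to be configured), and all object names are illustrative.

    -- Column-organized table: the format BLU Acceleration works on, which
    -- brings actionable compression and data skipping automatically.
    CREATE TABLE sales_fact (
        sale_id     BIGINT NOT NULL,
        sale_date   DATE,
        amount      DECIMAL(12,2)
    ) ORGANIZE BY COLUMN;

    -- Shadow table (simplified assumption of the usual pattern): a column-organized
    -- copy of a row-organized transactional table, maintained by replication, so
    -- reports run against columnar data while transactions stay row-organized.
    CREATE TABLE orders_shadow AS (SELECT * FROM orders)
        DATA INITIALLY DEFERRED REFRESH DEFERRED
        MAINTAINED BY REPLICATION
        ORGANIZE BY COLUMN;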

To know more about leveraging in-memory computing for fast insights with IBM BLU Acceleration, watch this video: http://bit.ly/1BZq1lo

For more information, visit: http://www-01.ibm.com/software/data/data-management-and-data-warehousing/dw.html

About Louis T. Cherian,

Louis T. Cherian is currently a member of the worldwide product marketing team at IBM that focuses on data warehouse and database technology. Prior to this, Louis held a variety of product marketing roles within IBM and, before joining IBM, at Tata Consultancy Services. Louis earned his PGDBM from the Xavier Institute of Management and Entrepreneurship and also holds an engineering degree in computer science from VTU Bangalore.

Self-Service Analytics, Data Warehousing, and Information Management in the Cloud

By James Kobielus,

As we approach IBM Insight 2014, we call your attention to IBM Watson Analytics. Announced last month, Watson Analytics will go into public beta in mid-November, not long after Insight.

Whether or not you plan to attend this year’s event in Las Vegas, we invite you to participate in the upcoming public beta, which we strongly believe you’ll find transformative. What IBM has done is to reinvent and thereby democratize the business-analytics experience for the cloud era.

With Watson Analytics, which you can try for yourselves at Insight, IBM has put the power of sophisticated visual, predictive and cognitive analytics directly into the hands of any user, even the least technically inclined. With a “freemium” option that will be a permanent element of the service upon full launch, you will be able to gain no-cost, on-demand, self-service access to sophisticated analytical capabilities. Marketing, sales, operations, finance and HR professionals can gain the answers they need from all types of data–without needing to enlist a professional data scientist in the effort.

Watson Analytics’ built-in capabilities for advanced data management ensure that data is accessible rapidly and that large volumes of data are handled with ease, utilizing an embedded cloud data warehouse that incorporates IBM’s sophisticated DB2 with BLU Acceleration in-memory/columnar technology. In addition, embedded data refinery services enable business people, without any reliance on IT, to quickly find relevant, easily consumable raw data and transform that into relevant and actionable information.

As an added incentive for attending Insight, IBM will make further announcements that extend the value of Watson Analytics and of the sophisticated cloud data-warehousing and data-refinement services that power this supremely accessible and useful analytic experience. With this forthcoming announcement on cloud data warehousing, IBM continues to change the experience of using analytics today for our clients. We are making it easier for clients to be data-driven organizations and take advantage of new opportunities faster.

We hope to meet you at Insight!

About James, 

James Kobielus is IBM Senior Program Director, Product Marketing, Big Data Analytics solutions. He is an industry veteran, a popular speaker and social media participant, and a thought leader in big data, Hadoop, enterprise data warehousing, advanced analytics, business intelligence, data management, and next best action technologies. Follow James on Twitter: @jameskobielus

Data Warehousing and Analytics in the Cloud — A Guide for Insight Attendees

By Adam Ronthal, 

IBM Insight, the premier conference for Big Data and Analytics, is just around the corner, and sessions will include material on all of our data management solutions, including DB2 with BLU Acceleration and, of course, the PureData System for Analytics based on Netezza technology (which, incidentally, just released a new version, the N3001!).

Warehousing and Analytics in the Cloud, however, is a horse of a different color.  Yes, it’s still data warehousing, and yes, it’s still analytics, but it differs from traditional on-premises solutions in several key ways:

  • Cloud agility means rapid provisioning (think hours, not days)
  • Pay-as-you-go models mean a shift to operational expense rather than capital expense
  • “as a Service” means that end-users don’t worry about infrastructure, but can focus on business applications and problems

In short, cloud lets you focus on the business, rather than the business of IT, which is a very powerful message.

Almost everyone who is considering cloud already has on-premises systems, of course, so it is critical that a cloud-based solution play well not only with born-in-the-cloud applications, but also with the existing ground-based solutions we all know and love.  And that’s the promise of a properly architected, well-thought-out cloud-based service for warehousing and analytics — portability!  The ability to use the same applications, tools, and analytic algorithms on the ground or in the cloud is what enables hybrid flexibility.

We used to look at the logical data warehouse as comprising both traditional structured database technologies and newer NoSQL technologies like Hadoop and streaming computing.   Now we are extending that to include new deployment options as well.  Databases and NoSQL, ground to cloud, all treated as a logical cohesive whole.

Come find out more about our exciting new Data Warehouse and Analytics as a Service offerings at Insight:

Elective Sessions

  • IWM-4857A: Spotlight Session: Modernizing Your Data Warehouse for Big Data & Bigger Results (Monday, 2-3 PM in South Seas F)
  • FTC-4285A: Data Warehousing and Analytics in the Cloud: IBM’s New Data Warehousing Service (Tuesday, 3-4PM in Islander E)
  • IWM-4637A: Advanced Warehouse Analytics in the Cloud (Monday 3:30-4:30 in Jasmine C)
  • IDB-6062A: Data Warehousing in the Cloud – a practical deployment guide (Wed, 10-11AM in South Seas C)
  • IWS-6952A: Enzee Universe Part 2: Business Update and Product Strategy (Sunday 1-6pm in South Seas F)
  • IWS-7043A: Expert Exchange: Data Warehousing & Analytics in the Cloud (Tuesday, 10-11AM in Banyan B)

I’ll be presenting at some of these sessions, and plan to be present at all of them (time permitting) at IBM Insight 2014, so come find me in Vegas and we can catch up!

About Adam,

Adam Ronthal has worked in the technology industry for 20 years in technical operations, system administration, and data warehousing and analytics. In 2006, Adam joined Netezza as a Technical Account Manager, working with some of IBM Netezza’s largest data warehousing and analytic customers and helping them architect and implement their Netezza-based solutions. Today, Adam works in technical marketing for IBM’s big data, cloud, and appliance offerings. Adam is an IBM Certified Specialist for Netezza, and holds a BA from Yale University. Follow Adam on Twitter: @ARonthal

Governance, Stewardship, and Quality of Temporal Data in a Data Warehousing Context

By James Kobielus, 

Organizations must hold people accountable for their actions, and that depends on having the right data, tools, and processes for keeping track of the precise sequence of events over time.

Timing is everything when you’re trying to pinpoint the parties who are personally responsible in any business context. Consequently, time-series discovery is the core task of any good investigator, be they Sherlock Holmes or his real-world counterparts in the hunt for perps and other parties of interest.

Audit trails are essential for time-series discovery in legal proceedings, and they support equivalent functions in compliance, security, and other business application contexts. Audit trails must describe correctly and consistently the prior sequence of events, so that organizations can identify precisely who took what actions when under which circumstances.

To help identify the responsible parties behind specific actions, decisions, and outcomes, the best audit trails should, at minimum, support longitudinal analysis, which rolls up records into a comprehensive view of the entire sequence of events. But the databases where the audit trails are stored should also support time-validity analysis, which rolls back time itself to show the exact state of all the valid data available to each responsible party at the times they made their decisions. Without the former, you can’t fit each event into the larger narrative of what transpired. Without the latter, you can’t fit each event into the narrative of who should be punished or exonerated.

All of that requires strong data quality, which relies, in turn, on having access to databases and tools that facilitate the requisite governance and stewardship procedures. Data warehouses are where you should be keeping your system-of-record data to support time-series analyses. Consequently, temporal data management is an intrinsic feature of any mature data warehousing, governance, and stewardship practice. Indeed, the ability to traverse data over time is at the very heart of the concept of data warehousing, as defined long ago by Bill Inmon: “a subject-oriented, nonvolatile, integrated, time-variant collection of data in support of management’s decisions.”

Many organizations have deployed transactional databases such as IBM DB2 for data warehousing and temporal data management. If they use a high-performance implementation, such as DB2 with BLU Acceleration software running on IBM POWER8 processors, they can do in-memory time-series analyses of large audit trails with astonishing speed. If you want further depth on DB2’s native temporal data management features, I strongly recommend this technical article.

Temporal data management concepts may be unfamiliar to some data warehousing professionals. Here’s a recent article that provides a good primer on temporal database computing. As the author states, “A temporal database will show you the actual value back then, as it was known back then, and the actual value back then, as it is known now.”

These concepts are a bit tricky to explain clearly, but I’ll take a shot. The “actual value back then, as it is known now” is the “valid time” view, and it may be updated or corrected within a temporal database if the previously recorded value is found to have been in error. The “actual value back then, as it was known back then” is the “transaction time” view; it remains unchanged and may diverge from the “valid time” as the latter is corrected.
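
For readers who want to see how these two notions map onto an actual database, here is a minimal sketch using DB2’s temporal table support, where the system period plays the role of transaction time and a business period plays the role of valid time. The table, column and history-table names are illustrative, and the exact column options may vary by DB2 version.

    -- Illustrative bitemporal table: SYSTEM_TIME records when the database knew
    -- each value (transaction time); BUSINESS_TIME records when the value was
    -- true in the real world (valid time) and can be corrected later.
    CREATE TABLE policy (
        policy_id   INT NOT NULL,
        coverage    INT,
        bus_start   DATE NOT NULL,
        bus_end     DATE NOT NULL,
        sys_start   TIMESTAMP(12) NOT NULL GENERATED ALWAYS AS ROW BEGIN,
        sys_end     TIMESTAMP(12) NOT NULL GENERATED ALWAYS AS ROW END,
        trans_id    TIMESTAMP(12) GENERATED ALWAYS AS TRANSACTION START ID,
        PERIOD BUSINESS_TIME (bus_start, bus_end),
        PERIOD SYSTEM_TIME  (sys_start, sys_end)
    );
    CREATE TABLE policy_history LIKE policy;
    ALTER TABLE policy ADD VERSIONING USE HISTORY TABLE policy_history;

    -- "As it was known back then": the state of the record on a past date,
    -- including any errors that were later corrected.
    SELECT * FROM policy
    FOR SYSTEM_TIME AS OF TIMESTAMP('2014-01-01-00.00.00')
    WHERE policy_id = 42;

    -- "As it is known now" for a past date: the corrected value in effect then.
    SELECT * FROM policy
    FOR BUSINESS_TIME AS OF '2014-01-01'
    WHERE policy_id = 42;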

Essentially, this arrangement enables the record of any historical data to be corrected at any time in the future. It also preserves the record, for each point in the past, of that moment’s own, possibly erroneous, picture of the even deeper past. This gets to the “what they knew and when they knew it” heart of personal responsibility.

As I was reading this recent article that discusses time-series data in an Internet of Things (IoT) context, the association of temporality with personal responsibility came into new focus. What if, through IoT, we were able to save every last datum that each individual person produced, accessed, viewed, owned, or otherwise came into contact with at each point in time? And what if we could roll it back to infer what they “knew” and “when they knew it” on a second-by-second basis?

This is not a far-fetched scenario. As the IoT gains ubiquity in our lives, this will become a very realistic scenario (for the moment, let’s overlook the staggering big-data management and analytics challenges that it would entail). And as this temporal data gets correlated with geospatial, social, and other data sources–and mined through data lineage tools–it will become possible to roll up high-resolution, 360-degree portraits of personal responsibility. We’ll have a full audit trail of exactly who knew (individually and collectively) what when, where, how, why, and with what consequences.

Whether you’re a prosecuting attorney building a case, a law-enforcement official trying to uncover terrorist plots in the nick of time, or an IT security administrator trying to finger the shadowy perpetrators of a hack attack, these IoT-infused discovery tools will prove addictive.

The effectiveness of governance in the modern world will depend on your ability to maintain the requisite audit trails in whatever data warehouse or other well-governed repository best suits your operational requirements.

About James, 

James Kobielus is IBM Senior Program Director, Product Marketing, Big Data Analytics solutions. He is an industry veteran, a popular speaker and social media participant, and a thought leader in big data, Hadoop, enterprise data warehousing, advanced analytics, business intelligence, data management, and next best action technologies. Follow James on Twitter: @jameskobielus