IBM’s Continued Leadership in the Data Warehousing Market

By James Kobielus

It’s good to know that some things remain constant in the ever-changing data warehousing (DW) market. One of those constants is IBM’s continued leadership in this segment, which, as I stated in my recent IBM Big Data & Analytics (BD&A) Hub blog, is as relevant as ever in the era of big data. Even as data industry fads change with the seasons, you will still need strong data governance for analytics and a repository for master data, and that’s your DW.

IBM has maintained and deepened our leadership in the DW market over many years, and, as Wendy Lucas discusses in her BD&A Hub blog, an independent analyst firm continues to recognize that fact. “Regardless of your specific DW requirements,” she states, “it’s as important as ever to partner with a vendor that has the proven breadth and depth of solutions to fit each of these needs. Gartner cites IBM’s ‘broad offering and integration across products that can support all four major data warehouse use cases’ as a strength, along with our continued investment in product innovation driven by customer and market demands.”

I’d like to highlight the word “innovation” in what Wendy says. Just because the DW market is of long vintage and has many mature offerings, such as IBM PureData System for Analytics (PDA), doesn’t mean that we solution providers are slacking off on new features. Go check out an important new feature, IBM Fluid Query, which provides a query and data movement toolkit for leveraging insights from data in PDA and various Hadoop platforms, including IBM InfoSphere BigInsights.

The DW innovations don’t stop there. In Dennis Duckworth’s commentary on the Gartner study, he discusses IBM’s ongoing investments in logical DW solutions that can be deployed as hybrid clouds or, as with dashDB, as full-blown SaaS offerings. As he notes, one of the important innovations in dashDB is built-in integration with our NoSQL database-as-a-service, Cloudant.

All of these IBM investments serve to further the evolution of the logical DW as a more fluid, agile, and versatile enterprise analytics platform.

About James,

James Kobielus is IBM Senior Program Director, Product Marketing, Big Data Analytics solutions. He is an industry veteran, a popular speaker and social media participant, and a thought leader in big data, Hadoop, enterprise data warehousing, advanced analytics, business intelligence, data management, and next best action technologies. Follow James on Twitter: @jameskobielus

IBM named a Leader in the Latest Gartner Magic Quadrant for Data Warehouse and Data Management Solutions for Analytics

By Dennis Duckworth

Note: This blog represents my opinion about what is said in the Gartner Magic Quadrant and is not meant to imply that this is what Gartner intended. You can read the report here to see what Gartner said.

In the most recently released Magic Quadrant for Data Warehouse and Data Management Solutions for Analytics (Published 12 February 2015; Analysts: Mark A. Beyer, Roxane Edjali), Gartner placed IBM in the Leaders Quadrant.

If you are a follower of that particular Magic Quadrant, you will know that IBM has been listed as a Leader every year back to the first one in 2007 (I’ll leave it as an exercise for the reader to Google the graph images from all those previous years).

There were a number of interesting things I took away from reading the latest Magic Quadrant. The first thing I noticed, even before opening it, was the change in the title. In previous years, this report had been called the “Magic Quadrant for Data Warehouse Database Management Systems”. For me this slight adjustment in naming reflects two recent market trends: first, that analytics is the real business driver for these solutions; and second, the addition of that little word “and” makes it clearer that not all data management solutions for analytics are data warehouses, or at least traditional data warehouses. In 2014, Gartner for the first time included non-relational data management systems (HDFS, key-value stores, document stores, etc.), and that inclusion continued in the new report. This reflects another truth in the data warehouse world – the relational data warehouse isn’t sufficient for the analytics needed in enterprises today.


Gartner continues to highlight the Logical Data Warehouse as a key use case for data warehouses and data management solutions. This use case is consistent with what we see our clients doing and wanting to do – they want to have one overall data management architecture in which they can land, cleanse, manage, transform, explore, govern, and analyze all their data. That means getting their enterprise data warehouse to play nice with their tactical data marts, operational data stores, Hadoop data reservoirs and real-time streaming systems—all with centralized, uniform data governance and security. In her blog on this topic, Wendy Lucas addresses a reaction we get all the time: “Does this sound complex? It doesn’t need to be excessively complicated if you deploy the solutions that have been optimized for your specific types of data, data-processing latencies and analytic needs.”

If you have been reading the previous blogs (see a list below), you have seen that IBM is a believer in the power of the Logical Data Warehouse approach for analytics. We have talked about our zone architecture (using separate best-of-breed platforms for the best performance on different types of data or for different analytic requirements). We announced new functionality like IBM Fluid Query for PureData System for Analytics, which allows very tight integration between PureData relational data warehouse stores and Hadoop stores. This allows you to run queries on PureData System for Analytics that reach over into Hadoop, with those results automatically incorporated into the overall query results. That means you send queries and return results rather than shipping raw data around, which is much more efficient.

We continue to add more products and more capabilities to existing products, giving our clients more of what they need as part of the Logical Data Warehouse, including Hybrid Cloud implementations. We added dashDB as a cloud-based alternative for high-performance advanced analytics (with built-in integration with our Cloudant database-as-a-service). We continue to improve our BigInsights for Hadoop and IBM Streams offerings, both of which have had recent v4.0 releases. We also continue to improve our Integration and Information Governance solutions so that they can manage all of the data that is part of the logical or hybrid architecture.


We do all of this to provide our customers with what they need to stay ahead of their ever-changing and growing data and analytics needs. Gartner’s Magic Quadrant provides good validation that we are succeeding in that effort.

Read additional blogs

About Dennis,

Dennis Duckworth is Program Director of Product Marketing, IBM Analytics. Dennis has been in the data game for quite a while, doing everything from Lisp programming in artificial intelligence to managing a sales territory for an RDBMS company. His passion is helping companies and people get real value out of cool technology. He is currently contributing to IBM efforts to create a unified comprehensive analytics framework across its entire Big Data platform family. In his previous role, Dennis was Director of Competitive and Market Intelligence for Netezza. He holds a degree in Electrical Engineering from Stanford University but has spent most of his life on the East Coast. When not working, Dennis enjoys sailing and fishing off his backyard on Buzzards Bay, and he remains vigilant in his quest for wine enlightenment.

About The Magic Quadrant

Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings or other designation. Gartner research publications consist of the opinions of Gartner’s research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.

Getting Started with IBM Fluid Query 1.0 for IBM PureData System for Analytics

By Doug Dailey

As big data concepts continue to mature and evolve, so does the technology that encourages their adoption. Enterprises are looking at ways to better leverage their data by reducing costs and positioning data for success based on its relevance. The payoff is optimal insights for the business at the right time.

Many are finding that Hadoop is not the answer for all of their data needs. They want access to various systems rather than a “one size fits all” approach. Enterprise Data Warehouses (EDWs), relational stores, content stores, real-time in-memory processing and more all have their place. We have seen an increasing number of software tools, specialized hardware products and services that bridge the gap between these approaches to storing and analyzing data.

Fluid Query Strengths – Query access and Data Movement with Hadoop

IBM introduced Fluid Query 1.0 for use on PureData System for Analytics in March. Traditionally, EDW environments served as landing zones for high-value data, where users could explore, analyze and gain speed-of-thought insights from complex in-database algorithms. Fluid Query turns the EDW on its end: it allows PureData users to access data residing on Hadoop distributions as if the warehouse were a client. Rather than moving and storing that data locally, it pushes SQL down to Hadoop, offloading processing via MapReduce jobs. You can query Hadoop directly and move data natively between PureData and Hadoop in parallel.

Are you interested in doing any of the following?

● Querying Hadoop data from your PureData System for Analytics
● Transferring data bi-directionally between PureData and Hadoop (BigInsights, Hortonworks or Cloudera)
● Moving data between PureData and Hadoop in parallel
● Controlling exactly which tables and data ranges are queried or transferred
● Registering tables automatically with the Hive metastore
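
Conceptually, the federation pattern behind these capabilities is: push the filter down to each store, then merge the small result sets locally instead of shipping raw data. Here is a minimal Python sketch of that idea, using two in-memory SQLite databases as hypothetical stand-ins for the warehouse and the Hadoop store (the table names and schema are made up for illustration; this is the pattern, not the Fluid Query API):

```python
import sqlite3

# Stand-ins: one "warehouse" store and one "remote Hadoop" store.
warehouse = sqlite3.connect(":memory:")
remote = sqlite3.connect(":memory:")

warehouse.execute("CREATE TABLE sales (day TEXT, amount REAL)")
warehouse.executemany("INSERT INTO sales VALUES (?, ?)",
                      [("2015-03-01", 100.0), ("2015-03-02", 250.0)])

remote.execute("CREATE TABLE clicks (day TEXT, hits INTEGER)")
remote.executemany("INSERT INTO clicks VALUES (?, ?)",
                   [("2015-03-01", 40), ("2015-03-02", 90)])

def federated_daily_report(day):
    """Push the day filter down to each store, then merge locally --
    shipping results, not raw data."""
    amount = warehouse.execute(
        "SELECT SUM(amount) FROM sales WHERE day = ?", (day,)).fetchone()[0]
    hits = remote.execute(
        "SELECT SUM(hits) FROM clicks WHERE day = ?", (day,)).fetchone()[0]
    return {"day": day, "amount": amount, "hits": hits}

print(federated_daily_report("2015-03-02"))
# {'day': '2015-03-02', 'amount': 250.0, 'hits': 90}
```

The point of the design is in the comment: each store aggregates its own data, so only a tiny merged result crosses the wire.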

How to Get Started

Customers have been able to download, install, configure and test Fluid Query in less than 30 minutes – a perfect lunch-hour activity for inquiring minds. Just be sure that your Hadoop and PureData environments have the needed prerequisites in place. Fluid Query runs on PureData System for Analytics N100x, N2001, N2002, and N3001.

IBM Fluid Query Minimum System Requirements

Prerequisites for installation:

(1) Supported Hadoop distribution installed, up & running

(2) Active network connection and user access/authentication between PureData and Hadoop

(3) PureData installed with Netezza Analytics

(4) Data available for use

Downloading and installing Fluid Query:

1. Download FLUIDQUERY_1.0 tar package from Fix Central
http://www-933.ibm.com/support/fixcentral/

2. See the IBM Fluid Query User Guide for more details on setup and configuration.

3. Unpack the FluidQuery_1.0 bundle and run the fluidquery_install.pl script.

4. Configure Fluid Query for use, then query and move data to your heart’s content. Configuration consists of a lightweight setup, registration of user-defined table functions, and view creation.

Finally, use your favorite tool to execute your Hadoop query and view results.

In keeping with the simplicity and ease of use of Netezza technology, we have delivered a very lightweight set of capabilities that pack a load of value for your Logical Data Warehouse ecosystem. Whether you are trudging through a data swamp, or swimming in a data lake or reservoir, you can very easily reel in results important to your business.

Go to the IBM Fluid Query Solution Brief to learn more.

Update: Learn about Fluid Query 1.5, announced in July 2015.

About Doug,
Doug has over 20 years of combined technical and management experience in the software industry, with an emphasis on customer service and, more recently, product management. He is currently part of a highly motivated product management team that is both inspired by and passionate about the IBM PureData System for Analytics product portfolio.

Things you need to know when switching from Oracle database to Netezza (Part 3)

By Andrey Vykhodtsev

In my previous two posts I covered the differences in architecture between IBM PureData System for Analytics and Oracle Database, as well as differences in SQL. (See below for links.) In this post, I am going to cover another important topic – additional structures that speed up data access.

Partitions, Indexes, Materialized Views

Oracle Database relies on indexes, partitions and materialized views for performance. In Oracle, indexes are designed to speed up point searches or range searches that touch a very small percentage of the data. Because of the B-tree index structure, if you touch a large percentage of the data, using the index will be much slower than a full scan of the whole table. If you hit this problem, you have probably decided to use partitioning. In Oracle, partitioning is a paid feature that comes only with certain editions. You also have materialized views, with which you can put the results of complex queries on disk for later re-use. These structures are designed with general-purpose use (analytical processing plus transactional processing) in mind, and they can be complex and unwieldy to maintain.

By contrast, with PureData you have fewer worries. The trade-off, as I said in my first post, is that PureData is not a general-purpose system, but rather an analytical-processing system.

We use ZoneMaps in PureData instead of indexes. In essence, a ZoneMap is just a table of minimum and maximum values for all columns of certain types. ZoneMaps are extremely compact, and they don’t need to be created or maintained. But this is not all: ZoneMap filtering takes place at the hardware level. (Remember the mention of FPGAs, Field Programmable Gate Arrays, in my first post?) The system will not scan data that does not need to be scanned for a particular query, so I/O is greatly reduced. If you update or delete data based on a condition, ZoneMaps are also taken into account.

Because of ZoneMaps, you don’t need to partition your data. ZoneMaps take advantage of the natural ordering of data. For example, if you insert data daily, the ZoneMap on the date field will be effectively sorted, and range searches on this field will be extremely fast.
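
To make the idea concrete, here is a toy Python sketch of min/max zone filtering (an illustration of the concept only, not Netezza internals; the zone size and data are invented):

```python
# Toy illustration of zone-map filtering: data is stored in fixed-size
# zones, and for each zone we keep only the min and max of the column.
# A range query scans a zone only if its [min, max] interval overlaps
# the query range.

ZONE_SIZE = 4

def build_zone_map(values):
    """Split values into zones and record (min, max, rows) per zone."""
    zones = [values[i:i + ZONE_SIZE] for i in range(0, len(values), ZONE_SIZE)]
    return [(min(z), max(z), z) for z in zones]

def range_query(zone_map, lo, hi):
    """Return matching values and how many zones actually had to be read."""
    scanned = 0
    hits = []
    for zmin, zmax, zone in zone_map:
        if zmax < lo or zmin > hi:
            continue          # zone skipped entirely: no I/O needed
        scanned += 1          # zone overlaps the range: scan its rows
        hits.extend(v for v in zone if lo <= v <= hi)
    return hits, scanned

# Naturally ordered data (e.g. daily loads) gives tight, disjoint zones.
days = list(range(1, 17))              # 16 "days" in 4 zones
zmap = build_zone_map(days)
hits, scanned = range_query(zmap, 6, 7)
print(hits, scanned)                   # [6, 7] 1 -- only 1 of 4 zones read
```

With naturally ordered data the zone intervals barely overlap, so most zones are eliminated before any rows are touched, which is exactly why daily-loaded date columns get such fast range searches.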

In addition to ZoneMaps, there are a couple of other techniques you can use to optimize query access to a certain table. The first is the CBT, or Clustered Base Table. This is not a separate structure that needs to be maintained, but rather an internal table organization method. If you define a table as a CBT, you can specify up to four fields on which you will have extremely fast searches.

The only additional structure that PureData has is called a “materialized view”, but this is a somewhat different concept than in Oracle. In PureData, a materialized view is a subset of columns from one table that can be sorted differently than the base table, thereby speeding up access on the sorted columns. Because materialized views are ZoneMapped, they have some properties of indexes, but they are not actually indexes. Materialized views might be needed if you have “tactical queries” – queries that require fast and frequent access to small portions of data. Otherwise, you don’t usually need them.
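
As a rough analogy, a materialized view of this kind behaves like a sorted copy of a few columns, so tactical point lookups can binary-search it instead of scanning the base table. A hypothetical Python sketch (invented names, not PureData syntax):

```python
import bisect

# Base "table": rows in insertion order; lookups by customer must scan it.
base_table = [
    {"customer": "carol", "balance": 310},
    {"customer": "alice", "balance": 120},
    {"customer": "bob",   "balance": 75},
]

# "Materialized view": a subset of columns, sorted on 'customer'.
view = sorted((row["customer"], row["balance"]) for row in base_table)
view_keys = [customer for customer, _ in view]

def tactical_lookup(customer):
    """Fast point lookup via binary search on the sorted view."""
    i = bisect.bisect_left(view_keys, customer)
    if i < len(view) and view_keys[i] == customer:
        return view[i][1]
    return None              # customer not present

print(tactical_lookup("bob"))   # 75
```

The base table keeps its natural load order while the view pays the sorting cost once, which mirrors the trade-off described above: useful for frequent small lookups, unnecessary otherwise.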

In Conclusion

As you can see, it is much simpler to maintain efficient data access in PureData. Instead of you creating and maintaining indexes on a subset of columns of each table, PureData automatically creates ZoneMaps for you. I know from experience what a nightmare index maintenance in a large data warehouse can be. Partitioning is another technique that is not needed in PureData. Instead of indexes and partitions, we use much simpler structures that are automatically maintained and applied at the hardware level (in the FPGA) at the speed of streaming data.
In my next posts, I am going to cover a few more topics that you need to be aware of when migrating from Oracle to PDA. Please stay tuned, and follow me on Twitter: @vykhand

Other posts in this series

About Andrey,
Andrey Vykhodtsev is a Big Data Technical Sales Expert covering the Central and Eastern Europe region at IBM. He has more than 12 years of experience in data warehousing and analytics, and has worked as a senior data warehouse developer, analyst, architect and consultant in multiple industries, including the financial sector and telecommunications.

Making faster decisions at the point of engagement with IBM PureData System for Operational Analytics

By Rahul Agarwal

The need for operational analytics
Today, businesses across the world face challenges dealing with the increasing cost and complexity of IT, as they cope with the growing volume, velocity and diversity of information. However, organizations realize that they must capitalize on this information through the smart use of analytics to meet emerging challenges and uncover new business opportunities.

One thing that is becoming increasingly clear is that analytics is most valuable when it empowers individuals throughout the organization. Therefore, analytics needs to change from a predominantly back-office activity for a handful of experts to something that can provide pervasive, predictive, near-real-time information for front-line decision makers.

Low latency analytics on transactional data, or operational analytics, provide actionable insight at point of engagement, giving organizations the opportunity to deliver impactful and engaging services faster than their competition. So what should one look for in an operational analytics system?

Technical capabilities
A high percentage of queries to operational analytics systems – often up to 80 percent – are interactive lookups focused on data about a specific customer, account or patient. To deliver the correct information as rapidly as possible, systems must be optimized for the right balance of analytics performance and operational query throughput.

IT requirements
To maximize the benefits of operational analytics, one needs a solution that quickly delivers value, performance, scale and efficiency – while reducing the need for IT experts to design, integrate and maintain IT systems. In addition, one should look for a system that comes with deep levels of optimization to achieve the desired scale, performance and service quality, since assembling the right skills to optimize these systems is a costly and often difficult endeavour.

Flexibility
The ideal system should provide analytic capabilities that deliver a rapid and compelling return on investment now, and it must grow to meet new demands so that it remains as relevant and powerful in the future as it is today. In addition, the system should have the flexibility to meet these demands without disrupting the free flow of decision-support intelligence to the individuals and applications driving the business.

IBM PureData System for Operational Analytics
The IBM PureData System for Operational Analytics helps organizations meet these complex requirements with an expert integrated data system that is designed and optimized specifically for the demands of an operational analytics workload.
Built on IBM POWER Systems servers with IBM System Storage and powered by IBM DB2 software, the system is a complete solution for operational analytics that provides both the simplicity of an appliance and the flexibility of a custom solution. The system has recently been refreshed with the latest technology to help customers make faster, fact-based decisions, and now offers:

  • Accelerated performance with the help of new, more powerful servers that leverage POWER8 technology, and improved tiered storage that uses spinning disks for ‘cool’ data and IBM FlashSystem™ storage for ‘hot’, frequently accessed data.
  • Enhanced scalability that allows the system to grow to peta-scale capacity. In addition, nodes of the refreshed system can be added to the previous generation of PureData System for Operational Analytics, providing better protection for your technology investment.
  • A reduced data center footprint as a result of increased hardware density.

So explore the benefits and use cases of PureData System for Operational Analytics by visiting our website, ibm.com/software/data/puredata/operationalanalytics, and by connecting with IBM experts.

About Rahul Agarwal

Rahul Agarwal is a member of the worldwide product marketing team at IBM, focusing on data warehouse and database technology. Rahul held a variety of business management, product marketing and other roles at companies including HCL Technologies and HP before joining IBM. He studied at the Indian Institute of Management, Kozhikode and holds a bachelor of engineering (electronics) degree from the University of Pune, India. Follow Rahul on Twitter: @rahulag80