by Mona Patel
Interview with Mukta Singh
As big data analytics technologies such as Spark and Hadoop continue their move into the mainstream, you might think that the traditional data warehouse is becoming less important.
Actually, nothing could be further from the truth.
To enable data of all types to be ingested, transformed, processed and analyzed efficiently, many companies are choosing to build hybrid analytics architectures that plug cloud and open source technologies such as Spark and Hadoop into on-premises environments. At the heart of these hybrid architectures lies the data warehouse – a highly reliable resource that provides a single source of truth for enterprise reporting and analytics.
This raises an important question: since the data warehouse is so central to the hybrid analytics architecture, how can we make sure it performs well and cost-effectively?
Traditional wisdom is that the infrastructure doesn’t matter – that running these vital systems of record on commodity hardware is perfectly adequate. But when you look at the numbers, you may begin to question that view.
To understand why the right hardware – in this case, IBM Power Systems – can make a real difference, I spoke with Mukta Singh, Director of Data Warehousing at IBM. In my conversation with Mukta, we take a deeper dive into why IBM’s software-defined data warehouse – IBM dashDB Local – on IBM Power Systems offers a better price/performance ratio compared to commodity hardware.
Mona Patel: Can you tell our readers a little bit about the Power Architecture? What is so unique about it?
Mukta Singh: IBM Power Systems is the dominant server platform in today’s Unix market, with over 50 percent market share. It has also become a leading platform for Linux systems, and we have seen tremendous growth in that area in recent years.
Unlike commodity servers, which typically use x86 processors, Power servers use IBM’s Power Architecture, a unique processor architecture that has been designed specifically for big data and analytics workloads.
Mona Patel: How does IBM dashDB Local integrate with Power Systems?
Mukta Singh: dashDB Local is a software-defined data warehouse offering that has been optimized for rapid deployment and ease of management. Essentially, the system runs in a Docker container, which means it can be flexibly deployed on different types of hardware either on-premises or in a private or public cloud environment.
One of the options today is to deploy your dashDB Local container on IBM Power Systems – it runs completely transparently, and it’s optimized to allow the dashDB engine to take advantage of the unique features of the Power Architecture.
If you want to move an existing dashDB Local environment from x86 to Power Systems, that’s easy too. The latest-generation POWER8 processors can operate in little-endian (LE) mode, which is the same byte order that x86 processors use. That means that you can move a dashDB container from one platform to the other without making any changes to your applications or data.
At a higher level, we have also ensured that running dashDB on Power Systems offers the same user experience as it does on x86, so the database and OS management, monitoring and integration aspects are exactly the same. The skills are completely transferable from one platform to another, so it’s a free choice and users don’t have to worry about being locked in.
Mona Patel: Can you tell us about the benefits that the Power Architecture provides for dashDB Local?
Mukta Singh: Well, for example, dashDB’s analytics engine is built on IBM BLU Acceleration – a columnar, in-memory technology that cuts query run-times from hours or minutes to just seconds.
BLU Acceleration is designed to take advantage of multi-threaded cores, and Power processors have more threads per core than most current x86 processors. In fact, if you compare an IBM POWER8 processor to an Intel Broadwell EX, it has four times as many threads per core. That means if you have a query that BLU can parallelize, you will get much better performance from Power Systems.
Similarly, because dashDB’s BLU Acceleration does all the processing in-memory, the bandwidth between the processor and the memory is very important. Again, Power Systems has a huge advantage here, with four times as much memory bandwidth as the x86 equivalent.
Finally, the processor’s cache size is important. BLU is engineered to do the majority of its processing in the CPU cache. That means it doesn’t need to repeatedly access data from RAM, which is usually a much slower process. Power processors offer four times as much cache than x86, which means they offer lower latency and reduce the need to access RAM even further. So they play to the strengths of dashDB’s query engine.
Mona Patel: So how do those numbers translate in terms of performance and cost-efficiency?
Mukta Singh: We’ve done a benchmark with dashDB Local of a 24-core POWER8 server versus a 44-core x86 server.
The Power server was 1.2 times faster in terms of throughput, despite having 45 percent fewer cores. Or to look at it another way, each POWER8 core offered 2.2 times more throughput than the x86 equivalent. Leadership performance and competitive pricing for Power scale-out servers deliver a very compelling price-performance-optimized solution with dashDB Local.
Mona Patel: How do you see the market for dashDB Local on Power Systems? Is this something that customers have been asking for?
Mukta Singh: Even when we started bringing dashDB Local to market last year, there were Power clients who were interested. As I mentioned earlier, Power has a dominant share of the Unix market, and there are thousands of companies whose businesses are built on DB2 or Oracle databases running on Power Systems. For companies that rely on Power Systems already, the idea of running dashDB Local on their existing infrastructure is very attractive.
But the results of our benchmark suggests that this isn’t just a good idea for existing Power clients – it’s also an opportunity for new clients to start out running dashDB on a hardware platform that is tailor-made for high-performance analytics.
And for any client who currently runs dashDB on x86 servers, the message we’d like to get across is that it’s easy to move to Power Systems. It’s faster, it’s more cost-effective, and you still get all the ease of use and ease of management that you’re used to with your existing dashDB environment.
Mona Patel: OK, last question: where can our readers go to learn more about dashDB Local on Power? Can they try out dashDB Local on Power Systems before they buy?
Mukta Singh: Yes, we offer a free trial with a Docker ID – please visit dashDB.com to learn more and access the trial.
Mona Patel is currently the Portfolio Marketing Manager for IBM dashDB, the future of data warehousing. With over 20 years of analyzing data at The Department of Water and Power, Air Touch Communications, Oracle, and MicroStrategy, Mona decided to grow her career at IBM, a leader in data warehousing and analytics. Mona received her Bachelor of Science degree in Electrical Engineering from UCLA.