Turn Up The Power For Software-Defined Data Warehousing

by Mona Patel

Interview with Mukta Singh

As big data analytics technologies such as Spark and Hadoop continue their move into the mainstream, you might think that the traditional data warehouse is becoming less important.

Actually, nothing could be further from the truth.

To enable data of all types to be ingested, transformed, processed and analyzed efficiently, many companies are choosing to build hybrid analytics architectures that plug cloud and open source technologies such as Spark and Hadoop into on-premises environments. At the heart of these hybrid architectures lies the data warehouse – a highly reliable resource that provides a single source of truth for enterprise reporting and analytics.

This raises an important question: since the data warehouse is so central to the hybrid analytics architecture, how can we make sure it performs well and cost-effectively?

Traditional wisdom is that the infrastructure doesn’t matter – that running these vital systems of record on commodity hardware is perfectly adequate. But when you look at the numbers, you may begin to question that view.

To understand why the right hardware – in this case, IBM Power Systems – can make a real difference, I spoke with Mukta Singh, Director of Data Warehousing at IBM. In my conversation with Mukta, we take a deeper dive into why IBM’s software-defined data warehouse – IBM dashDB Local – on IBM Power Systems offers a better price/performance ratio compared to commodity hardware.

Mona_Blog

Mona Patel: Can you tell our readers a little bit about the Power Architecture? What is so unique about it?

Mukta Singh: IBM Power Systems is the dominant server platform in today’s Unix market, with over 50 percent market share. It has also become a leading platform for Linux systems, and we have seen tremendous growth in that area in recent years.

Unlike commodity servers, which typically use x86 processors, Power servers use IBM’s Power Architecture, a unique processor architecture that has been designed specifically for big data and analytics workloads.

Mona Patel: How does IBM dashDB Local integrate with Power Systems?

Mukta Singh: dashDB Local is a software-defined data warehouse offering that has been optimized for rapid deployment and ease of management. Essentially, the system runs in a Docker container, which means it can be flexibly deployed on different types of hardware either on-premises or in a private or public cloud environment.

One of the options today is to deploy your dashDB Local container on IBM Power Systems – it runs completely transparently, and it’s optimized to allow the dashDB engine to take advantage of the unique features of the Power Architecture.

If you want to move an existing dashDB Local environment from x86 to Power Systems, that’s easy too. The latest-generation POWER8 processors can operate in little-endian (LE) mode, which is the same byte order that x86 processors use. That means that you can move a dashDB container from one platform to the other without making any changes to your applications or data.

At a higher level, we have also ensured that running dashDB on Power Systems offers the same user experience as it does on x86, so the database and OS management, monitoring and integration aspects are exactly the same. The skills are completely transferable from one platform to another, so it’s a free choice and users don’t have to worry about being locked in.

Mona Patel: Can you tell us about the benefits that the Power Architecture provides for dashDB Local?

Mukta Singh: Well, for example, dashDB’s analytics engine is built on IBM BLU Acceleration – a columnar, in-memory technology that cuts query run-times from hours or minutes to just seconds.

BLU Acceleration is designed to take advantage of multi-threaded cores, and Power processors have more threads per core than most current x86 processors. In fact, if you compare an IBM POWER8 processor to an Intel Broadwell EX, it has four times as many threads per core. That means if you have a query that BLU can parallelize, you will get much better performance from Power Systems.

Similarly, because dashDB’s BLU Acceleration does all the processing in-memory, the bandwidth between the processor and the memory is very important. Again, Power Systems has a huge advantage here, with four times as much memory bandwidth as the x86 equivalent.

Finally, the processor’s cache size is important. BLU is engineered to do the majority of its processing in the CPU cache. That means it doesn’t need to repeatedly access data from RAM, which is usually a much slower process. Power processors offer four times as much cache than x86, which means they offer lower latency and reduce the need to access RAM even further. So they play to the strengths of dashDB’s query engine.

Mona Patel: So how do those numbers translate in terms of performance and cost-efficiency?

Mukta Singh: We’ve done a benchmark with dashDB Local of a 24-core POWER8 server versus a 44-core x86 server.

The Power server was 1.2 times faster in terms of throughput, despite having 45 percent fewer cores. Or to look at it another way, each POWER8 core offered 2.2 times more throughput than the x86 equivalent. Leadership performance and competitive pricing for Power scale-out servers deliver a very compelling price-performance-optimized solution with dashDB Local.

Mona Patel: How do you see the market for dashDB Local on Power Systems? Is this something that customers have been asking for?

Mukta Singh: Even when we started bringing dashDB Local to market last year, there were Power clients who were interested. As I mentioned earlier, Power has a dominant share of the Unix market, and there are thousands of companies whose businesses are built on DB2 or Oracle databases running on Power Systems. For companies that rely on Power Systems already, the idea of running dashDB Local on their existing infrastructure is very attractive.

But the results of our benchmark suggests that this isn’t just a good idea for existing Power clients – it’s also an opportunity for new clients to start out running dashDB on a hardware platform that is tailor-made for high-performance analytics.

And for any client who currently runs dashDB on x86 servers, the message we’d like to get across is that it’s easy to move to Power Systems. It’s faster, it’s more cost-effective, and you still get all the ease of use and ease of management that you’re used to with your existing dashDB environment.

Mona Patel: OK, last question: where can our readers go to learn more about dashDB Local on Power? Can they try out dashDB Local on Power Systems before they buy?

Mukta Singh: Yes, we offer a free trial with a Docker ID – please visit dashDB.com to learn more and access the trial.

About Mona,

mona_headshotMona Patel is currently the Portfolio Marketing Manager for IBM dashDB, the future of data warehousing.  With over 20 years of analyzing data at The Department of Water and Power, Air Touch Communications, Oracle, and MicroStrategy, Mona decided to grow her career at IBM, a leader in data warehousing and analytics.  Mona received her Bachelor of Science degree in Electrical Engineering from UCLA.

One Cloud Data Warehouse, Three Ways

by Mona Patel

There’s something very satisfying about using a single, cloud database solution to solve many business problems.  This is exactly what BPM Northwest experiences with IBM dashDB when delivering Data and Analytics solutions to clients worldwide.

The exciting success with dashDB compelled BPM Northwest to share implementations and best practices with IDC.

In the webcast they team up to discuss the value and realities of moving analytical workloads to the cloud.   Challenges around governance, data integration, and skills are also discussed as organizations are very interested and driven to seize the opportunities of a cloud data warehouse.

In the webcast, you will hear three ways that you can utilize IBM dashDB:

  • New applications, with some integration with on-premises systems
  • Self-service, business-driven sandbox
  • Migrating existing data warehouse workloads

After watching the webcast, please think about how IBM dashDB use cases discussed can apply to your challenges and if a hybrid data warehouse is the right solution for you.

Want to give IBM dashDB on Bluemix a try?  Before you sign up for a free trial, take a tutorial tour on the IBM dashDB YouTube channel to learn how to load data from your desktop, enterprise, and internet data sources, and then see how to run simple to complex SQL queries with your favorite BI tool, or integrated R/R Studio. In fact, watch how IBM dashDB integrates with other value added Bluemix services such as Dataworks Lift and Watson Analytics so that you can bring together all relevant data sources for newer insights.

mona_blog

About Mona,

mona_headshotMona Patel is currently the Portfolio Marketing Manager for IBM dashDB, the future of data warehousing.  With over 20 years of experince analyzing data at The Department of Water and Power, Air Touch Communications, Oracle, and MicroStrategy, Mona decided to grow her career at IBM, a leader in data warehousing and analytics.  Mona received her Bachelor of Science degree in Electrical Engineering from UCLA.

Enterprise Data Warehouse Beyond SQL with Apache Spark

By Torsten Steinbach, Lead Architect for IBM Data Warehousing Advanced Analytics

Enterprise IT infrastructure is often based heavily on relational data warehouses, where all other applications communicate through the data warehouse for analytics. There is pressure from line of business departments to use open source analytics & big data technology, such as R, Python and Spark for analytical projects and to deploy them continuously without having to wait for IT provisioning. Not being able to serve these requests can lead to proliferation of analytic silos and lost control of data. For this reason, the new IBM dashDB Local for software-defined environments (SDEs) and private clouds has now integrated a complete Apache Spark stack, enabling you to continue to operate established data warehouses and leverage its proven operational quality of service as well as running Spark-based workloads out of the box on the same data.

This tightly embedded Apache Spark environment can use the entire set of resources of the dashDB system, which also applies to the MPP scale out. Each dashDB Local node, each with its own data partition, is overlaid with a local Apache Spark executor process. The existing data partitions of the dashDB cluster are implicitly derived for the data frames in Spark and thus for any distributed parallel processing in Spark on this data.

ts_screenshotone

 

 

 

 

 

 

 

The co-location of the Spark execution capabilities with the database engine minimizes latency in accessing the data and leverages optimized local IPC mechanisms for data transfer. The benefits of this architecture become apparent when we apply standard machine learning algorithms on Spark to data in dashDB Local. Comparing a remote Spark cluster setup with a co-located setup we found that those algorithms can get a significant increase in speed. This even includes optimization in remote access to read data in parallel tasks, one for each database partition in dashDB.So you can see that there is indeed a performance advantage provided by the integrated architecture.

In addition, the Spark-enabled data warehouse engine can do a lot of things out of the box that were not possible before:

1. Out of the box data exploration & visualization

ts_screenshotone

2. Interactive Machine Learning

ts_screenshottwo

3. One-click deployment – Turning interactive notebooks into deployed Spark applications

ts_screenshotthree

ts_screenshotfour

4. dashDB as hosting environment to run your Spark applications

Once as Spark application has been deployed to dashDB it can be invoked in three different ways.

Using spark-submit.sh from command line and scripts:

ts_screenshotfive

ts_screenshotsix

Using dashDB REST API:

ts_screenshotseven

Using SPARK_SUBMIT stored procedure:

ts_screenshoteight

5. Out of the box machine learning

ts_screenshotnine

In addition, this implementation of Spark capabilities in dashDB Local provides you with a high degree of flexibility in ELT and ETL activities and let you process and land data in motion in dashDB Local.

Let’s summarize the key benefits that dashDB with integrated Apache Spark provides:

  1. dashDB Local lets you dramatically modernize your data warehouse solutions with advanced analytics based on Spark.
  2. Spark applications processing relational data gain significant performance and operational QoS benefits from being deployed and running inside dashDB Local.
  3. dashDB Local enables analytic solution creation end-to-end, from interactive exploration and machine learning experiments, verification of analytic flows, easy operationalization by creating deployed Spark applications, up to hosting Spark applications in a multi-tenant enterprise warehouse system and integrating them with other applications via various invocation APIs.
  4. dashDB Local allows you to invoke Spark logic via SQL connections.
  5. dashDB Local can land streaming data directly into tables via deployed Spark applications.
  6. dashDB Local can run complex data transformations and feature extractions that cannot be expressed with SQL using integrated Spark.

Please also check out the tutorial playlist for dashDB with Spark here: ibm.biz/BdrLNG.  You can also download a free trial version of dashDB Local at ibm.biz/dashDBLocal to see these Spark features in action for yourself.

 

About Torsten,

torsten-steinbachTorsten has worked over many years as an IBM software architect for IBM’s database software offerings with particular focus on performance monitoring, application integration and workload management. Today, Torsten is the lead architect for advanced analytics in IBM’s data warehouse products and cloud services.

Why Are Customers Architecting Hybrid Data Warehouses?

By Mona Patel

As a leader in IT, you may be  incented or mandated to explore cloud and big data solutions to transform rigid data warehousing environments into agile ones to match how the business really wants to operate.  The following questions must come to mind:

  • How do I integrate new analytic capabilities and data sets to my current on-premises data warehouse environment?
  • How do I deliver self service solutions to accelerate the analytic process?
  • How do I leverage commodity hardware to lower costs?

For these questions, and more, organizations are architecting hybrid data warehouses.  In fact, these organizations moving towards hybrid are referred to as ‘Best In Class’ according to The Aberdeen Group’s latest research: “Best In Class focus on hybridity, both in their data infrastructure and with their analytical tools as well.  Given the substantial investments companies have made in their IT environment, a hybrid approach allows them to utilize these investments to the best of their ability while explore more flexible and scalable cloud-based solutions as well.”  To hear more about these ‘Best In Class’ organizations, watch the 45 minute webcast.

How do you get to this hybrid data warehouse architecture with the least risk and most reward?  IBM dashDB delivers the most flexible, cloud database services to extend and integrate with your current analytics and data warehouse environment, addressing all the challenges related to leveraging new sources of customer, product, and operational insights to build new applications, products, and business models.

To help our clients evaluate hybrid data warehouse solutions, Harvard Research Group (HRG) provides an assessment of IBM dashDB.  In this paper, HRG highlights product functionality, as well as 3 uses cases in Healthcare, Oil and Gas, and Financial Services.   Security, Performance, High Availability, In-Database Analytics, and more are covered in the paper to ensure future architecture enhancements optimize IT rather than adding new skills, complexities, and integration costs. After reading this paper, you will find that dashDB enables IT to respond rapidly to the needs of the business, keep systems running smoothly, and achieve faster ROI.

To know more on dashDB check out the video below:

 

About Mona,

mona_headshotMona Patel is currently the Portfolio Marketing Manager for IBM dashDB, the future of data warehousing.  With over 20 years of analyzing data at The Department of Water and Power, Air Touch Communications, Oracle, and MicroStrategy, Mona decided to grow her career at IBM, a leader in data warehousing and analytics.  Mona received her Bachelor of Science degree in Electrical Engineering from UCLA.

Start Small and Move Fast: The Hybrid Data Warehouse

by Mona Patel

In the world of cutting edge big data analytics, the same obstacles in gaining meaningful insight still exists – ease of getting data in and getting data out.  To address these long standing issues, the utmost flexibility is needed, especially when layered with the agile needs of the business.

Why spend millions of dollars replacing your data and analytics environment with the latest technology promise to address these issues, when can you to leverage existing investments, resources, and skills to achieve the same, and sometimes better, insight?

Consider a hybrid data warehouse.  This approach allows you to start small and move fast. It provides the best of both worlds – flexibility and agility without breaking the bank.  You can RAPIDLY serve up quality data managed by your data warehouse, blended with newer data sources and data types in the cloud, and apply integrated analytics such as Spark or R – all without additional IT resources and expertise.  How is this possible?  IBM dashDB.

Read Aberdeen’s latest report on The Hybrid Data Warehouse.

mona's blog

 

Watch Aberdeen Group’s Webcast on The Hybrid Data Warehouse.

Let me give you an example.  We live in a digital world, with organizations now very interested in improving customer data capture across mobile, web, IoT, social media, and more for newer insights.  A telecommunications client was facing heavy competition and wanted to quickly deliver unique mobile services for an upcoming event in order to acquire new customers by collecting and analyzing mobile and social media data.  Taking a hybrid data warehouse approach, the client was able to start small and move fast, uncovering new mobile service options.

Customer information generated from these newer data sources were blended together with existing customer data managed in the data warehouse to deliver newer insights.  IBM dashDB provided a high performing, public cloud data warehouse service that was up and running in minutes.  Automatic transformation of unstructured geospatial data into structured data, in-memory columnar processing, in-database geospatial analytics, integration with Tableau, and pricing were some of the key reasons IBM dashDB was chosen.

This brings me back to my first point – you don’t have to spend millions of dollars to capitalize on getting data in and getting data out.  For example, clients like the one described above took advantage of Cloudant JSON document store integration, enabling them to rapidly get data into IBM dashDB with ease– no ETL processing required.  Automatic schema discovery loads and replicates unstructured JSON documents that capture IoT, Web and mobile-based data into a structured format.  Getting data or information out was simple, as IBM dashDB provides in-database analytics and the use of familiar, integrated SQL based tools such as Cognos, Watson Analytics, Tableau, and Microstrategy.  I can only conclude that IBM dashDB is a great example of how a highly compatible cloud database can extend or modernize your on-premises data warehouse into a hybrid one to meet time-sensitive business initiatives.

What exactly is a hybrid data warehouse?  A hybrid data warehouse introduces technologies that extend the traditional data warehouse to provide key functionality required to meet new combinations of data, analytics and location, while addressing the following IT challenges:

  • Deliver new analytic services and data sets to meet time-sensitive business initiatives
  • Manage escalating costs due to massive growth in new data sources, analytic capabilities, and users
  • Achieve data warehouse elasticity and agility for ALL business data

mona_dashDB

Still not convinced on the power of a hybrid data warehouse?  Hear what Aberdeen Group’s expert Michael Lock has to say in this 30 min webcast.

About Mona,

mona_headshot

Mona Patel is currently the Portfolio Marketing Manager for IBM dashDB, the future of data warehousing.  With over 20 years of analyzing data at The Department of Water and Power, Air Touch Communications, Oracle, and MicroStrategy, Mona decided to grow her career at IBM, a leader in data warehousing and analytics.  Mona received her Bachelor of Science degree in Electrical Engineering from UCLA.

IBM dashDB Local preview to be featured at Cloud Expo East 2016 in New York

by Cindy Russell, IBM Data Warehouse Marketing

With an expected attendance of over 6,000 from June 7th through 9th, Cloud Expo New York is the one and only original event in which technology vendors can meet to experience and discuss the entire world of the Cloud.

On June 7, 2016, we are thrilled to be rolling out IBM dashDB Local™ as an open preview.  This exciting new IBM hybrid data warehouse offering is part of the dashDB family of data management solutions. dashDB Local delivers dashDB technology via container technology for implementations such as private and virtual private clouds. It is an ideal solution when you want cloud-like simplicity, yet need more control over applications and data. Participants in this preview program have been very enthusiastic about this technology, and you can read more in Mitesh Shah’s blog.

dashDB Local is:

·        Open: Leverage the power of an open warehouse platform

·        Flexible & Hybrid: Easily deploy the right workload to the right platform

·        Fast: Quickly realize business outcomes with advanced processing technologies

·        Simple: Automated management features help to lower the analytics cost model

 

dashDB Local will also be featured in this speaking session at Cloud Expo!

Building Your Hybrid Data Warehouse Solution with dashDB

Matthias Funke, Hybrid Data Warehouse Offering and Strategy Lead at IBM

June 7, 2016 from 11:40 AM – 12:15 PM

 

See the dashDB family in action at IBM Booth #201, the SoftLayer Booth!

cloud expo booth

Register to attend

IBM Booth #201

June 7, 2016

Cloud Expo East 2016

Javits Center, New York, NY

Note: First time attendees will need to create a new account.

As you respond to increasing requests for new analytics, you need fast and flexible technology. Learn how dash DB’s cloud-based managed service and new container-based edition gives you the speed and flexibility that you need.

Want to get started now with dashDB Local? Learn about and download our preview here: ibm.biz/dashDBLocal

Using Docker containers for software-defined environments or private cloud implementations

by Mitesh Shah

Data warehousing architectures have evolved considerably over recent years. As businesses try to derive insight as the basis of value creation, ALL roles must participate by leveraging new insights.  As a result, analytics needs are expanding, markets are transforming and new business models are being created.  This ushers in increased requirements for self-service analytics and alternative infrastructure solutions. Read on to learn how the “software-defined environment” (SDE) that utilizes container technology can help you meet expanded analytics needs.

Adaptability delivered through software-defined environments

From an avalanche of new data, to mobile computing and cloud-based platforms, new technologies must move into the IT infrastructure very quickly. Traditional IT systems—hampered by labor-intensive management and high costs—are struggling to keep up. IT organizations are caught between complex security requirements, extreme data volumes and the need for rapid deployment of new services. A simpler, more adaptive and more responsive IT infrastructure is required.

One of the key solutions on the horizon is  the SDE which optimizes the entire computing infrastructure – compute, storage and network resources – so that IT staff can adapt to different types of workloads very quickly. For example, without an SDE, resources are assigned manually to workloads; the same assignments happens automatically within an SDE.

Now, dashDB Local  (via Docker container) is available as an early access client preview.  I hope you will test this new technology and provide us valuable feedback. Learn more, then request access: ibm.biz/dashDBLocal

By dynamically assigning workloads to IT resources based on a variety of factors, including the characteristics of specific applications, the best-available resources, and service-level policies, a software-defined environment can deliver continuous, dynamic optimization and reconfiguration to address infrastructure issues.

Software-defined environment benefits

A software defined environment framework can help to:

  • Simplify operations with automated infrastructure tuning and configuration
  • Reduce time to value with a simple, pluggable and rich API-supported architectures
  • Sense and respond to workload demands automatically
  • Optimize resources by assigning assets without manual intervention
  • Maintain security and manage privacy through a common platform
  • Facilitate better business outcomes through advanced analytics and cognitive capabilities

A software-defined environment fits well into the private cloud ecosystem so that IT staff can deliver flexibility and ease of consumption, as well as maximize the use of commodity or virtualized hardware. An SDE is now easily achievable by leveraging container technology, where Docker is one of the leaders.

Docker containers provide application portability

Docker containers “wrap up” a piece of software in a complete file system that contains everything the software needs to run: code, run-times, system tools, system libraries and other components that can be installed on a server. This guarantees that the software will always run the same, regardless of the environment in which it is running.

Docker provides true application portability and ease of consumption by alleviating the complex process of software setup and installation that often can require multiple skills across multiple hours or days. It provides OS-level abstraction without disrupting the standards on the host operating system, which makes it even more attractive.

One key point to keep in mind is that Docker is not the same as VMware. Docker provides process isolation at the operating system level, whereas VMware provides a hardware abstraction layer. Unlike VMware, Docker does not create an entire virtual operating system. Instead, the host operating system kernel can be shared across multiple Docker containers. This makes it very lightweight to deploy and faster to start than a virtual machine.  There is no looking back, as container technology is being very quickly embraced as part of a hybrid solution that meets business user needs-fast!

dashDB Local: data warehousing delivered via Docker container

Coming full circle, the data warehouse is the foundation of all analytics and must be fast and agile to serve new analytics needs.  Software defined environments make this easy to do – enabling key deployment of the warehousing engine in minutes as compared to hours or days.

IBM dashDB is the data warehousing technology that delivers high speed insights through in-memory computing and  in-database analytics at massively parallel processing (MPP) scale.  It has been available as a fully managed services on the IBM cloud.  Now, dashDB Local  as a is available as an early access client preview for private clouds and other software-defined infrastructures.  I hope you will test this new technology and provide us valuable feedback. Learn more, then request access: ibm.biz/dashDBLocal

About Mitesh,

MiteshMitesh Shah is the product manager for the new dashDB data warehousing solution as a software-defined environment (SDE) that can be used on private clouds and other implementations that support Docker container technology. He has broad experience around various facets of software development revolving around database and data warehousing technologies.  Throughout his career, Mitesh has enjoyed a focus on helping clients address their data management and solution architecture needs.