Turn Up The Power For Software-Defined Data Warehousing

by Mona Patel

Interview with Mukta Singh

As big data analytics technologies such as Spark and Hadoop continue their move into the mainstream, you might think that the traditional data warehouse is becoming less important.

Actually, nothing could be further from the truth.

To enable data of all types to be ingested, transformed, processed and analyzed efficiently, many companies are choosing to build hybrid analytics architectures that plug cloud and open source technologies such as Spark and Hadoop into on-premises environments. At the heart of these hybrid architectures lies the data warehouse – a highly reliable resource that provides a single source of truth for enterprise reporting and analytics.

This raises an important question: since the data warehouse is so central to the hybrid analytics architecture, how can we make sure it performs well and cost-effectively?

Traditional wisdom is that the infrastructure doesn’t matter – that running these vital systems of record on commodity hardware is perfectly adequate. But when you look at the numbers, you may begin to question that view.

To understand why the right hardware – in this case, IBM Power Systems – can make a real difference, I spoke with Mukta Singh, Director of Data Warehousing at IBM. In my conversation with Mukta, we take a deeper dive into why IBM’s software-defined data warehouse – IBM dashDB Local – on IBM Power Systems offers a better price/performance ratio compared to commodity hardware.


Mona Patel: Can you tell our readers a little bit about the Power Architecture? What is so unique about it?

Mukta Singh: IBM Power Systems is the dominant server platform in today’s Unix market, with over 50 percent market share. It has also become a leading platform for Linux systems, and we have seen tremendous growth in that area in recent years.

Unlike commodity servers, which typically use x86 processors, Power servers use IBM’s Power Architecture, a unique processor architecture that has been designed specifically for big data and analytics workloads.

Mona Patel: How does IBM dashDB Local integrate with Power Systems?

Mukta Singh: dashDB Local is a software-defined data warehouse offering that has been optimized for rapid deployment and ease of management. Essentially, the system runs in a Docker container, which means it can be flexibly deployed on different types of hardware either on-premises or in a private or public cloud environment.

One of the options today is to deploy your dashDB Local container on IBM Power Systems – it runs completely transparently, and it’s optimized to allow the dashDB engine to take advantage of the unique features of the Power Architecture.

If you want to move an existing dashDB Local environment from x86 to Power Systems, that’s easy too. The latest-generation POWER8 processors can operate in little-endian (LE) mode, which is the same byte order that x86 processors use. That means that you can move a dashDB container from one platform to the other without making any changes to your applications or data.
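As an aside on what "byte order" means here, the following is a minimal, generic Python sketch (standard library only, not part of dashDB) that packs the same 32-bit integer in little-endian and big-endian form. The value 0x0A0B0C0D is an arbitrary illustration.

```python
import struct
import sys

value = 0x0A0B0C0D  # arbitrary example value

# '<' forces little-endian layout (x86, and POWER8 running in LE mode);
# '>' forces big-endian layout.
little = struct.pack("<I", value)
big = struct.pack(">I", value)

print(little.hex())   # 0d0c0b0a
print(big.hex())      # 0a0b0c0d
print(sys.byteorder)  # byte order of the machine running this script
```

When two platforms agree on this layout, binary data written on one can be read on the other without byte swapping.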

At a higher level, we have also ensured that running dashDB on Power Systems offers the same user experience as it does on x86, so the database and OS management, monitoring and integration aspects are exactly the same. The skills are completely transferable from one platform to another, so it’s a free choice and users don’t have to worry about being locked in.

Mona Patel: Can you tell us about the benefits that the Power Architecture provides for dashDB Local?

Mukta Singh: Well, for example, dashDB’s analytics engine is built on IBM BLU Acceleration – a columnar, in-memory technology that cuts query run-times from hours or minutes to just seconds.

BLU Acceleration is designed to take advantage of multi-threaded cores, and Power processors have more threads per core than most current x86 processors. In fact, an IBM POWER8 processor has four times as many threads per core as an Intel Broadwell EX. That means if you have a query that BLU can parallelize, you will get much better performance from Power Systems.

Similarly, because dashDB’s BLU Acceleration does all the processing in-memory, the bandwidth between the processor and the memory is very important. Again, Power Systems has a huge advantage here, with four times as much memory bandwidth as the x86 equivalent.

Finally, the processor’s cache size is important. BLU is engineered to do the majority of its processing in the CPU cache. That means it doesn’t need to repeatedly access data from RAM, which is usually a much slower process. Power processors offer four times as much cache as x86, which means they offer lower latency and reduce the need to access RAM even further. So they play to the strengths of dashDB’s query engine.

Mona Patel: So how do those numbers translate in terms of performance and cost-efficiency?

Mukta Singh: We’ve done a benchmark with dashDB Local of a 24-core POWER8 server versus a 44-core x86 server.

The Power server was 1.2 times faster in terms of throughput, despite having 45 percent fewer cores. Or to look at it another way, each POWER8 core offered 2.2 times more throughput than the x86 equivalent. Leadership performance and competitive pricing for Power scale-out servers deliver a very compelling price-performance-optimized solution with dashDB Local.
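The arithmetic behind those figures can be checked with a short Python sketch. It uses only the numbers quoted above (24 cores, 44 cores, 1.2 times the throughput); the variable names are illustrative.

```python
power_cores = 24
x86_cores = 44
relative_throughput = 1.2  # POWER8 server throughput divided by x86 server throughput

# Fraction of cores the Power server goes without, relative to the x86 server.
fewer_cores = 1 - power_cores / x86_cores

# Throughput per core on Power divided by throughput per core on x86.
per_core_ratio = relative_throughput * x86_cores / power_cores

print(f"{fewer_cores:.0%} fewer cores")
print(f"{per_core_ratio:.1f}x throughput per core")
```

The per-core figure is simply the server-level ratio scaled by the ratio of core counts.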

Mona Patel: How do you see the market for dashDB Local on Power Systems? Is this something that customers have been asking for?

Mukta Singh: Even when we started bringing dashDB Local to market last year, there were Power clients who were interested. As I mentioned earlier, Power has a dominant share of the Unix market, and there are thousands of companies whose businesses are built on DB2 or Oracle databases running on Power Systems. For companies that rely on Power Systems already, the idea of running dashDB Local on their existing infrastructure is very attractive.

But the results of our benchmark suggest that this isn’t just a good idea for existing Power clients – it’s also an opportunity for new clients to start out running dashDB on a hardware platform that is tailor-made for high-performance analytics.

And for any client who currently runs dashDB on x86 servers, the message we’d like to get across is that it’s easy to move to Power Systems. It’s faster, it’s more cost-effective, and you still get all the ease of use and ease of management that you’re used to with your existing dashDB environment.

Mona Patel: OK, last question: where can our readers go to learn more about dashDB Local on Power? Can they try out dashDB Local on Power Systems before they buy?

Mukta Singh: Yes, we offer a free trial with a Docker ID – please visit dashDB.com to learn more and access the trial.

About Mona,

Mona Patel is currently the Portfolio Marketing Manager for IBM dashDB, the future of data warehousing.  With over 20 years of analyzing data at The Department of Water and Power, Air Touch Communications, Oracle, and MicroStrategy, Mona decided to grow her career at IBM, a leader in data warehousing and analytics.  Mona received her Bachelor of Science degree in Electrical Engineering from UCLA.

Increased Speed, More Options for dashDB for Analytics with Pay-As-You-Go and Bluemix Lift

by Ben Hudson

Harnessing the power of IBM dashDB for Analytics just got quicker and easier. We’re excited to introduce two new and improved ways to connect to the cloud for in-memory processing; RStudio and Cloudant integrations; in-database analytics; and other powerful features that will reduce your time to market:

  1. Pay-As-You-Go (PayGo) provisioning: Starting today, you can purchase dashDB for Analytics directly in Bluemix using your credit card*.  We’ll start provisioning your system right away, accelerating your time to value.
  2. Bluemix Lift: Now you can move your on-premises data stores into a dashDB instance even faster. Bluemix Lift, IBM’s newest data movement solution, accelerates data migration by up to 10 times versus traditional options, with the flexibility of both PayGo and subscription plans to meet your data needs.  Check the details out here.

You can also purchase dashDB for Analytics through a Bluemix subscription.  Try it out today!

About Ben,

Ben Hudson is an Advisory Offering Manager for IBM dashDB for Analytics. He recently obtained his Master’s degree in Computer Science from Wesleyan University in Middletown, CT.

 

*Note: dashDB for Analytics MPP Small for AWS is not available as a PayGo plan.

 

One Cloud Data Warehouse, Three Ways

by Mona Patel

There’s something very satisfying about using a single, cloud database solution to solve many business problems.  This is exactly what BPM Northwest experiences with IBM dashDB when delivering Data and Analytics solutions to clients worldwide.

The exciting success with dashDB compelled BPM Northwest to share implementations and best practices with IDC.

In the webcast they team up to discuss the value and realities of moving analytical workloads to the cloud.   Challenges around governance, data integration, and skills are also discussed as organizations are very interested and driven to seize the opportunities of a cloud data warehouse.

In the webcast, you will hear three ways that you can utilize IBM dashDB:

  • New applications, with some integration with on-premises systems
  • Self-service, business-driven sandbox
  • Migrating existing data warehouse workloads

After watching the webcast, please think about how the IBM dashDB use cases discussed can apply to your challenges and whether a hybrid data warehouse is the right solution for you.

Want to give IBM dashDB on Bluemix a try?  Before you sign up for a free trial, take a tutorial tour on the IBM dashDB YouTube channel to learn how to load data from your desktop, enterprise, and internet data sources, and then see how to run simple to complex SQL queries with your favorite BI tool, or integrated R/R Studio. In fact, watch how IBM dashDB integrates with other value added Bluemix services such as Dataworks Lift and Watson Analytics so that you can bring together all relevant data sources for newer insights.


About Mona,

Mona Patel is currently the Portfolio Marketing Manager for IBM dashDB, the future of data warehousing.  With over 20 years of experience analyzing data at The Department of Water and Power, Air Touch Communications, Oracle, and MicroStrategy, Mona decided to grow her career at IBM, a leader in data warehousing and analytics.  Mona received her Bachelor of Science degree in Electrical Engineering from UCLA.

IBM dashDB Local FAQ

When it comes to next-generation data warehousing and data management, the IBM dashDB family offers a range of options that all share a common technology.  dashDB Local is one of the new offerings and this blog provides a series of answers to your questions!  Please feel free to jump in at the bottom in the comments to add your own questions and we will respond.

      1.  What is the dashDB family?
        IBM dashDB is a family of next-generation database and data warehouse technologies that help you respond very quickly to application needs.  Originally in data warehousing, IT professionals assembled hardware, software and storage to handle their large data sets for analytics needs.  This was risky, costly and time consuming.  This gave way to the data warehouse appliance that provided an optimized system for data warehousing and analytics.  The appliance was so successful that many consider it to be the backbone of their analytics architecture. But the world of analytics is expanding and new technologies are needed to handle more requests, more data sources and even self-service needs.  Hybrid data architectures are coming to the forefront to handle these increased needs. The dashDB family plays a key role here:

        • dashDB for analytics – as a fully managed cloud data warehouse
        • dashDB Local – as a configured data warehouse delivered via container technology to enable flexible deployment
        • dashDB for transactions – as a fully managed database as a service for transactional workloads.

        This family is designed to help you respond to new needs very quickly.  It also shares a common engine to help you leverage the same skills across different deployment models and application types. For more information on dashDB, visit dashDB.com.

      2. What is dashDB Local?
        dashDB Local is in-memory, columnar data warehousing software that supports a wide range of analytic workloads, from datamarts to enterprise data warehouses. It is deployed using Docker container technology and supports a software-defined environment such as a private cloud, virtual private cloud or infrastructure of your choice, thus enabling a hybrid cloud configuration. dashDB Local can be deployed in minutes, making it fast and easy to deliver an auto-configured data warehouse with built-in Netezza and Oracle compatibility.
      3. What is Docker?
        Docker is container technology that simplifies packaging and distribution of the software in a complete filesystem that contains everything needed to run: code, runtime, system tools, system libraries – anything that can be installed on a server. This guarantees that the software will always run the same, regardless of its environment.
      4. What is the difference between Docker and VMware?
        VMware is virtualization technology, while Docker is the container technology used for simplified packaging and distribution of software.   While a Docker container provides operating system-level process isolation, VMware virtualization lets you run multiple virtual machines (VMs) on a single physical server (thus providing hardware-level abstraction). Unlike VMware, Docker does not create an entire virtual operating system (thus making it lightweight to deploy and faster to start up compared to VMs). Both technologies can be used together; for example, Docker containers can be created inside VMs to make a solution ultra-portable.
      5. Is dashDB Local generally available?
        Yes it is. You can find a free trial of it at ibm.biz/dashDBLocal.
      6. On which platforms is dashDB Local supported?
        Today, dashDB Local runs on any platform on which Docker engine is supported, such as Linux, Microsoft Windows, Apple Macintosh, and cloud providers. More details can be found here as well as on the Docker site.
      7. On what platforms is dashDB Local supported?
        dashDB Local software is packaged and deployed using Docker container technology. Thus, dashDB Local can be installed on any platform where the Docker engine client is supported. This includes Windows, Macintosh and a variety of Linux platforms. Deploying IBM dashDB Local on Windows or Macintosh requires a Linux VM, in which you run Docker. By downloading the Docker Toolbox, which includes a GUI and a VM, you can accomplish this easily. Please refer to the dashDB Local knowledge center (documentation) for further details.
      8.  Is dashDB Local available on IBM Bluemix Local?
        Bluemix Local is managed by IBM on a client’s infrastructure. There are no plans to offer dashDB Local on Bluemix Local at this time.
      9.  How long does it take to install dashDB Local?
        dashDB Local is based on Docker container technology, which allows setup and installation in less than 30 minutes. Today, the SMP version or MPP version of dashDB Local can be installed in less than 15 minutes.  This can be done on a range of servers from a simple laptop to a production-grade server. Since various components, such as LDAP security and DSM monitoring, are already bundled into a single container installation, there is a tremendous time savings and a more streamlined and efficient process.
      10. What  components are packaged and installed with  dashDB Local?
        dashDB Local comprises the following software components, packaged in the Docker container to simplify deployment and speed up the overall setup. At the core is the dashDB analytics engine, tuned for columnar and in-memory workloads, with built-in Netezza and Oracle compatibility, along with an LDAP server for user access management.  IBM Data Server Manager, which acts as the key monitoring component, is also included to provide key features such as query history monitoring, database performance monitoring and OS-level monitoring.
      11. Is dashDB Local available outside of the public Docker Hub?
        Soon, a standalone version of dashDB Local will be available on IBM Fix Central and will leverage the Docker download command on the host OS. This will remove the dependency on pulling the dashDB Local image from the private, access-controlled repository on the public Docker Hub. This will also alleviate challenges where a firewall port to access the public Docker Hub cannot be opened.
      12. What are the minimum prerequisites to install dashDB Local?
        You can find the documented prerequisites for dashDB Local in the Knowledge Center.  Some of the key requirements focus on the Docker client and POSIX-compliant file storage systems, as documented. An additional key requirement is opening access to the following network ports on the firewall. Ensure that the following ports are opened and defined on all nodes in the cluster and defined in each node’s /etc/hosts file.
        60000-60024, for database FCM
        25000-25999, for Apache Spark
        50022, for SSH/container OS
        50001, for database connection with SSL
        50000, for database connection without SSL
        9929, for communication tests
        9300, for web console status
        8443, for web console HTTPS
        5000, for System Manager
        389, for LDAP
        22, for SSH/host OS
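One way to spot-check the port list above from another machine is a small Python script like the sketch below. It is not an IBM tool: the function names and timeout are illustrative assumptions, and the two port ranges (FCM and Spark) are left out for brevity.

```python
import socket

# Single ports from the list above; the 60000-60024 and 25000-25999 ranges are omitted.
REQUIRED_PORTS = {
    50022: "SSH/container OS",
    50001: "database connection with SSL",
    50000: "database connection without SSL",
    9929: "communication tests",
    9300: "web console status",
    8443: "web console HTTPS",
    5000: "System Manager",
    389: "LDAP",
    22: "SSH/host OS",
}


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_node(host):
    """Map each required port to True/False for one node (hypothetical helper)."""
    return {port: port_is_open(host, port) for port in REQUIRED_PORTS}
```

A False result means either that the firewall blocks the port or that nothing is listening on it yet, so the check is most informative once the container is running. Calling check_node with a node's host name returns one entry per port.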
      13. How does Apache Spark fit into the dashDB Local architecture?
        dashDB Local lets you dramatically modernize your data warehouse solutions with advanced analytics based on Spark. It is installed and configured within the dashDB Local container, thus making it fully integrated, supporting a variety of use cases. Spark applications that process relational data can gain significant performance and operational QoS benefits from deploying and running inside dashDB Local. It enables end-to-end analytic solution creation, from interactive exploration and machine learning experiments; verification of analytic flows; easy operationalization of Spark applications through to hosting Spark applications in a multi-tenant enterprise warehouse system; and integration of Spark applications with other applications via various invocation APIs. It allows you to invoke Spark logic via SQL connections and can land streaming data directly into tables via deployed Spark applications. It can run complex data transformations and feature extractions that cannot be expressed with SQL using integrated Spark.
      14. I have a running dashDB Local instance where I’ve set DISABLE_SPARK='YES' in the options file. How can I tell if this option took effect and the actual memory the db is using?
        This can be confirmed during the startup of the dashDB Local container. When Spark is disabled, you will see “Spark support is going to be disabled.” in the docker start output. When Spark is enabled, you will see “Current spark share : XX% of total memory.”
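If you capture the startup output to a file or a variable, a short Python sketch like the one below can classify it. It matches only the two messages quoted in the answer above; the function name and return shape are assumptions made for illustration, and the exact message wording may differ between releases.

```python
import re


def spark_status(startup_output):
    """Classify Spark state from captured container startup text.

    Returns ("disabled", None), ("enabled", <percent as float>) or ("unknown", None).
    """
    if "Spark support is going to be disabled" in startup_output:
        return ("disabled", None)
    match = re.search(r"Current spark share\s*:\s*(\d+(?:\.\d+)?)\s*%", startup_output)
    if match:
        return ("enabled", float(match.group(1)))
    return ("unknown", None)
```

For example, passing a line such as "Current spark share : 20% of total memory." (the 20 is a made-up value) yields ("enabled", 20.0).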
      15. Is the SPARK setting available as an install time switch, or can I enable and disable spark every time I start the database by changing this?
        The Spark feature can be enabled or disabled at any time. Change the DISABLE_SPARK setting and restart dashDB Local in the container (docker exec -it dashdb stop/start).
      16. Is there currently a maximum number of nodes for dashDB Local?
        Currently, one can create an MPP cluster of up to 24 nodes in dashDB Local.
      17. Can I upgrade from the GA trial to the production version of dashDB Local?
        Yes, you can upgrade/update the post-GA trial version to a production grade version. Once the product is purchased, a permanent license can be applied to update the license.
      18. How can I get support for dashDB Local ?
        Upon purchase of dashDB Local, clients are entitled to support via email or phone. One can open a PMR with IBM’s support ticketing system. IBM will support the dashDB Local container and all of the components inside it. For Docker-specific issues, clients should contact the Docker support team. The IBM support team will assist in identifying the problem and advise accordingly.
      19. If I run into a Docker issue, would IBM support handle it for me?
        IBM will support the “dashDB Local container”, but not the Docker engine itself. It is the client’s responsibility to subscribe to Docker support.  Customers can use the Docker CS engine (from Docker) or the open source Docker RPMs that come with Linux distros (such as Red Hat). Docker provides commercial support only for the Docker CS engine and not for the Docker RPMs, so it is up to the customer to choose the relevant support path for Docker components.
      20. How often are dashDB Local updates/fix packs made available?
        dashDB Local is based on an agile development cloud model and the intent is to roll out container updates frequently. This makes it easier to stay current with the latest bug fixes and newest features. The updates are handled via the container update process and take less than 30 minutes, similar to the container setup.
      21. Can dashDB Local be installed on Amazon AWS or Microsoft Azure?
        dashDB Local can be installed on the infrastructure of your choice, as long as that infrastructure supports Docker container technology. dashDB Local is client-managed and can also be installed in your data center and any virtual private cloud infrastructures such as Amazon AWS EC2 or Microsoft Azure platforms.
      22. What kind of storage is required for dashDB Local?
         dashDB Local requires a POSIX-compliant clustered file storage system. This is applicable for the MPP cluster only. For a standalone SMP installation, this is not a requirement. You can use standard local disks for an SMP dashDB Local node setup.
         A cluster file system is a file system configured to group servers and resources together so that they have concurrent access to a single file system. The key to a cluster file system is that the cluster appears as a single highly available system to all the end users. This increases the storage utilization rate and can result in high performance.
        Some common examples of clustered file storage system are :
        – VERITAS Cluster File System(VxFS) Sun Solaris, HP/UX
        – Generalized Parallel File System (GPFS) IBM AIX, Linux
        – GFS2 Red Hat only
      23. Is Oracle compatibility available in dashDB local?
        Yes, Oracle compatibility is supported in dashDB Local. You can enable applications that were written for an Oracle database to use dashDB Local without having to be rewritten. To use this capability, you must specify that dashDB Local is to run in Oracle compatibility mode prior to initial deployment.
        Before you begin, the /mnt/clusterfs/ directory must already be created. To perform this task, you need to have root authority on the host system OS. By default, Oracle compatibility mode is not enabled. To enable it, you explicitly make an entry in the /mnt/clusterfs/options file prior to deploying dashDB Local. Run the command below and then follow the normal steps for container deployment and initialization:
        echo "ENABLE_ORACLE_COMPATIBILITY='YES'" >> /mnt/clusterfs/options

        For more details:
        http://www.ibm.com/support/knowledgecenter/SS6NHC/com.ibm.swg.im.dashdb.doc/admin/local_oracompat.html
        http://www.ibm.com/support/knowledgecenter/SS6NHC/com.ibm.swg.im.dashdb.doc/admin/local_setup.html#setup
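Appending with echo adds a new line every time it is run. If you script your deployments, a small helper like the Python sketch below keeps a single entry per key. It assumes the options file is a plain list of KEY='VALUE' lines, as the echo command suggests; the set_option name is hypothetical and not part of dashDB Local.

```python
from pathlib import Path


def set_option(options_file, key, value):
    """Write KEY='VALUE' to the options file, replacing any earlier entry for KEY."""
    path = Path(options_file)
    lines = path.read_text().splitlines() if path.exists() else []
    # Drop any previous setting of this key so the file holds exactly one entry for it.
    kept = [line for line in lines if not line.strip().startswith(f"{key}=")]
    kept.append(f"{key}='{value}'")
    path.write_text("\n".join(kept) + "\n")
```

For example, set_option("/mnt/clusterfs/options", "ENABLE_ORACLE_COMPATIBILITY", "YES") has the same effect as the echo command on a fresh file, and it needs the same root authority on the host.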

 

IBM dashDB Local opens its preview for data warehousing on private clouds and more!

by Mitesh Shah

Just like in the story of Goldilocks … you may be looking for modern data warehousing that is “just right.”  Your IT strategy may include cloud and you may like the simplicity and scalability benefits of cloud … yet some data and applications may need to stay on-premises for a variety of reasons.  Traditional data warehouses provide essential analytics, yet they may not be right for new types of analytics, data born on the cloud, or simply cannot contain a growing workload of new requests.

IBM dashDB Local is an open preview technology that is designed to give you “just right” cloud-like simplicity and flexibility.  It delivers a configured data warehouse in a Docker container that you can deploy wherever you need it, as long as Docker is supported on that infrastructure. Often, this is a private cloud, virtual private cloud (AWS/Azure), or other software-defined infrastructure. You gain management simplicity and have an environment that you can control more directly.

Download and install dashDB Local quickly and simply via Docker container technology.

dashDB Local may be the right choice when you have complex applications that must be readied for cloud, have SLAs or regulations that require data or applications to stay on premises, or you need to address new analytics requests very quickly with easy scale in and out capabilities.

dashDB Local complements the dashDB data warehouse as a service offering that is delivered via IBM Bluemix. Because both products are based on a common database technology, you can move workloads across these editions without costly and complex application change!   This is one example of how we define a hybrid data warehouse and how it can help improve your flexibility over time as your needs evolve.

Since dashDB Local began its closed preview in February of 2016, the team has rallied to bring in a comprehensive set of data warehousing features to this edition of dashDB. We have been listening to the encouraging feedback from our initial preview participants, and as a result, we now have a solution that is open for you to test!

So what are you waiting for?

Our early adopters have been fascinated by the power and ease of deployment of the Docker container.  It’s become commonplace for us to hear feedback that participants can deploy a full MPP data warehouse offering with in-memory processing and columnar capabilities, on the infrastructure of their choice, within 15-20 minutes. One client said that dashDB Local is as easy to deploy and manage as a mobile app! We are thrilled by this type of feedback!

Workload monitoring in dashDB Local delivers elasticity to scale out or in.

The open preview (v. 0.5.0) offers extreme scale out and scale in capabilities. Yes, you heard me right. Scale-in provides the elasticity to not tie up your valuable resources beyond the peak workloads. This maximizes return on investment for your variable reporting and analytics solutions.  The open preview will help you test drive the Netezza compatibility (IBM PureData System for Analytics) within dashDB technology, as well as analytics support using RStudio. Automated High Availability is another attractive feature that is provided out-of-the box for you to see and test.

Preview participants have been eager to test drive query performance. One participant says, “We are very impressed with the performance, and within no time we have grown our dataset of 40 million to 200 million records (a few TBs) and the analytics test queries run effortless.” Our participants are leveraging their data center infrastructure whether it’s bare metal or virtualized (VMs) to get started and some have installed it on their laptops to quickly gain an understanding of this preview.

Find out how it can be “just right” for you!  Go here to give it a try and get ready to be wowed.  We value and need your feedback to help us prioritize features that are important to your business.  All the best and don’t hesitate to drop me a line to let me know what you think!


About Mitesh,

Mitesh Shah is the product manager for the new dashDB Local data warehousing solution as a software-defined environment (SDE) that can be used on private clouds and platforms that support Docker container technology. He has broad experience around various facets of software development revolving around relational databases and data warehousing technologies.  Throughout his career, Mitesh has enjoyed a focus on helping clients address their data management and solution architecture needs.

Using Docker containers for software-defined environments or private cloud implementations

by Mitesh Shah

Data warehousing architectures have evolved considerably over recent years. As businesses try to derive insight as the basis of value creation, ALL roles must participate by leveraging new insights.  As a result, analytics needs are expanding, markets are transforming and new business models are being created.  This ushers in increased requirements for self-service analytics and alternative infrastructure solutions. Read on to learn how the “software-defined environment” (SDE) that utilizes container technology can help you meet expanded analytics needs.

Adaptability delivered through software-defined environments

From an avalanche of new data, to mobile computing and cloud-based platforms, new technologies must move into the IT infrastructure very quickly. Traditional IT systems—hampered by labor-intensive management and high costs—are struggling to keep up. IT organizations are caught between complex security requirements, extreme data volumes and the need for rapid deployment of new services. A simpler, more adaptive and more responsive IT infrastructure is required.

One of the key solutions on the horizon is the SDE, which optimizes the entire computing infrastructure – compute, storage and network resources – so that IT staff can adapt to different types of workloads very quickly. For example, without an SDE, resources are assigned manually to workloads; the same assignments happen automatically within an SDE.

By dynamically assigning workloads to IT resources based on a variety of factors, including the characteristics of specific applications, the best-available resources, and service-level policies, a software-defined environment can deliver continuous, dynamic optimization and reconfiguration to address infrastructure issues.
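As a toy illustration of automatic assignment, the Python sketch below places each workload on whichever node currently has the most free capacity. It is a deliberately simplified model and not how any particular SDE product works: the greedy rule, the single "capacity unit" measure and all names are assumptions made for the example, and real systems also weigh application characteristics and service-level policies.

```python
def assign_workloads(workloads, capacity):
    """Greedy placement: largest workload first, onto the node with the most free capacity.

    workloads: dict of workload name -> required capacity units
    capacity:  dict of node name -> available capacity units
    Returns a dict of workload name -> node name, or None if no node can fit it.
    """
    free = dict(capacity)  # work on a copy so the caller's dict is unchanged
    placement = {}
    for name, need in sorted(workloads.items(), key=lambda item: -item[1]):
        node = max(free, key=free.get)  # node with the most remaining capacity
        if free[node] >= need:
            free[node] -= need
            placement[name] = node
        else:
            placement[name] = None
    return placement


placement = assign_workloads(
    {"etl": 8, "reports": 4, "adhoc": 2},
    {"node-a": 10, "node-b": 6},
)
print(placement)
```

Here the largest job lands on the larger node and the next job on the other node, with no manual decision involved.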

Software-defined environment benefits

A software defined environment framework can help to:

  • Simplify operations with automated infrastructure tuning and configuration
  • Reduce time to value with a simple, pluggable and rich API-supported architecture
  • Sense and respond to workload demands automatically
  • Optimize resources by assigning assets without manual intervention
  • Maintain security and manage privacy through a common platform
  • Facilitate better business outcomes through advanced analytics and cognitive capabilities

A software-defined environment fits well into the private cloud ecosystem so that IT staff can deliver flexibility and ease of consumption, as well as maximize the use of commodity or virtualized hardware. An SDE is now easily achievable by leveraging container technology, where Docker is one of the leaders.

Docker containers provide application portability

Docker containers “wrap up” a piece of software in a complete file system that contains everything the software needs to run: code, run-times, system tools, system libraries and other components that can be installed on a server. This guarantees that the software will always run the same, regardless of the environment in which it is running.

Docker provides true application portability and ease of consumption by alleviating the complex process of software setup and installation that often can require multiple skills across multiple hours or days. It provides OS-level abstraction without disrupting the standards on the host operating system, which makes it even more attractive.

One key point to keep in mind is that Docker is not the same as VMware. Docker provides process isolation at the operating system level, whereas VMware provides a hardware abstraction layer. Unlike VMware, Docker does not create an entire virtual operating system. Instead, the host operating system kernel can be shared across multiple Docker containers. This makes a container very lightweight to deploy and faster to start than a virtual machine. There is no looking back, as container technology is being embraced very quickly as part of a hybrid solution that meets business user needs, fast!

dashDB Local: data warehousing delivered via Docker container

Coming full circle, the data warehouse is the foundation of all analytics and must be fast and agile to serve new analytics needs. Software-defined environments make this easy to do, enabling deployment of the warehousing engine in minutes rather than hours or days.

IBM dashDB is the data warehousing technology that delivers high-speed insights through in-memory computing and in-database analytics at massively parallel processing (MPP) scale. It has been available as a fully managed service on the IBM cloud. Now, dashDB Local (via Docker container) is available as an early access client preview for private clouds and other software-defined infrastructures. I hope you will test this new technology and provide us with valuable feedback. Learn more, then request access: ibm.biz/dashDBLocal

About Mitesh,

Mitesh Shah is the product manager for the new dashDB data warehousing solution as a software-defined environment (SDE) that can be used on private clouds and other implementations that support Docker container technology. He has broad experience around various facets of software development revolving around database and data warehousing technologies. Throughout his career, Mitesh has enjoyed a focus on helping clients address their data management and solution architecture needs.

How To Make Good Decisions in Deploying the Logical Data Warehouse

By Rich Hughes,

A recent article addresses the challenges facing businesses trying to improve their results by analyzing data. As Hadoop’s ability to process large data volumes continues to gain acceptance, Dwaine Snow provides a reasonable method to examine when and under what circumstances to deploy Hadoop alongside your PureData System for Analytics (PDA). Snow makes the case that traditional data warehouses, like PDA, are not going away because of the continued value they provide. At the same time, Hadoop distributions are playing a valuable role in meeting some of the challenges in this evolving data ecosystem.

The valuable synergy between Hadoop and PDA is illustrated conceptually as the logical data warehouse in Snow’s December 2014 paper (Link to Snow’s Paper).

The logical data warehouse diagrams the enterprise body of data stores, connective tissue like APIs, and cognitive features like analytical functions. It documents the traditional data warehouse, which began about 1990, and its use of structured databases. Pushed by the widespread use of the Internet and its unstructured data exhaust, the Apache Hadoop community was founded as a means to store, evaluate, and make sense of unstructured data. Hadoop thus imitated the traditional data warehouse in assessing the value of the data available, then retaining the most valuable data sources from that investigation. Likewise, the discovery, analytics, and trusted data zone architecture of today’s logical data warehouse resembles the layered architecture of yesterday’s data warehouse.

Since its advent some 10 years ago, Hadoop has branched out to servicing SQL statements against structured data types, which brings us back to the business challenge: where can we most effectively deploy our data assets and analytic capabilities? In answering this question, Snow discusses fit-for-purpose repositories, which require interoperability across the various zones and data stores to succeed. Each data zone is evaluated for cost, value gained, and the performance required by its service level agreements.

Viewed as a manufacturing sequence, the raw material (the data) is first acquired, then manipulated into a higher-valued product. In this case, the value is assessed by the business consumer based on insights gained and speed of delivery. Hadoop distributed file environments show their worth in storing relatively larger data volumes and accessing both structured and unstructured data. Traditional data warehouses like IBM’s PureData System for Analytics display their value in being the system of record where advanced analytics are delivered in a timely fashion.

In an elegant cost-benefit analysis, Snow provides the tools necessary to weigh where best to deploy the different but complementary data insight technologies. A listing of Total Cost of Ownership (TCO) for Hadoop includes four line items:

  1. Initial system cost (hardware and software)
  2. Annual system maintenance cost
  3. Setup costs to get the system ‘up and running’
  4. Costs for humans managing the ongoing system administration

Looking at just the first cost item, which is sometimes reduced to a per-terabyte price like $1,000 per TB, tells only part of the story. The article documents the other unavoidable tasks for deploying and maintaining a Hadoop cluster. Yes, $200,000 might be the price of the hardware and software for a 200TB system, but the article cites industry studies to estimate the other significant budget expenses over a five-year ownership period. Adding up the total costs, the conclusion is that the final amount could very well be in excess of $2,000,000.
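The arithmetic behind these figures can be checked with a short Python sketch. It uses only the numbers quoted above ($1,000 per TB, 200TB, and a five-year total of at least $2,000,000); the function and constant names are illustrative, and the split of the remaining costs across maintenance, setup and administration is not given here, so it is left as a single lump.

```python
PRICE_PER_TB = 1_000              # quoted per-terabyte price, in dollars
CAPACITY_TB = 200                 # quoted system size
FIVE_YEAR_TOTAL_MIN = 2_000_000   # quoted lower bound on five-year TCO


def initial_system_cost(price_per_tb, capacity_tb):
    """TCO line item 1: hardware and software purchase price."""
    return price_per_tb * capacity_tb


def effective_cost_per_tb(total_cost, capacity_tb):
    """Total cost spread over capacity, for comparison with the sticker price."""
    return total_cost / capacity_tb


initial = initial_system_cost(PRICE_PER_TB, CAPACITY_TB)
# Line items 2 to 4 combined (maintenance, setup, administration).
other_costs_min = FIVE_YEAR_TOTAL_MIN - initial
multiple_of_sticker = FIVE_YEAR_TOTAL_MIN / initial
per_tb_min = effective_cost_per_tb(FIVE_YEAR_TOTAL_MIN, CAPACITY_TB)

print(f"Initial system cost:       ${initial:,}")
print(f"Other costs (at least):    ${other_costs_min:,}")
print(f"Total vs. sticker price:   {multiple_of_sticker:.0f}x")
print(f"Effective cost per TB:     ${per_tb_min:,.0f}")
```

In other words, if the five-year total does exceed $2,000,000, the purchase price is about a tenth of the real cost, and the effective price per terabyte is closer to $10,000 than $1,000.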

The accurate TCO number is then subtracted from the business benefits of using the system, which determines the net value gained. And business benefits are accrued, Snow notes, from query activity. Only 1% of the queries in today’s data analytic systems require all of the data, which makes that activity perfect for the lower-cost, lower-performance Hadoop model. Conversely, 90% of current queries require only 20% of the data, which matches well with the characteristics of the PureData System for Analytics: reliability with faster analytic performance. What Snow has shown is the best-of-breed nature of the Logical Data Warehouse and, as the old slogan suggests, how to get more “bang for the buck”.
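As a rough illustration of this reasoning, the Python sketch below expresses the net value calculation and a simple fit-for-purpose routing rule. The function names are hypothetical, and using the 20% figure as a hard cutoff is an assumption made for the example, not a rule stated by Snow.

```python
# Assumption: queries touching at most this share of the data go to the
# warehouse. The 20% figure comes from the query statistics above; treating
# it as a strict threshold is a simplification for illustration.
WAREHOUSE_DATA_FRACTION = 0.20


def net_value(business_benefits, total_cost_of_ownership):
    """Net value gained = business benefits minus total cost of ownership."""
    return business_benefits - total_cost_of_ownership


def route_query(fraction_of_data_needed):
    """Pick a data store based on how much of the data a query needs.

    Returns "warehouse" for queries that need a small share of the data
    (where faster analytic performance matters most) and "hadoop" for
    queries that scan a large share or all of it.
    """
    if not 0.0 <= fraction_of_data_needed <= 1.0:
        raise ValueError("fraction_of_data_needed must be between 0 and 1")
    if fraction_of_data_needed <= WAREHOUSE_DATA_FRACTION:
        return "warehouse"
    return "hadoop"
```

A production system would route on many more factors, such as data type, service level agreements and where the data already lives, but the sketch captures the basic trade: keep the frequently queried minority of the data on the faster platform and send full scans to the cheaper one.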

About Rich Hughes,

Rich Hughes is an IBM Marketing Program Manager for Data Warehousing. Hughes has worked in a variety of Information Technology, Data Warehousing, and Big Data jobs, and has been with IBM since 2004. Hughes earned a Bachelor’s degree from Kansas University, and a Master’s degree in Computer Science from Kansas State University. Writing about the original Dream Team, Hughes authored a book on the 1936 US Olympic basketball team, a squad composed of oil refinery laborers and film industry stage hands. You can follow him at @rhughes134