IBM dashDB Local FAQ

When it comes to next-generation data warehousing and data management, the IBM dashDB family offers a range of options that all share a common technology.  dashDB Local is one of the new offerings and this blog provides a series of answers to your questions!  Please feel free to jump in at the bottom in the comments to add your own questions and we will respond.

      1.  What is the dashDB family?
        IBM dashDB is a family of next-generation database and data warehouse technologies that help you respond very quickly to application needs.  Originally in data warehousing, IT professionals assembled hardware, software and storage to handle their large data sets for analytics needs.  This was risky, costly and time consuming.  This gave way to the data warehouse appliance, which provided an optimized system for data warehousing and analytics.  The appliance was so successful that many consider it to be the backbone of their analytics architecture. But the world of analytics is expanding and new technologies are needed to handle more requests, more data sources and even self-service needs.  Hybrid data architectures are coming to the forefront to handle these increased needs. The dashDB family plays a key role here:

        • dashDB for analytics – as a fully managed cloud data warehouse
        • dashDB Local – as a configured data warehouse delivered via container technology to enable flexible deployment
        • dashDB for transactions – as a fully managed database as a service for transactional workloads.

        This family is designed to help you respond to new needs very quickly.  It also shares a common engine to help you leverage the same skills across different deployment models and application types. For more information on dashDB, visit dashDB.com.

      2. What is dashDB Local?
        dashDB Local is in-memory, columnar data warehousing software that supports a wide range of analytic workloads, from data marts to enterprise data warehouses. It is deployed using Docker container technology and supports software-defined environments such as a private cloud, a virtual private cloud or the infrastructure of your choice, thus enabling hybrid cloud configurations. dashDB Local can be deployed in minutes, making it fast and easy to deliver an auto-configured data warehouse with built-in Netezza and Oracle compatibility.
      3. What is Docker?
        Docker is container technology that simplifies packaging and distribution of the software in a complete filesystem that contains everything needed to run: code, runtime, system tools, system libraries – anything that can be installed on a server. This guarantees that the software will always run the same, regardless of its environment.
      4. What is the difference between Docker and VMware?
        VMware is virtualization technology, while Docker is container technology used for simplified packaging and distribution of software.  A Docker container provides operating system-level process isolation, whereas VMware virtualization lets you run multiple virtual machines (VMs) on a single physical server (thus providing hardware-level abstraction). Unlike VMware, Docker does not create an entire virtual operating system, which makes containers lightweight to deploy and faster to start up than VMs. Both technologies can be used together; for example, Docker containers can be created inside VMs to make a solution ultra-portable.
      5. Is dashDB Local generally available?
        Yes it is. You can find a free trial of it at ibm.biz/dashDBLocal.
      6. On which platforms is dashDB Local supported?
        Today, dashDB Local runs on any platform on which the Docker engine is supported, such as Linux, Microsoft Windows, Apple Macintosh, and cloud providers. More details can be found here as well as on the Docker site.
      7. On what platforms is dashDB Local supported?
        dashDB Local software is packaged and deployed using Docker container technology. Thus, dashDB Local can be installed on any platform where the Docker engine client is supported. This includes Windows, Macintosh and a variety of Linux platforms. Deploying IBM dashDB Local on Windows or Macintosh requires a Linux VM, in which you run Docker. By downloading the Docker Toolbox, which includes a GUI and a VM, you can accomplish this easily. Please refer to the dashDB Local knowledge center (documentation) for further details.
      8.  Is dashDB Local available on IBM Bluemix Local?
        Bluemix Local is managed by IBM on a client’s infrastructure. There are no plans to offer dashDB Local on Bluemix Local at this time.
      9.  How long does it take to install dashDB Local?
        dashDB Local is based on Docker container technology, which allows setup and installation in less than 30 minutes. Today, the SMP version or MPP version of dashDB Local can be installed in less than 15 minutes.  This can be done on a range of servers from a simple laptop to a production-grade server. Since various components, such as LDAP security and DSM monitoring, are already bundled into a single container installation, there is a tremendous time savings and a more streamlined and efficient process.
      10. What components are packaged and installed with dashDB Local?
        dashDB Local comprises the following software components, all packaged in the Docker container to simplify deployment and speed up the overall setup. At the core is the dashDB analytics engine, tuned for columnar and in-memory workloads, with built-in Netezza and Oracle compatibility. An LDAP server is included for user access management.  IBM Data Server Manager, which acts as the key monitoring component, is also included to provide features such as query history monitoring, database performance monitoring and OS-level monitoring.
      11. Is dashDB Local available outside of the public Docker Hub?
        Soon, a standalone version of dashDB Local will be available on IBM Fix Central and will leverage the Docker download command on the host OS. This will remove the dependency on pulling the dashDB Local image from the private, access-controlled repository on the public Docker Hub. This will also help in cases where a firewall port to access the public Docker Hub cannot be opened.
      12. What are the minimum prerequisites to install dashDB Local?
        You can find the documented prerequisites for dashDB Local in the Knowledge Center.  Some of the key requirements focus on the Docker client and POSIX-compliant storage file systems, as documented. An additional key requirement is network access through the firewall. Ensure that the following ports are open on all nodes in the cluster, and that all nodes are defined in each node’s /etc/hosts file.
        60000-60024, for database FCM
        25000-25999, for Apache Spark
        50022, for SSH/container OS
        50001, for database connection with SSL
        50000, for database connection without SSL
        9929, for communication tests
        9300, for web console status
        8443, for web console HTTPS
        5000, for System Manager
        389, for LDAP
        22, for SSH/host OS
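
As a rough aid for the port list above, the following shell sketch probes one port per entry from another machine. It assumes a netcat (nc) build that supports the -z and -w options, and the function names are illustrative only, not part of dashDB Local. A port only answers once something is listening on it, so a "closed-or-filtered" result before deployment does not by itself prove that the firewall is blocking it.

```shell
# Ports taken from the FAQ answer above; ranges are written as first-last.
REQUIRED_PORTS="60000-60024 25000-25999 50022 50001 50000 9929 9300 8443 5000 389 22"

# first_port_of ENTRY: print the first port of a "first-last" range,
# or the port itself when ENTRY is a single port.
first_port_of() {
    printf '%s\n' "${1%%-*}"
}

# probe HOST PORT: succeed if a TCP connection can be opened within 2 seconds.
# Assumption: nc is installed and accepts -z (scan only) and -w (timeout).
probe() {
    nc -z -w 2 "$1" "$2" >/dev/null 2>&1
}

# check_host HOST: probe the first port of every entry and print one status line each.
check_host() {
    for entry in $REQUIRED_PORTS; do
        port=$(first_port_of "$entry")
        if probe "$1" "$port"; then
            echo "$entry open"
        else
            echo "$entry closed-or-filtered"
        fi
    done
}

# Example (hypothetical host name):
# check_host node1.example.com
```

Probing only the first port of each range keeps the check quick; it is a spot check, not proof that the whole range is open.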
      13. How does Apache Spark fit into the dashDB Local architecture?
        dashDB Local lets you dramatically modernize your data warehouse solutions with advanced analytics based on Spark. Spark is installed and configured within the dashDB Local container, which makes it fully integrated and able to support a variety of use cases. Spark applications that process relational data can gain significant performance and operational QoS benefits from deploying and running inside dashDB Local. The integration enables end-to-end analytic solution creation: interactive exploration and machine learning experiments, verification of analytic flows, easy operationalization of Spark applications, hosting of Spark applications in a multi-tenant enterprise warehouse system, and integration of Spark applications with other applications via various invocation APIs. You can invoke Spark logic via SQL connections and land streaming data directly into tables via deployed Spark applications. With integrated Spark, you can also run complex data transformations and feature extractions that cannot be expressed in SQL.
      14. I have a running dashDB Local instance where I’ve set DISABLE_SPARK='YES' in the options file. How can I tell whether this option took effect and how much memory the database is actually using?
        This can be confirmed during the startup of the dashDB Local container. When Spark is disabled, you will see “Spark support is going to be disabled.” in the docker start output. When Spark is enabled, you will see “Current spark share : XX% of total memory.”
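
If you capture the startup output, a small shell helper can classify it using the two messages quoted above. This is only a sketch: the function name is made up for illustration, and how you capture the output (for example, redirecting it to a file) depends on your setup.

```shell
# spark_status: read container startup output on stdin and report the Spark state.
# The matched strings are the two messages quoted in the answer above.
spark_status() {
    output=$(cat)
    case "$output" in
        *"Spark support is going to be disabled"*) echo "disabled" ;;
        *"Current spark share"*) echo "enabled" ;;
        *) echo "unknown" ;;
    esac
}

# Example, assuming the startup output was saved to startup.log:
# spark_status < startup.log
```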
      15. Is the Spark setting available only as an install-time switch, or can I enable and disable Spark every time I start the database by changing it?
        The Spark feature can be enabled or disabled at any time. Change the DISABLE_SPARK setting and then restart dashDB Local in the container (docker exec -it dashdb stop, followed by docker exec -it dashdb start).
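
The change-and-restart cycle could be scripted roughly as follows. The helper name set_option is hypothetical, and the sketch assumes that the options file is /mnt/clusterfs/options (the path used in the Oracle compatibility answer) and that entries have the form KEY='VALUE'.

```shell
# set_option FILE KEY VALUE: write KEY='VALUE' to FILE,
# replacing any earlier line for the same KEY so that the file holds one entry per key.
set_option() {
    file=$1
    key=$2
    value=$3
    touch "$file"
    tmp=$(mktemp)
    # Keep every line except an existing entry for KEY (grep exits 1 if nothing is left).
    grep -v "^${key}=" "$file" > "$tmp" || true
    printf "%s='%s'\n" "$key" "$value" >> "$tmp"
    # Copy the content back instead of moving, so the file keeps its owner and permissions.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Example: re-enable Spark, then restart with the commands from the answer above.
# set_option /mnt/clusterfs/options DISABLE_SPARK NO
# docker exec -it dashdb stop
# docker exec -it dashdb start
```

Replacing the existing line rather than appending avoids ending up with two conflicting DISABLE_SPARK entries after several toggles.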
      16. Is there currently a maximum number of nodes for dashDB Local?
        Currently, you can create an MPP cluster of up to 24 nodes in dashDB Local.
      17. Can I upgrade from the GA trial to the production version of dashDB Local?
        Yes, you can upgrade/update the post-GA trial version to a production grade version. Once the product is purchased, a permanent license can be applied to update the license.
      18. How can I get support for dashDB Local?
        Upon purchase of dashDB Local, clients are entitled to support via email or phone. You can open a PMR in IBM’s support ticketing system. IBM will support the dashDB Local container and all of the components inside it. For Docker-specific issues, clients should contact the Docker support team. The IBM support team will assist in identifying the problem and advise accordingly.
      19. If I run into a Docker issue, will IBM support handle it for me?
        IBM will support the “dashDB Local container”, but not the Docker engine itself. It is the client’s responsibility to subscribe to Docker support.  Customers can use the Docker CS engine (from Docker) or the open source Docker RPMs that come with Linux distros (such as Red Hat). Docker provides commercial support only for the Docker CS engine, not for the Docker RPMs, so the choice of support path for Docker components is up to the customer.
      20. How often are dashDB Local updates/fix packs made available?
        dashDB Local is based on an agile cloud development model, and the intent is to roll out container updates frequently. This will make it easier to stay current with the latest bug fixes and newest features. The updates are handled via the container update process and will take less than 30 minutes, similar to the container setup.
      21. Can dashDB Local be installed on Amazon AWS or Microsoft Azure?
        dashDB Local can be installed on the infrastructure of your choice, as long as that infrastructure supports Docker container technology. dashDB Local is client-managed and can also be installed in your data center and any virtual private cloud infrastructures such as Amazon AWS EC2 or Microsoft Azure platforms.
      22. What kind of storage is required for dashDB Local?
        dashDB Local requires a POSIX-compliant clustered file storage system. This applies to the MPP cluster only. For a standalone SMP installation, this is not a requirement; you can use standard local disks for an SMP dashDB Local node setup. A cluster file system is a file system that is configured to group servers and resources together so that they have concurrent access to a single file system. The key to a cluster file system is that the cluster appears as a single highly available system to all the end users. This increases the storage utilization rate and can result in high performance.
        Some common examples of clustered file storage systems are:
        – VERITAS Cluster File System (VxFS): Sun Solaris, HP/UX
        – General Parallel File System (GPFS): IBM AIX, Linux
        – GFS2: Red Hat only
      23. Is Oracle compatibility available in dashDB Local?
        Yes, Oracle compatibility is supported in dashDB Local. You can enable applications that were written for an Oracle database to use dashDB Local without having to be rewritten. To use this capability, you must specify that dashDB Local is to run in Oracle compatibility mode prior to initial deployment. Before you begin, the /mnt/clusterfs/ directory must already be created. To perform this task, you need to have root authority on the host system OS. By default, Oracle compatibility mode is not enabled. To enable it, you explicitly make an entry in the /mnt/clusterfs/options file prior to deploying dashDB Local. Run the command below and then follow the normal steps for container deployment and initialization:

            echo "ENABLE_ORACLE_COMPATIBILITY='YES'" >> /mnt/clusterfs/options

        For more details:
        http://www.ibm.com/support/knowledgecenter/SS6NHC/com.ibm.swg.im.dashdb.doc/admin/local_oracompat.html
        http://www.ibm.com/support/knowledgecenter/SS6NHC/com.ibm.swg.im.dashdb.doc/admin/local_setup.html#setup

 

IBM dashDB Local opens its preview for data warehousing on private clouds and more!

by Mitesh Shah

Just like in the story of Goldilocks … you may be looking for modern data warehousing that is “just right.”  Your IT strategy may include cloud and you may like the simplicity and scalability benefits of cloud … yet some data and applications may need to stay on-premises for a variety of reasons.  Traditional data warehouses provide essential analytics, yet they may not be right for new types of analytics, data born on the cloud, or simply cannot contain a growing workload of new requests.

IBM dashDB Local is an open preview technology that is designed to give you “just right” cloud-like simplicity and flexibility.  It delivers a configured data warehouse in a Docker container that you can deploy wherever you need it, as long as Docker is supported on that infrastructure. Often, this is a private cloud, virtual private cloud (AWS/Azure), or other software-defined infrastructure. You gain management simplicity and have an environment that you can control more directly.

Download and install dashDB Local quickly and simply via Docker container technology.

dashDB Local may be the right choice when you have complex applications that must be readied for cloud, have SLAs or regulations that require data or applications to stay on premises, or you need to address new analytics requests very quickly with easy scale in and out capabilities.

dashDB Local complements the dashDB data warehouse as a service offering that is delivered via IBM Bluemix. Because both products are based on a common database technology, you can move workloads across these editions without costly and complex application change!   This is one example of how we define a hybrid data warehouse and how it can help improve your flexibility over time as your needs evolve.

Since dashDB Local began its closed preview in February of 2016, the team has rallied to bring in a comprehensive set of data warehousing features to this edition of dashDB. We have been listening to the encouraging feedback from our initial preview participants, and as a result, we now have a solution that is open for you to test!

So what are you waiting for?

Our early adopters have been fascinated by the power and ease of deployment of the Docker container.  It’s become commonplace for us to hear feedback that participants can deploy a full MPP data warehouse offering with in-memory processing and columnar capabilities, on the infrastructure of their choice, within 15-20 minutes. One client said that dashDB Local is as easy to deploy and manage as a mobile app! We are thrilled by this type of feedback!

Workload monitoring in dashDB Local delivers elasticity to scale out or in.

The open preview (v. 0.5.0) offers extreme scale out and scale in capabilities. Yes, you heard me right. Scale-in provides the elasticity to not tie up your valuable resources beyond the peak workloads. This maximizes return on investment for your variable reporting and analytics solutions.  The open preview will help you test drive the Netezza compatibility (IBM PureData System for Analytics) within dashDB technology, as well as analytics support using RStudio. Automated High Availability is another attractive feature that is provided out-of-the box for you to see and test.

Preview participants have been eager to test drive query performance. One participant says, “We are very impressed with the performance, and within no time we have grown our dataset of 40 million to 200 million records (a few TBs) and the analytics test queries run effortless.” Our participants are leveraging their data center infrastructure whether it’s bare metal or virtualized (VMs) to get started and some have installed it on their laptops to quickly gain an understanding of this preview.

Find out how it can be “just right” for you!  Go here to give it a try and get ready to be wowed.  We value and need your feedback to help us prioritize features that are important to your business.  All the best and don’t hesitate to drop me a line to let me know what you think!


About Mitesh,

Mitesh Shah is the product manager for the new dashDB Local data warehousing solution as a software-defined environment (SDE) that can be used on private clouds and platforms that support Docker container technology. He has broad experience around various facets of software development revolving around relational databases and data warehousing technologies.  Throughout his career, Mitesh has enjoyed a focus on helping clients address their data management and solution architecture needs.

What you need to know: Software-Defined Environments (SDE) for data warehousing and more

by James Cho and Maria Attarian

There’s a new kid on the block and it’s called SDE!  This is a new term that stands for Software Defined Environment (SDE), and it is here to change the way we think about the world of application, integration and middleware – as well as data warehouses. But first things first.  Let’s talk about the SDE and how it can help you as you deliver more end-user services more easily.

What is an SDE and why use one when you have traditional environment approaches?

Put simply, a Software Defined Environment (SDE) optimizes the entire computing infrastructure — compute, storage and network resources. An SDE can automatically tailor itself to meet the needs of the workload that must be executed.

In traditional environment approaches, compute, storage, and network resources are allocated and assigned to workloads manually, and this is the problem from which the need for SDE technology emerged. To remove manual steps, an SDE takes into account application characteristics, best-available resources and service level policies when dynamically allocating resources to workloads. An SDE also strives to deliver continuous, “on the fly” optimization and reconfiguration to address infrastructure issues.

So what are the fundamental ingredients to doing this? Policy-based compliance checks and updates are essential and make an SDE easy to manage. Delivering in the public or private cloud requires high-speed analytic processing capabilities, as well as rapid integration, automation and optimization. When factors such as these are in place, it becomes clear that SDE technology helps accelerate business success and brings value to the customer because the solution is responsive and adaptive.

So what does IBM have to offer in the SDE space?

dashDB Local (currently in preview) is the IBM data warehouse offering for SDEs such as private clouds, virtual private clouds and other infrastructures that support the Docker container technology. It is designed to provision a full data warehouse stack in minutes and helps you manage the service in your own public or private cloud, while maintaining existing operational and security processes.

There are three design principles that dashDB Local tackles head-on based on feedback from our customers.

  1. So simple that anyone can deploy it

Because our software stack is packaged into a Docker container, provisioning dashDB Local can be as simple as one docker run command on Linux servers that have the Docker engine installed. On Windows or Mac machines, it can be as easy as a Docker Hub search for “dashDB” followed by a single click on “CREATE” in Docker Kitematic. Software stack updates are as simple as updating a mobile app: you run the same docker run command against a new version of the container on your existing installation.

  2. Flexible enough to deploy anywhere

dashDB Local can be deployed on any supported Docker installation on Linux, cloud, OS X and Windows platforms with minimal prerequisites. Entry-level hardware requirements start at 8 GB of RAM and 20 GB of storage, which is suitable for a development/test environment or QA work on your laptop. On larger servers, such as those with 48 cores and 3 TB of RAM, the dashDB container will auto-configure to the host it is installed on. Persistent durable storage of your choice must be mounted in /mnt/clusterfs to hold your data. To summarize, it is flexible enough to empower you to use the hardware that you already have in your data center or in the public cloud of your choice.

  3. Independent of your infrastructure capabilities

Existing monitoring and security overlays can remain on your host OS while the dashDB stack is isolated inside its container. You can fully utilize existing infrastructure capabilities like the copy and replication services of your storage. Existing monitoring tools such as systems management, network monitoring, even popular cloud management tools such as OpenStack, Kubernetes, or public cloud monitoring tools like AWS CloudWatch can continue to be used. The isolation of a dashDB Local container allows you to embrace your own data center standards. Thus, it is independent and empowers you to do what you already know how to do.
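
The entry-level figures mentioned under the second principle (8 GB of RAM, 20 GB of storage, data under /mnt/clusterfs) can be checked on a Linux host with a short shell sketch like the one below. The function names are illustrative and not part of dashDB Local, and the check covers only these two numbers, not the full documented prerequisites.

```shell
# Entry-level minimums quoted in the post.
MIN_RAM_GB=8
MIN_DISK_GB=20

# meets_minimum RAM_GB DISK_GB: succeed when both values reach the minimums.
meets_minimum() {
    [ "$1" -ge "$MIN_RAM_GB" ] && [ "$2" -ge "$MIN_DISK_GB" ]
}

# host_ram_gb: total RAM in GB, rounded to the nearest whole number,
# because an "8 GB" machine usually reports slightly less (Linux only).
host_ram_gb() {
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1048576 + 0.5 }' /proc/meminfo
}

# free_disk_gb PATH: free space in whole GB on the file system that holds PATH.
free_disk_gb() {
    df -Pk "$1" | awk 'NR == 2 { printf "%d\n", $4 / 1048576 }'
}

# Example:
# meets_minimum "$(host_ram_gb)" "$(free_disk_gb /mnt/clusterfs)" && echo "entry-level minimums met"
```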

For more information on dashDB Local, please visit the public Docker repository. An early access preview of dashDB Local is now available. Test it for yourself and help shape the solution.  Request dashDB Local preview access here.

About James and Maria,

James Cho is a Senior Technical Staff Member and Chief Architect for IBM dashDB Local. He has been a technical leader of integrated warehouse solutions and appliances at IBM for over 15 years. He currently focuses on data warehouse solutions delivered in public and private cloud data centers. His previous experience includes Data Warehouse DBA, publication of industry-standard TPC performance benchmarks, and BI architecture and deployment responsibilities. James is a 1996 graduate of the University of Texas.  He holds a bachelor of science in computer science.

Follow James on LinkedIn

Maria Attarian has worked for IBM for the past four years on data warehousing technologies such as PureData System for Analytics and dashDB. In her focus as a software engineer, Maria is charged with finding new and innovative ways to deliver better products for clients and takes on a variety of challenges in this role. Maria is also active with the IEEE Young Professionals Toronto Chapter.  She served as the chairperson of this group for more than a year and remains an active member of the group.  Maria holds a master’s degree from the University of Waterloo and a bachelor of engineering degree from the National Technical University of Athens in Greece.

Hybrid data warehouse architecture: many choices for full flexibility

by Matthias Funke

I admit, I love cars. And as a car enthusiast, I cannot imagine not having my own car. I use it every day, I rely on it. I feel at home when I enter it. But I accept that other people may be different. Some are not as attached, some can’t drive or don’t need a car often enough to justify the purchase. For them, using a cab or car service might be the better choice. And then there are people who need flexibility; they need a pick-up truck one day, a van the next day, and a sports car on the weekend. In short, they want full flexibility.

What does all of this have to do with IT and Data Warehousing? Well, at IBM, we think most of our clients have similar, diverse needs when it comes to their data warehouse environment. Depending on the use case at hand, one of several different data warehouse form factors may be better than the others for a particular analytics workload at that time.

Should it be hosted, vendor-managed, or do I want complete in-house control? Do I need full flexibility regarding the service levels I set for my clients, or is it sufficient to work within the distinct configurations that a service or vendor provides? Behind the scenes, all of this directly impacts the combination of compute versus storage resources I want to have to deliver the right level of  flexibility in the most cost-effective way.

No longer is data warehousing a one-size-fits-all approach.  You need to weigh factors like the service level and the importance of meeting it, the amount of flexibility you need, the cost of the solution and the amount of control that each of your analytics workloads requires.  In line with this, we see demand for three distinct data warehouse form factors:

  • Managed cloud service – A vendor-managed public cloud service is the simplest to use, as it requires no system administrator on the client side. It is easiest to engage with because you can instantiate a service very quickly, paying for what you use at the moment (“pay-as-you-go”).
  • Predictable, high performance appliance – A client-managed data warehouse appliance offers the best predictability and performance due to its balanced, optimized software and system stack (including hardware), and the best price-performance when use cases require long-term, high utilization of a warehouse. Depending on client skills and effort, the appliance might offer the best simplicity and management, as well as lower TCO.
  • Software-defined or private cloud software – A client-managed data warehouse service that runs on either your own infrastructure or a hosted IaaS (think SoftLayer) is a third option. Use it when you want to increase utilization of existing infrastructure investments and when you need full long-term flexibility to adjust the service depending on the analytics use case and the LoB demand for the analytics. As I stated above, adjusting service levels means you need the control and flexibility to adjust the combinations of compute and storage resources to meet current needs. In this scenario, you have control and management of the infrastructure, and you can enjoy appliance-like simplicity of the data warehouse while still being able to manage it yourself.

Now what if you want to use each of the above form factors in differing combinations to meet a variety of needs?  What if you could choose the best form factor for each workload at that moment in time? Integration across instances of each form factor could enable you to load or replicate data, or to abstract users and applications from the physical layout of your data stores. This becomes a critical success factor in building logical, hybrid data warehouse solutions that offer the best flexibility and agility for the business at the lowest cost, plus the ability to marry fit-for-purpose data stores, structured and unstructured, into the overall architecture.

If you follow IBM, you know that we just launched a Preview Program for our IBM dashDB Local as a software-defined environment (SDE) data warehouse deployment option.  It addresses the needs of the software-defined / private cloud form factor above and it complements the IBM PureData System for Analytics appliance and the dashDB managed cloud data warehouse service we already offer. Take this new preview for a test drive and tell us what you think so together we can shape the hybrid data warehouse architecture of the future.

About Matthias,

Matthias is the worldwide leader of the IBM Data Warehouse product line and strategy. He is passionate about data as the “new currency” and looks for new ways to deliver insights from this data. Matthias brings many years of technology experience to his role including product management, software development and leading software development teams.

Follow Matthias on LinkedIn