Start Small and Move Fast: The Hybrid Data Warehouse

by Mona Patel

In the world of cutting-edge big data analytics, the same obstacles to gaining meaningful insight still exist: the ease of getting data in and getting data out. Addressing these long-standing issues demands the utmost flexibility, especially when layered with the agile needs of the business.

Why spend millions of dollars replacing your data and analytics environment with the latest technology that promises to address these issues, when you can leverage existing investments, resources, and skills to achieve the same, and sometimes better, insight?

Consider a hybrid data warehouse. This approach allows you to start small and move fast. It provides the best of both worlds: flexibility and agility without breaking the bank. You can rapidly serve up quality data managed by your data warehouse, blend it with newer data sources and data types in the cloud, and apply integrated analytics such as Spark or R, all without additional IT resources and expertise. How is this possible? IBM dashDB.

Read Aberdeen’s latest report on The Hybrid Data Warehouse.

Watch Aberdeen Group’s Webcast on The Hybrid Data Warehouse.

Let me give you an example. We live in a digital world, and organizations are keen to improve customer data capture across mobile, web, IoT, social media, and more in search of new insights. A telecommunications client facing heavy competition wanted to quickly deliver unique mobile services for an upcoming event, acquiring new customers by collecting and analyzing mobile and social media data. Taking a hybrid data warehouse approach, the client was able to start small and move fast, uncovering new mobile service options.

Customer information generated from these newer data sources was blended with existing customer data managed in the data warehouse to deliver new insights. IBM dashDB provided a high-performing public cloud data warehouse service that was up and running in minutes. Automatic transformation of unstructured geospatial data into structured data, in-memory columnar processing, in-database geospatial analytics, integration with Tableau, and pricing were some of the key reasons IBM dashDB was chosen.

This brings me back to my first point: you don’t have to spend millions of dollars to capitalize on getting data in and getting data out. For example, clients like the one described above took advantage of Cloudant JSON document store integration, enabling them to rapidly get data into IBM dashDB with ease, with no ETL processing required. Automatic schema discovery loads and replicates unstructured JSON documents that capture IoT, web, and mobile-based data into a structured format. Getting data out was just as simple, as IBM dashDB provides in-database analytics and works with familiar, integrated SQL-based tools such as Cognos, Watson Analytics, Tableau, and MicroStrategy. I can only conclude that IBM dashDB is a great example of how a highly compatible cloud database can extend or modernize your on-premises data warehouse into a hybrid one to meet time-sensitive business initiatives.
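
To make the “getting data out” half concrete, here is a minimal sketch of querying such blended data from Python. The table and column names are hypothetical (MOBILE_EVENTS standing in for a schema-discovered table, CUSTOMER_DIM for existing warehouse data), and it uses the ibm_db driver, one common way to reach dashDB from SQL-based tooling:

    # A minimal sketch, assuming hypothetical connection details and
    # table names; not the client's actual schema.
    import ibm_db

    conn = ibm_db.connect(
        "DATABASE=BLUDB;HOSTNAME=<your-dashdb-host>;PORT=50000;"
        "PROTOCOL=TCPIP;UID=<user>;PWD=<password>", "", "")

    # Blend a schema-discovered table of mobile events (loaded from
    # Cloudant JSON) with a customer dimension already in the warehouse.
    stmt = ibm_db.exec_immediate(conn, """
        SELECT c.customer_id, c.segment, COUNT(*) AS mobile_events
        FROM   MOBILE_EVENTS e
        JOIN   CUSTOMER_DIM  c ON e.customer_id = c.customer_id
        GROUP  BY c.customer_id, c.segment
    """)
    row = ibm_db.fetch_assoc(stmt)
    while row:
        print(row)
        row = ibm_db.fetch_assoc(stmt)
    ibm_db.close(conn)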

What exactly is a hybrid data warehouse?  A hybrid data warehouse introduces technologies that extend the traditional data warehouse to provide key functionality required to meet new combinations of data, analytics and location, while addressing the following IT challenges:

  • Deliver new analytic services and data sets to meet time-sensitive business initiatives
  • Manage escalating costs due to massive growth in new data sources, analytic capabilities, and users
  • Achieve data warehouse elasticity and agility for ALL business data


Still not convinced of the power of a hybrid data warehouse? Hear what Aberdeen Group expert Michael Lock has to say in this 30-minute webcast.

About Mona,


Mona Patel is currently the Portfolio Marketing Manager for IBM dashDB, the future of data warehousing. With over 20 years of analyzing data at the Department of Water and Power, AirTouch Communications, Oracle, and MicroStrategy, Mona decided to grow her career at IBM, a leader in data warehousing and analytics. Mona received her Bachelor of Science degree in Electrical Engineering from UCLA.

IBM BigInsights version 4.2 is here!

Brings Hadoop, Spark and SQL into one flexible, open analytics platform

by Andrea Braida

Today, we are pleased to announce that IBM BigInsights® 4.2 is generally available. BigInsights 4.2 is built on IBM Open Platform (IOP), IBM’s big data platform with Apache Spark and Apache Hadoop. IOP offers the ideal combination of Apache components to support big data applications. The BigInsights 4.2 release puts the full range of analytics for Hadoop, Spark and SQL into the hands of advanced analytics and data science teams on a single platform.

IBM has deep Hadoop expertise and, in the last year, has moved into a very strong Apache Spark leadership position as well. IBM is integrating and embedding Spark across its analytics portfolio, which means that customers get Spark in any way they want it. No one else in the market is doing this today. (BigInsights 4.2 also includes comprehensive machine learning support: Spark, SystemML, and integration with H2O.)

If you are evaluating a recommended Hadoop distribution, the most significant features of this release, including Spark integration, are summarized below.

What’s new in BigInsights 4.2?

BigInsights 4.2 introduces a range of new capabilities that make it more open, flexible and powerful:

Integration with Apache Spark 1.6.1

Access the processing and analytics power of Spark, which includes:

  • Dramatically faster batch and ETL processing times with Spark Core
  • Near real-time analytics with Spark Streaming
  • Built-in, highly extensible machine learning libraries with Spark MLlib
  • Querying of unstructured data and more value from free-form text analytics with Spark SQL
  • Graph computation and graph analytics with Spark GraphX
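
As a flavor of how these pieces fit together, here is a minimal PySpark sketch against the Spark 1.6 APIs; the file path and column names are hypothetical:

    # A minimal sketch: Spark SQL over JSON data plus an MLlib model.
    from pyspark import SparkContext
    from pyspark.sql import SQLContext
    from pyspark.mllib.clustering import KMeans

    sc = SparkContext(appName="biginsights-demo")
    sqlContext = SQLContext(sc)

    # Spark SQL: query semi-structured events without a predefined schema
    events = sqlContext.read.json("hdfs:///data/events.json")  # hypothetical path
    events.registerTempTable("events")
    counts = sqlContext.sql(
        "SELECT userId, COUNT(*) AS n FROM events GROUP BY userId")

    # Spark MLlib: cluster users by activity volume
    model = KMeans.train(counts.rdd.map(lambda r: [float(r.n)]), k=3)
    print(model.clusterCenters)
    sc.stop()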

IBM Big SQL enhancements for RDBMS offload and consolidation
Big SQL now understands SQL dialects from other vendors and products, such as Oracle, IBM DB2® and IBM Netezza®, making it the ultimate platform for RDBMS offload and consolidation. It is faster and easier to offload old data from existing enterprise data warehouses or data marts to free up capacity while preserving most of the familiar SQL from those platforms. Big SQL is also the only SQL engine for Hadoop that exploits Hive, HBase, and Spark concurrently for best-in-class analytic capabilities.
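
As an illustration of the dialect point, here is a hedged sketch: a query using Oracle-style scalar functions submitted unchanged to Big SQL, which is reachable through standard DB2 drivers such as ibm_db. The host, port, and table names are hypothetical:

    import ibm_db

    conn = ibm_db.connect(
        "DATABASE=BIGSQL;HOSTNAME=<bigsql-head-node>;PORT=32051;"
        "PROTOCOL=TCPIP;UID=<user>;PWD=<password>", "", "")

    # NVL and DECODE are Oracle-dialect functions; the point of the
    # compatibility layer is that such offloaded SQL keeps working.
    stmt = ibm_db.exec_immediate(conn, """
        SELECT order_id,
               NVL(ship_date, order_date)          AS effective_date,
               DECODE(status, 'O', 'open', 'done') AS status_text
        FROM   sales_history
    """)
    print(ibm_db.fetch_assoc(stmt))
    ibm_db.close(conn)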

New Apache components and currency updates to existing components
BigInsights 4.2 now includes Apache Ranger, Apache Phoenix and Apache Titan. BigInsights is currently the only Hadoop distribution with a graph database. Notable currency updates include updates to Ambari, Kafka, and Solr.

ODPI Runtime Certification
With V4.2, IOP is among the first Hadoop platforms to comply with the Open Data Platform initiative (ODPi) Runtime Certification. This means it is easier for independent software vendors to adopt IOP as a platform, and it ensures platform openness for customers.

Introducing IBM Big Replicate
IBM Big Replicate provides continuous availability and data consistency via a patented active-transactional replication technology that also supports streaming backup, hybrid cloud, and burst-to-cloud scenarios. This optimized data replication capability enables uninterrupted migration from other distributions to IBM, from cloud to on-premises, and vice versa.

Why should you consider BigInsights 4.2?

Some key standout features of BigInsights 4.2 are Big SQL performance improvements, deeper analytics with Spark and a graph database, and a more open and secure platform.

Big SQL performance improvements

Big SQL is the SQL query engine in BigInsights. The 4.2 release makes it faster and easier to install and manage through these enhancements:

  • Built-in components improve performance with less tuning (auto-analyze)
  • Improved memory management and operational stability
  • High performance transactional support is now included
  • Apache Phoenix provides easier access to HBase with a SQL interface (see the sketch below)
  • In Technology Preview, in-memory technology (BLU Acceleration) on Big SQL head nodes is now available for faster processing
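
A minimal sketch of the Phoenix bullet above, using the open-source phoenixdb driver against a Phoenix Query Server; the URL and table are hypothetical:

    import phoenixdb

    # SQL over an HBase table via the Phoenix Query Server.
    conn = phoenixdb.connect("http://<phoenix-query-server>:8765/", autocommit=True)
    cur = conn.cursor()
    cur.execute("CREATE TABLE IF NOT EXISTS metrics (host VARCHAR PRIMARY KEY, cpu DOUBLE)")
    cur.execute("UPSERT INTO metrics VALUES ('node1', 0.42)")  # Phoenix upsert syntax
    cur.execute("SELECT host, cpu FROM metrics")
    print(cur.fetchall())
    conn.close()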

These enhancements make BigInsights an ideal platform for RDBMS off-load and consolidation, as well as a hybrid engine that can help you exploit fit-for-purpose Hadoop subsystems.

Deeper and improved analytics with Spark and Graph Database

  • Easier and richer text analytics
  • New AQL Editor that makes it easier to migrate existing AQL to V4.2
  • Web-based, drag-and-drop development
  • Powerful, expressive AQL language to get more done with less work
  • New run-on-cluster capability with Spark
  • Pre-built extractors: Named Entity, Financial, Sentiment, Machine Data
  • Graph database (Titan): IOP is the first Hadoop distribution to include a graph database

More open and more secure

  • For security, BigInsights 4.2 is compliant with industry standards and includes Apache Ranger, which provides centralized security management and auditing of users and the REST interface. Ranger supports HDFS, YARN, Hive, HBase, and Kafka, allowing users to spend more time analyzing data and less time worrying about security.
  • BigInsights now enables easy product integration through the ODPi Runtime Certification described above, making it easier for independent software vendors to adopt IOP as a platform and ensuring platform openness for clients.

The BigInsights core, IBM Open Platform (IOP), was designed with a focus on analytics, operational excellence, and security empowerment, and is certified by the Open Data Platform initiative (ODPi).

Get started free

BigInsights is available on-premises, on-cloud, and is integrated with other systems in use today, with enterprise-class support available. (Please note that BigQuality, BigIntegrate, Phoenix, Ranger, Solr, and Titan are available on BigInsights on-premises only, and are planned for the on-cloud offering.*)

BigInsights is also integrated with a broad and open ecosystem of data and analytics tools, allowing for a true hybrid architecture. BigInsights on Cloud was recently ranked as a leader in the Hadoop Cloud services market by Forrester, which I’ll share more about in my next blog.

Get started with a free version of the BigInsights core, IBM Open Platform (IOP). Click here.

And for more information about the 4.2 release, please visit our release overview or refer to the Big Replicate overview.  Or visit the Hadoop solutions page.

About Andrea,

Andrea Braida is a Portfolio Marketing Manager at IBM for Big Data Analytics and Data Science offerings. A former start-up founder, she has extensive product management, product marketing, and data science marketing experience within both global technology giants and start-ups. Andrea is based in Seattle, Washington.

* The information contained in this presentation is provided for informational purposes only.

While efforts were made to verify the completeness and accuracy of the information contained in this presentation, it is provided “as is”, without warranty of any kind, express or implied. In addition, this information is based on IBM’s current product plans and strategy, which are subject to change by IBM without notice. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, this presentation or any other documentation. Nothing contained in this presentation is intended to, or shall have the effect of: 1) Creating any warranty or representation from IBM (or its affiliates or its or their suppliers and/or licensors); or 2) altering the terms and conditions of the applicable license agreement governing the use of IBM software.
Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment.  The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multi-programming in the user’s job stream, the I/O configuration, the storage configuration, and the workload processed.  Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.

IBM dashDB Local preview to be featured at Cloud Expo East 2016 in New York

by Cindy Russell, IBM Data Warehouse Marketing

With an expected attendance of over 6,000 from June 7 through 9, Cloud Expo New York is the original event where technology vendors meet to experience and discuss the entire world of the cloud.

On June 7, 2016, we are thrilled to be rolling out IBM dashDB Local™ as an open preview. This exciting new IBM hybrid data warehouse offering is part of the dashDB family of data management solutions. dashDB Local delivers dashDB technology in a Docker container for implementations such as private and virtual private clouds. It is an ideal solution when you want cloud-like simplicity yet need more control over applications and data. Participants in this preview program have been very enthusiastic about this technology, and you can read more in Mitesh Shah’s blog.

dashDB Local is:

  • Open: Leverage the power of an open warehouse platform
  • Flexible & Hybrid: Easily deploy the right workload to the right platform
  • Fast: Quickly realize business outcomes with advanced processing technologies
  • Simple: Automated management features help to lower the analytics cost model

 

dashDB Local will also be featured in this speaking session at Cloud Expo!

Building Your Hybrid Data Warehouse Solution with dashDB

Matthias Funke, Hybrid Data Warehouse Offering and Strategy Lead at IBM

June 7, 2016 from 11:40 AM – 12:15 PM

 

See the dashDB family in action at IBM Booth #201, the SoftLayer Booth!


Register to attend

IBM Booth #201

June 7, 2016

Cloud Expo East 2016

Javits Center, New York, NY

Note: First time attendees will need to create a new account.

As you respond to increasing requests for new analytics, you need fast and flexible technology. Learn how dashDB’s cloud-based managed service and new container-based edition give you the speed and flexibility that you need.

Want to get started now with dashDB Local? Learn about and download our preview here: ibm.biz/dashDBLocal

IBM dashDB Local opens its preview for data warehousing on private clouds and more!

by Mitesh Shah

Just like in the story of Goldilocks … you may be looking for modern data warehousing that is “just right.” Your IT strategy may include cloud, and you may like the simplicity and scalability benefits of cloud … yet some data and applications may need to stay on-premises for a variety of reasons. Traditional data warehouses provide essential analytics, yet they may not be right for new types of analytics or data born on the cloud, or they simply cannot absorb a growing workload of new requests.

IBM dashDB Local is an open preview technology that is designed to give you “just right” cloud-like simplicity and flexibility. It delivers a configured data warehouse in a Docker container that you can deploy wherever you need it, as long as Docker is supported on that infrastructure. Often, this is a private cloud, virtual private cloud (AWS/Azure), or other software-defined infrastructure. You gain management simplicity and have an environment that you can control more directly.

Download and install dashDB Local quickly and simply via Docker container technology.

dashDB Local may be the right choice when you have complex applications that must be readied for cloud, when SLAs or regulations require data or applications to stay on premises, or when you need to address new analytics requests very quickly with easy scale-in and scale-out capabilities.

dashDB Local complements the dashDB data warehouse as a service offering that is delivered via IBM Bluemix. Because both products are based on a common database technology, you can move workloads across these editions without costly and complex application change!   This is one example of how we define a hybrid data warehouse and how it can help improve your flexibility over time as your needs evolve.

Since dashDB Local began its closed preview in February of 2016, the team has rallied to bring in a comprehensive set of data warehousing features to this edition of dashDB. We have been listening to the encouraging feedback from our initial preview participants, and as a result, we now have a solution that is open for you to test!

So what are you waiting for?

Our early adopters have been fascinated by the power and ease of deployment of the Docker container. It’s become commonplace to hear that participants can deploy a full MPP data warehouse offering with in-memory processing and columnar capabilities, on the infrastructure of their choice, within 15-20 minutes. One client said that dashDB Local is as easy to deploy and manage as a mobile app! We are thrilled by this type of feedback!

Workload monitoring in dashDB Local delivers elasticity to scale out or in.

The open preview (v. 0.5.0) offers extreme scale-out and scale-in capabilities. Yes, you heard me right. Scale-in provides the elasticity to avoid tying up your valuable resources beyond peak workloads, maximizing return on investment for your variable reporting and analytics solutions. The open preview will also help you test drive the Netezza (IBM PureData System for Analytics) compatibility within dashDB technology, as well as analytics support using RStudio. Automated high availability is another attractive feature that is provided out of the box for you to see and test.

Preview participants have been eager to test drive query performance. One participant says, “We are very impressed with the performance, and within no time we have grown our dataset of 40 million to 200 million records (a few TBs) and the analytics test queries run effortless.” Our participants are leveraging their data center infrastructure, whether bare metal or virtualized (VMs), to get started, and some have installed it on their laptops to quickly gain an understanding of this preview.

Register for the dashDB Local preview and find out how it can be “just right” for you! Go here to give it a try and get ready to be wowed. We value and need your feedback to help us prioritize features that are important to your business. All the best, and don’t hesitate to drop me a line to let me know what you think!


About Mitesh,

Mitesh Shah is the product manager for the new dashDB Local data warehousing solution, a software-defined environment (SDE) that can be used on private clouds and platforms that support Docker container technology. He has broad experience across various facets of software development revolving around relational databases and data warehousing technologies. Throughout his career, Mitesh has enjoyed a focus on helping clients address their data management and solution architecture needs.

How dashDB Helps Media Channels Boost Revenues And Viewership

By Harsimran Singh Labana

Did you ever wonder how a media channel decides which ad airs at what time? Well, there is analytics science behind it.

Cable and broadcast networks pay studios large sums of money for the right to broadcast a specific show or movie at specific times on specific channels. To achieve a return on that investment, networks must design TV schedules and promotional campaigns to maximize viewership and boost advertising revenues.

RSG Media is an IBM dashDB managed service client that partners with cable and broadcast, entertainment, games and publishing firms to provide insights that help maximize revenue from content, advertising and marketing inventories. Shiv Sehgal, Solutions Architect, RSG Media says, “We had the rights data, the scheduling data and the advertising revenues data. If we could combine this with viewership and social media data, we could give our clients a true 360-degree view of their operations and profitability, down to the level of individual broadcasts. The missing piece of the puzzle was to build a data and analytics capability that could bring all the data together and turn it into business insight – and that’s where IBM came in.”

RSG Media chose IBM because of its complete vision for cloud analytics. This includes an integrated set of solutions for building advanced analytics applications and coordinating them with all the relevant data services in the cloud.

RSG Media’s Big Knowledge Platform is built on the IBM® Cloudant® NoSQL document store and the IBM dashDB™ data warehouse service, orchestrated through the IBM Bluemix® cloud application development platform. Cloudant’s Schema Discovery Process (SDP) is used to ingest and translate semi-structured data from more than 50 sources, and structure that data into a schema that the dashDB relational data warehouse understands.

RSG Media is not stopping here: the team is excited about Watson Analytics and how it can predict customer behavior. Learn more about RSG Media’s success using dashDB and Cloudant solutions on Bluemix.

About Harsimran,
Harsimran Singh Labana is the Portfolio Marketing Manager for IBM’s Data Warehousing team. Working in a worldwide role, he ensures marketing support for IBM’s solutions. He has been with IBM for close to five years, working in diverse roles such as sales and social media marketing. He lives in Bangalore, India, with his wife and son.

IBM Fluid Query 1.7 is Here!

by Doug Dailey

IBM Fluid Query offers a wide range of capabilities to help your business adapt to a hybrid data architecture and, more importantly, helps you bridge “data silos” for deeper insights that leverage more data. Fluid Query is a standard entitlement included with the Netezza Platform Software suite for PureData System for Analytics (formerly Netezza). Fluid Query release 1.7 is now available, and you can learn more about its features below.

Why should you consider Fluid Query?

Fluid Query offers many possible ways to solve business problems. Here are a few ideas:
• Discover and explore “Day Zero” data landing in your Hadoop environment
• Query data from multiple cross-enterprise repositories to understand relationships
• Access structured data from common sources like Oracle, SQL Server, MySQL, and PostgreSQL
• Query historical data on Hadoop via Hive, BigInsights Big SQL or Impala
• Derive relationships between data residing on Hadoop, the cloud and on-premises
• Offload colder data from PureData System for Analytics to Hadoop to free capacity
• Drive business continuity through a low-fidelity disaster recovery solution on Hadoop
• Backup your database or a subset of data to Hadoop in an immutable format
• Incrementally feed analytics side-cars residing on Hadoop with dimensional data

By far, the most prominent uses of Fluid Query for a data warehouse administrator are warehouse augmentation, capacity relief, and replicating analytics side-cars for analysts and data scientists.

New: Hadoop connector support for Hadoop file formats to increase flexibility

IBM Fluid Query 1.7 ushers in greater flexibility for Hadoop users with support for popular file formats typically used with HDFS. These include popular data storage formats like Avro, Parquet, ORC and RC that are often used to manage big data in a Hadoop environment.

Choosing the best format and compression mode can result in drastic differences in performance and storage on disk. A file format that doesn’t support flexible schema evolution can impose a processing penalty when you make simple changes to a table. Let’s just say that if you live in the Hadoop domain, you know exactly what I am speaking of. For instance, if you want to use Avro, do your tools have compatible readers and writers? If you are using Impala, do you know that it doesn’t support ORC, or that Hortonworks and Hive Stinger don’t play well with Parquet? Double-check your needs and tool sets before diving into these popular format types.

By providing support for these popular formats, Fluid Query allows you to import, store, and access this data through local tools and utilities on HDFS. But here is where it gets interesting in Fluid Query 1.7: you can also query data in these formats through the Hadoop connector provided with IBM Fluid Query, without any change to your SQL!
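
Here is a hedged sketch of that “no change to your SQL” point from Python: the same query text is issued through one PureData connection whether the table is local to Netezza or lives on HDFS behind the Hadoop connector. The DSN and table names are hypothetical:

    import pyodbc

    conn = pyodbc.connect("DSN=NZSQL")  # hypothetical ODBC DSN for the PureData system
    cur = conn.cursor()

    # Identical SQL either way; Fluid Query handles the remote access.
    cur.execute("""
        SELECT region, SUM(revenue)
        FROM   SALES_2014   -- may be a Netezza table or Parquet on HDFS
        GROUP  BY region
    """)
    for row in cur.fetchall():
        print(row)
    conn.close()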

New: Robust connector templates

In addition, Fluid Query 1.7 now offers a more robust set of connector templates designed to help you jump-start your use of Fluid Query. You may recall that our prior release provided a generic connector that allows you to configure and connect to any structured data store via JDBC. With the 1.7 release we are offering pre-defined templates so you can get up and running more quickly. In cases where user data type mappings differ, we also provide mapping files to simplify access. If you have your own favorite database, you can use our generic connector, along with any of the provided templates, as a basis for building a new connector for your specific needs. There are templates for Oracle, Teradata, SQL Server, MySQL, PostgreSQL, Informix, and MapR for Hive.

Again, the primary focus of Fluid Query is to deliver open data access across your ecosystem. Whether the data resides on disk, in memory, in the cloud or on Hadoop, we strive to enable your business to be open for data. We recognize that you are up against significant challenges in meeting the demands of the business and the marketplace, with access and federation among the top priorities.

New: Data movement advances

Moving data is rarely the best choice. Businesses spend quite a bit of effort ingesting data, staging it, scrubbing it, and prepping and scoring it for consumption by business users. This is a costly process. As we move closer and closer to virtualization, the goal is to move the smallest amount of data possible while you access and query only the data you need. So not only is access paramount, but your knowledge of the data in your environment is crucial to using it efficiently.

Fluid Query does offer data movement capability through what we call Fast Data Movement. Focusing on the pipe between PureData System for Analytics and Hadoop, we offer a high-speed transfer tool that allows you to transfer data between these two environments efficiently and securely. You have control over the security, compression, format, and WHERE clause (database, table, or filtered data). A key benefit is our ability to transfer data in our proprietary binary format, which enables orders-of-magnitude performance gains over Sqoop when you do have to move data.

Fluid Query 1.7 also offers some additional benefits:
• Kerberos support for our generic database connector
• Support for BigInsights Big SQL during import (automatically synchronizes Hive and Big SQL on import)
• Varchar and String mapping improvements
• The nz.fq.table import parameter now supports a combination of multiple schemas and tables
• Improved date handling
• Improved validation for NPS and Hadoop environment (connectors and import/export)
• Support for BigInsights 4.1 and Cloudera 5.5.1
• A new Best Practices User Guide, plus two new Tutorials

You can download Fluid Query 1.7 from IBM Fix Central, or from the Netezza Developer Network (as non-warranted software) for use with the Netezza Emulator.


Take a test drive today!

About Doug,
Doug has over 20 years of combined technical and management experience in the software industry, with emphasis on customer service and, more recently, product management. He is currently part of a highly motivated product management team that is both inspired by and passionate about the IBM PureData System for Analytics product portfolio.

What you need to know: Software-Defined Environments (SDE) for data warehousing and more

by James Cho and Maria Attarian

There’s a new kid on the block, and it’s called the Software Defined Environment (SDE)! This new term is here to change the way we think about the world of application, integration and middleware, as well as data warehouses. But first things first. Let’s talk about the SDE and how it can help you deliver more end-user services more easily.

What is an SDE and why use one when you have traditional environment approaches?

Put simply, a Software Defined Environment (SDE) optimizes the entire computing infrastructure: compute, storage and network resources. An SDE can automatically tailor itself to meet the needs of the workload that must be executed.

In traditional environment approaches, compute, storage, and network resources are allocated and assigned to workloads manually; this is the problem from which the need for SDE technology emerged. To remove those manual steps, an SDE takes into account application characteristics, best-available resources and service level policies when dynamically allocating resources to workloads. An SDE also strives to deliver continuous, “on the fly” optimization and reconfiguration to address infrastructure issues.

So what are the fundamental ingredients? Policy-based compliance checks and updates are essential and make an SDE easy to manage. Delivering in the public or private cloud requires high-speed analytic processing capabilities, as well as rapid integration, automation and optimization. When factors such as these are in place, it becomes clear that SDE technology helps accelerate business success and brings value to the customer, because the solution is responsive and adaptive.

So what does IBM have to offer in the SDE space?

dashDB Local (currently in preview) is the IBM data warehouse offering for SDEs such as private clouds, virtual private clouds and other infrastructures that support the Docker container technology. It is designed to provision a full data warehouse stack in minutes and helps you manage the service in your own public or private cloud, while maintaining existing operational and security processes.

There are three design principles that dashDB Local tackles head-on based on feedback from our customers.

  1. So simple that anyone can deploy it

By packaging our software stack into a Docker container, provisioning dashDB Local can be as simple as one docker run command on Linux servers that have the Docker engine installed. It can be as easy as a Docker hub search for “dashDB” followed by a single click on “CREATE” using Docker Kitematic on Windows or Mac machines. Software stack updates are as simple as updating a mobile app: run the same docker run command against a new version of the container on your existing installation. (A minimal sketch of this provisioning step follows the list below.)

  2. Flexible enough to deploy anywhere

dashDB Local can be deployed on any supported Docker installation on Linux, in the cloud, and on OS X and Windows platforms, with minimal prerequisites. Entry-level hardware requirements start at 8GB RAM and 20GB of storage, which is suitable for a development/test environment or QA work on your laptop. For larger servers, such as 48-core, 3TB-RAM machines, the dashDB container will auto-configure itself to the host it is installed on. Persistent, durable storage of your choice must be mounted at /mnt/clusterfs to hold your data. To summarize, it is flexible enough to empower you to use the hardware you already have in your data center or in the public cloud of your choice.

  3. Independent of your infrastructure capabilities

Existing monitoring and security overlays can remain on your host OS while the dashDB stack is isolated inside its container. You can fully utilize existing infrastructure capabilities like the copy and replication services of your storage. Existing monitoring tools (systems management, network monitoring, popular cloud management tools such as OpenStack and Kubernetes, and public cloud monitoring tools like AWS CloudWatch) can continue to be used. The isolation of a dashDB Local container allows you to embrace your own data center standards. Thus, it is independent and empowers you to do what you already know how to do.
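
As promised above, here is a minimal sketch of the single-command provisioning step, wrapped in Python for illustration. The image name, tag, port, and flags are assumptions; check the dashDB Local entry on Docker Hub for the real values. Only the /mnt/clusterfs host path comes from the description above:

    import subprocess

    # One "docker run" provisions the full dashDB Local stack; re-running
    # it against a newer image tag updates the installation.
    subprocess.run([
        "docker", "run", "-d",
        "--name", "dashdb-local",
        "--privileged=true",                    # assumption: container needs elevated access
        "-p", "8443:8443",                      # hypothetical console port
        "-v", "/mnt/clusterfs:/mnt/clusterfs",  # persistent data volume (host path per text)
        "ibmdashdb/local:latest",               # hypothetical image name and tag
    ], check=True)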

For more information on dashDB Local, please visit the public Docker repository. An early access preview of dashDB Local is now available. Test it out for yourself, help shape the solution, and request dashDB Local preview access here.

About James and Maria,

James Cho is a Senior Technical Staff Member and Chief Architect for IBM dashDB Local. He has been a technical leader of integrated warehouse solutions and appliances at IBM for over 15 years. He currently focuses on data warehouse solutions delivered in public and private cloud data centers. His previous experience includes work as a data warehouse DBA, publication of industry-standard TPC performance benchmarks, and BI architecture and deployment responsibilities. James is a 1996 graduate of the University of Texas. He holds a bachelor of science in computer science.

Follow James on LinkedIn

Maria Attarian has worked for IBM for the past four years on data warehousing technologies such as PureData System for Analytics and dashDB. In her role as a software engineer, Maria is charged with finding new and innovative ways to deliver better products for clients and takes on a variety of challenges. Maria is also active with the IEEE Young Professionals Toronto chapter. She served as the chairperson of this group for more than a year and remains an active member. Maria holds a master’s degree from the University of Waterloo and a bachelor of engineering degree from the National Technical University of Athens in Greece.