Why Are Customers Architecting Hybrid Data Warehouses?

By Mona Patel

As a leader in IT, you may be incentivized or mandated to explore cloud and big data solutions that transform rigid data warehousing environments into agile ones that match how the business really wants to operate.  The following questions likely come to mind:

  • How do I integrate new analytic capabilities and data sets into my current on-premises data warehouse environment?
  • How do I deliver self-service solutions to accelerate the analytic process?
  • How do I leverage commodity hardware to lower costs?

For these questions, and more, organizations are architecting hybrid data warehouses.  In fact, organizations moving towards hybrid are referred to as "Best-in-Class" in The Aberdeen Group’s latest research: "Best-in-Class focus on hybridity, both in their data infrastructure and with their analytical tools as well.  Given the substantial investments companies have made in their IT environment, a hybrid approach allows them to utilize these investments to the best of their ability while exploring more flexible and scalable cloud-based solutions as well."  To hear more about these Best-in-Class organizations, watch the 45-minute webcast.

How do you get to a hybrid data warehouse architecture with the least risk and the most reward?  IBM dashDB delivers flexible cloud database services that extend and integrate with your current analytics and data warehouse environment, addressing the challenges of leveraging new sources of customer, product, and operational insight to build new applications, products, and business models.

To help our clients evaluate hybrid data warehouse solutions, Harvard Research Group (HRG) provides an assessment of IBM dashDB.  In this paper, HRG highlights product functionality, as well as three use cases in healthcare, oil and gas, and financial services.  Security, performance, high availability, in-database analytics, and more are covered to ensure future architecture enhancements optimize IT rather than add new skill requirements, complexity, and integration costs. After reading this paper, you will find that dashDB enables IT to respond rapidly to the needs of the business, keep systems running smoothly, and achieve faster ROI.

To learn more about dashDB, check out the video below:


About Mona,

Mona Patel is currently the Portfolio Marketing Manager for IBM dashDB, the future of data warehousing.  With over 20 years of analyzing data at the Department of Water and Power, AirTouch Communications, Oracle, and MicroStrategy, Mona decided to grow her career at IBM, a leader in data warehousing and analytics.  Mona received her Bachelor of Science degree in Electrical Engineering from UCLA.

Start Small and Move Fast: The Hybrid Data Warehouse

by Mona Patel

In the world of cutting-edge big data analytics, the same obstacles to gaining meaningful insight still exist: the ease of getting data in and getting data out.  To address these long-standing issues, the utmost flexibility is needed, especially when layered with the agile needs of the business.

Why spend millions of dollars replacing your data and analytics environment with the latest technology that promises to address these issues, when you can leverage existing investments, resources, and skills to achieve the same, and sometimes better, insight?

Consider a hybrid data warehouse.  This approach allows you to start small and move fast. It provides the best of both worlds – flexibility and agility without breaking the bank.  You can RAPIDLY serve up quality data managed by your data warehouse, blended with newer data sources and data types in the cloud, and apply integrated analytics such as Spark or R – all without additional IT resources and expertise.  How is this possible?  IBM dashDB.

Read Aberdeen’s latest report on The Hybrid Data Warehouse.

mona's blog


Watch Aberdeen Group’s Webcast on The Hybrid Data Warehouse.

Let me give you an example.  We live in a digital world, with organizations now very interested in improving customer data capture across mobile, web, IoT, social media, and more for newer insights.  A telecommunications client was facing heavy competition and wanted to quickly deliver unique mobile services for an upcoming event in order to acquire new customers by collecting and analyzing mobile and social media data.  Taking a hybrid data warehouse approach, the client was able to start small and move fast, uncovering new mobile service options.

Customer information generated from these newer data sources was blended with existing customer data managed in the data warehouse to deliver new insights.  IBM dashDB provided a high-performing public cloud data warehouse service that was up and running in minutes.  Automatic transformation of unstructured geospatial data into structured data, in-memory columnar processing, in-database geospatial analytics, integration with Tableau, and pricing were some of the key reasons IBM dashDB was chosen.

This brings me back to my first point – you don’t have to spend millions of dollars to capitalize on getting data in and getting data out.  For example, clients like the one described above took advantage of the Cloudant JSON document store integration, enabling them to rapidly get data into IBM dashDB with ease – no ETL processing required.  Automatic schema discovery loads and replicates unstructured JSON documents that capture IoT, web, and mobile data into a structured format.  Getting data or information out was simple, as IBM dashDB provides in-database analytics and the use of familiar, integrated SQL-based tools such as Cognos, Watson Analytics, Tableau, and MicroStrategy.  I can only conclude that IBM dashDB is a great example of how a highly compatible cloud database can extend or modernize your on-premises data warehouse into a hybrid one to meet time-sensitive business initiatives.
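To make the schema discovery idea concrete, here is a minimal, hypothetical Python sketch of what turning a batch of JSON documents into a relational table definition involves. The document fields and the IOT_EVENTS table name are invented for illustration; dashDB's actual discovery engine is far more capable than this.

```python
# Minimal sketch of automatic schema discovery: inspect a batch of
# JSON-style documents, derive a flat relational schema, and emit a
# CREATE TABLE statement. All names here are hypothetical.

docs = [
    {"device_id": "a1", "ts": "2016-05-01T10:00:00", "temp": 21.5},
    {"device_id": "a2", "ts": "2016-05-01T10:00:05", "temp": 22.1, "humidity": 40},
]

def sql_type(value):
    """Map a JSON value to a plausible SQL column type."""
    if isinstance(value, bool):
        return "BOOLEAN"
    if isinstance(value, int):
        return "INTEGER"
    if isinstance(value, float):
        return "DOUBLE"
    return "VARCHAR(255)"

# Union the keys seen across documents and pick a type per column.
columns = {}
for doc in docs:
    for key, value in doc.items():
        columns.setdefault(key, sql_type(value))

ddl = "CREATE TABLE IOT_EVENTS (\n  " + ",\n  ".join(
    f"{name.upper()} {ctype}" for name, ctype in columns.items()) + "\n)"
print(ddl)
```

The real integration also handles nested documents and ongoing replication, but the core idea is the same: the schema falls out of the data, with no hand-written ETL.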

What exactly is a hybrid data warehouse?  A hybrid data warehouse introduces technologies that extend the traditional data warehouse to provide key functionality required to meet new combinations of data, analytics and location, while addressing the following IT challenges:

  • Deliver new analytic services and data sets to meet time-sensitive business initiatives
  • Manage escalating costs due to massive growth in new data sources, analytic capabilities, and users
  • Achieve data warehouse elasticity and agility for ALL business data


Still not convinced of the power of a hybrid data warehouse?  Hear what Aberdeen Group’s expert Michael Lock has to say in this 30-minute webcast.

About Mona,


Mona Patel is currently the Portfolio Marketing Manager for IBM dashDB, the future of data warehousing.  With over 20 years of analyzing data at the Department of Water and Power, AirTouch Communications, Oracle, and MicroStrategy, Mona decided to grow her career at IBM, a leader in data warehousing and analytics.  Mona received her Bachelor of Science degree in Electrical Engineering from UCLA.

IBM dashDB Local opens its preview for data warehousing on private clouds and more!

by Mitesh Shah

Just like in the story of Goldilocks … you may be looking for modern data warehousing that is “just right.”  Your IT strategy may include cloud, and you may like the simplicity and scalability benefits of cloud … yet some data and applications may need to stay on-premises for a variety of reasons.  Traditional data warehouses provide essential analytics, yet they may not be right for new types of analytics or data born on the cloud, or they simply cannot absorb a growing workload of new requests.

IBM dashDB Local is an open preview technology that is designed to give you “just right” cloud-like simplicity and flexibility.  It delivers a configured data warehouse in a Docker container that you can deploy wherever you need it, as long as Docker is supported on that infrastructure. Often, this is a private cloud, virtual private cloud (AWS/Azure), or other software-defined infrastructure. You gain management simplicity and have an environment that you can control more directly.

Download and install dashDB Local quickly and simply via Docker container technology.
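If you prefer to script the deployment, something along these lines is possible with the Docker SDK for Python. The image name, port mapping, and volume paths below are placeholders, not the product's documented values; substitute the details from your preview registration.

```python
# Sketch of scripting a dashDB Local deployment with the Docker SDK
# for Python (pip install docker). Image name, ports, and volume
# paths are assumptions for illustration only.

import docker

client = docker.from_env()
container = client.containers.run(
    "ibmcom/dashdb-local",        # hypothetical image name
    detach=True,
    ports={"8443/tcp": 8443},     # console port: an assumption
    volumes={"/mnt/clusterfs": {"bind": "/mnt/bludata0", "mode": "rw"}},
    name="dashdb-local",
)
print(container.status)
```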

dashDB Local may be the right choice when you have complex applications that must be readied for cloud, when SLAs or regulations require data or applications to stay on-premises, or when you need to address new analytics requests very quickly with easy scale-out and scale-in capabilities.

dashDB Local complements the dashDB data warehouse-as-a-service offering that is delivered via IBM Bluemix. Because both products are based on a common database technology, you can move workloads across these editions without costly and complex application changes!  This is one example of how we define a hybrid data warehouse and how it can help improve your flexibility over time as your needs evolve.
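As a sketch of what that portability means in practice, the same application code can target either edition by changing only the connection string. The hostname, credentials, and SALES table below are hypothetical, and the ibm_db driver is one common way to connect from Python.

```python
# Because dashDB Local and the Bluemix-managed dashDB share a common
# engine, only the DSN changes between editions; the SQL stays put.
# All connection details and the SALES table are placeholders.

import ibm_db

dsn = ("DATABASE=BLUDB;HOSTNAME=dashdb.example.com;PORT=50000;"
       "PROTOCOL=TCPIP;UID=bluadmin;PWD=secret;")

conn = ibm_db.connect(dsn, "", "")
stmt = ibm_db.exec_immediate(conn, "SELECT COUNT(*) AS N FROM SALES")
row = ibm_db.fetch_assoc(stmt)
print(row["N"])
ibm_db.close(conn)
```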

Since dashDB Local began its closed preview in February of 2016, the team has rallied to bring in a comprehensive set of data warehousing features to this edition of dashDB. We have been listening to the encouraging feedback from our initial preview participants, and as a result, we now have a solution that is open for you to test!

So what are you waiting for?

It’s become commonplace for us to hear feedback that participants can deploy a full MPP data warehouse offering with in-memory processing and columnar capabilities, on the infrastructure of their choice, within 15-20 minutes.

Our early adopters have been fascinated by the power and ease of deployment of the Docker container.  It’s become commonplace for us to hear feedback that participants can deploy a full MPP data warehouse offering with in-memory processing and columnar capabilities, on the infrastructure of their choice, within 15-20 minutes. One client said that dashDB Local is as easy to deploy and manage as a mobile app! We are thrilled by this type of feedback!

Workload monitoring in dashDB Local delivers elasticity to scale out or in.

The open preview (v. 0.5.0) offers extreme scale-out and scale-in capabilities. Yes, you heard me right. Scale-in provides the elasticity to avoid tying up your valuable resources beyond peak workloads, maximizing return on investment for your variable reporting and analytics solutions.  The open preview will also help you test drive the Netezza (IBM PureData System for Analytics) compatibility within dashDB technology, as well as analytics support using RStudio. Automated high availability is another attractive feature that is provided out-of-the-box for you to see and test.

Preview participants have been eager to test drive query performance. One participant says, “We are very impressed with the performance; within no time we have grown our dataset from 40 million to 200 million records (a few TBs) and the analytics test queries run effortlessly.” Our participants are leveraging their data center infrastructure, whether bare metal or virtualized (VMs), to get started, and some have installed it on their laptops to quickly gain an understanding of this preview.

Register for the dashDB Local preview and find out how it can be “just right” for you!  Go here to give it a try and get ready to be wowed.  We value and need your feedback to help us prioritize features that are important to your business.  All the best, and don’t hesitate to drop me a line to let me know what you think!


About Mitesh,

Mitesh Shah is the product manager for the new dashDB Local data warehousing solution as a software-defined environment (SDE) that can be used on private clouds and platforms that support Docker container technology. He has broad experience across various facets of software development revolving around relational databases and data warehousing technologies.  Throughout his career, Mitesh has enjoyed a focus on helping clients address their data management and solution architecture needs.

IBM Fluid Query 1.7 is Here!

by Doug Dailey

IBM Fluid Query offers a wide range of capabilities to help your business adapt to a hybrid data architecture; more importantly, it helps you bridge across “data silos” for deeper insights that leverage more data.  Fluid Query is a standard entitlement included with the Netezza Platform Software suite for PureData System for Analytics (formerly Netezza). Fluid Query release 1.7 is now available, and you can learn more about its features below.

Why should you consider Fluid Query?

It offers many possible uses for solving problems in your business. Here are a few ideas:
• Discover and explore “Day Zero” data landing in your Hadoop environment
• Query data from multiple cross-enterprise repositories to understand relationships
• Access structured data from common sources like Oracle, SQL Server, MySQL, and PostgreSQL
• Query historical data on Hadoop via Hive, BigInsights Big SQL or Impala
• Derive relationships between data residing on Hadoop, the cloud and on-premises
• Offload colder data from PureData System for Analytics to Hadoop to free capacity
• Drive business continuity through a low-fidelity disaster recovery solution on Hadoop
• Backup your database or a subset of data to Hadoop in an immutable format
• Incrementally feed analytics side-cars residing on Hadoop with dimensional data

By far, the most prominent uses of Fluid Query for a data warehouse administrator are warehouse augmentation, capacity relief, and replicating analytics side-cars for analysts and data scientists.

New: Hadoop connector support for Hadoop file formats to increase flexibility

IBM Fluid Query 1.7 ushers in greater flexibility for Hadoop users with support for popular file formats typically used with HDFS. These include popular data storage formats like AVRO, Parquet, ORC, and RC that are often used to manage big data in a Hadoop environment.

Choosing the best format and compression mode can result in drastic differences in performance and storage on disk. A file format that doesn’t support flexible schema evolution can result in a processing penalty when making simple changes to a table. Let’s just say that if you live in the Hadoop domain, you know exactly what I am speaking of. For instance, if you want to use AVRO, do your tools have readers and writers that are compatible? If you are using Impala, do you know that it doesn’t support ORC, or that Hortonworks and Hive-Stinger don’t play well with Parquet? Double-check your needs and tool sets before diving into these popular format types.
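As a small illustration of why format and compression choices matter, the following sketch (using the pyarrow library, which is independent of Fluid Query) writes the same table as Parquet under two compression codecs and compares file sizes. The paths and data are invented; actual size and speed trade-offs depend on your data.

```python
# Write one table as Parquet with two codecs and compare sizes.
# Requires: pip install pyarrow. Paths and data are placeholders.

import os
import pyarrow as pa
import pyarrow.parquet as pq

table = pa.table({"id": list(range(100000)),
                  "region": ["east", "west"] * 50000})

for codec in ("snappy", "gzip"):
    path = f"/tmp/events_{codec}.parquet"
    pq.write_table(table, path, compression=codec)
    print(codec, os.path.getsize(path), "bytes")
```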

By providing support for these popular formats,  Fluid Query allows you to import, store, and access this data through local tools and utilities on HDFS. But here is where it gets interesting in Fluid Query 1.7: you can also query data in these formats through the Hadoop connector provided with IBM Fluid Query, without any change to your SQL!
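For instance, a client application might run the same query shape against a local table and a Hadoop-backed one. In this hypothetical pyodbc sketch, the DSN and table names are placeholders, and REMOTE_SALES stands in for a table exposed through the Fluid Query Hadoop connector.

```python
# The appeal of the connector: the SQL does not change with the
# data's location. DSN and table names below are hypothetical.

import pyodbc

conn = pyodbc.connect("DSN=NZSQL;UID=admin;PWD=secret")
cur = conn.cursor()

# Identical query shape whether the table is local to PureData or
# resolved on Hadoop through the connector.
for table in ("LOCAL_SALES", "REMOTE_SALES"):
    cur.execute(f"SELECT COUNT(*) FROM {table}")
    print(table, cur.fetchone()[0])
conn.close()
```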

New: Robust connector templates

In addition, Fluid Query 1.7 now makes available a more robust set of connector templates designed to help you jump-start your use of Fluid Query. You may recall that our prior release provided a generic connector that allows you to configure and connect to any structured data store via JDBC. With the 1.7 release we are offering pre-defined templates so you can get up and running more quickly. In cases where there are differences in user data type mapping, we also provide mapping files to simplify access.  If you have your own favorite database, you can use our generic connector, along with any of the provided templates, as a basis for building a new connector for your specific needs. There are templates for Oracle, Teradata, SQL Server, MySQL, PostgreSQL, Informix, and MapR for Hive.
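The templates themselves are configured on the appliance side, but the underlying pattern – one JDBC driver class plus one URL per source – can be sketched in Python with the jaydebeapi library. The driver class, URL, credentials, jar path, and orders table below are all placeholders.

```python
# Generic-JDBC pattern behind the connector templates: a driver
# class, a URL, credentials, and a driver jar per data source.
# Requires: pip install jaydebeapi. All values are placeholders.

import jaydebeapi

conn = jaydebeapi.connect(
    "org.postgresql.Driver",                   # driver class
    "jdbc:postgresql://pg.example.com/sales",  # JDBC URL
    ["appuser", "secret"],                     # credentials
    "/opt/jdbc/postgresql.jar",                # driver jar
)
cur = conn.cursor()
cur.execute("SELECT region, SUM(amount) FROM orders GROUP BY region")
print(cur.fetchall())
conn.close()
```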

Again, the primary focus of Fluid Query is to deliver open data access across your ecosystem. Whether the data resides on disk, in memory, in the cloud, or on Hadoop, we strive to enable your business to be open for data. We recognize that you are up against significant challenges in meeting the demands of the business and the marketplace, with access and federation among the top priorities.

New: Data movement advances

Moving data is rarely the best choice. Businesses spend quite a bit of effort ingesting data, staging it, and scrubbing, prepping, and scoring it for consumption by business users. This is a costly process. As we move closer and closer to virtualization, the goal is to move the smallest amount of data possible while you access and query only the data you need. So not only is access paramount, but your knowledge of the data in your environment is crucial to using it efficiently.

Fluid Query does offer data movement capability through what we call Fast Data Movement. Focusing on the pipe between PDA and Hadoop, we offer a high-speed transfer tool that allows you to transfer data between these two environments efficiently and securely. You have control over the security, compression, format, and WHERE clause (database, table, filtered data). A key benefit is our ability to transfer data in our proprietary binary format, which enables orders-of-magnitude performance gains over Sqoop when you do have to move data.

Fluid Query 1.7 also offers some additional benefits:
• Kerberos support for our generic database connector
• Support for BigInsights Big SQL during import (automatically synchronizes Hive and Big SQL on import)
• Varchar and String mapping improvements
• Import of nz.fq.table parameter now supports a combination of multiple schemas and tables
• Improved date handling
• Improved validation for NPS and Hadoop environment (connectors and import/export)
• Support for BigInsights 4.1 and Cloudera 5.5.1
• A new Best Practices User Guide, plus two new Tutorials

You can download Fluid Query 1.7 from IBM’s Fix Central, or from the Netezza Developer’s Network for use with the Netezza Emulator as non-warranted software.


Take a test drive today!

About Doug,
Doug Daily
Doug has over 20 years of combined technical and management experience in the software industry, with an emphasis on customer service and, more recently, product management. He is currently part of a highly motivated product management team that is both inspired by and passionate about the IBM PureData System for Analytics product portfolio.

Using Docker containers for software-defined environments or private cloud implementations

by Mitesh Shah

Data warehousing architectures have evolved considerably in recent years. As businesses try to derive insight as the basis of value creation, ALL roles must participate by leveraging new insights.  As a result, analytics needs are expanding, markets are transforming, and new business models are being created.  This ushers in increased requirements for self-service analytics and alternative infrastructure solutions. Read on to learn how a “software-defined environment” (SDE) that utilizes container technology can help you meet expanded analytics needs.

Adaptability delivered through software-defined environments

From an avalanche of new data, to mobile computing and cloud-based platforms, new technologies must move into the IT infrastructure very quickly. Traditional IT systems—hampered by labor-intensive management and high costs—are struggling to keep up. IT organizations are caught between complex security requirements, extreme data volumes and the need for rapid deployment of new services. A simpler, more adaptive and more responsive IT infrastructure is required.

One of the key solutions on the horizon is the SDE, which optimizes the entire computing infrastructure – compute, storage, and network resources – so that IT staff can adapt to different types of workloads very quickly. For example, without an SDE, resources are assigned to workloads manually; within an SDE, the same assignments happen automatically.

Now, dashDB Local (via Docker container) is available as an early-access client preview.  I hope you will test this new technology and provide us valuable feedback. Learn more, then request access: ibm.biz/dashDBLocal

By dynamically assigning workloads to IT resources based on a variety of factors, including the characteristics of specific applications, the best-available resources, and service-level policies, a software-defined environment can deliver continuous, dynamic optimization and reconfiguration to address infrastructure issues.

Software-defined environment benefits

A software defined environment framework can help to:

  • Simplify operations with automated infrastructure tuning and configuration
  • Reduce time to value with simple, pluggable, API-supported architectures
  • Sense and respond to workload demands automatically
  • Optimize resources by assigning assets without manual intervention
  • Maintain security and manage privacy through a common platform
  • Facilitate better business outcomes through advanced analytics and cognitive capabilities

A software-defined environment fits well into the private cloud ecosystem so that IT staff can deliver flexibility and ease of consumption, as well as maximize the use of commodity or virtualized hardware. An SDE is now easily achievable by leveraging container technology, where Docker is one of the leaders.

Docker containers provide application portability

Docker containers “wrap up” a piece of software in a complete file system that contains everything the software needs to run: code, run-times, system tools, system libraries and other components that can be installed on a server. This guarantees that the software will always run the same, regardless of the environment in which it is running.

Docker provides true application portability and ease of consumption by alleviating the complex process of software setup and installation that often can require multiple skills across multiple hours or days. It provides OS-level abstraction without disrupting the standards on the host operating system, which makes it even more attractive.

One key point to keep in mind is that Docker is not the same as VMware. Docker provides process isolation at the operating system level, whereas VMware provides a hardware abstraction layer. Unlike VMware, Docker does not create an entire virtual operating system. Instead, the host operating system kernel can be shared across multiple Docker containers. This makes containers very lightweight to deploy and faster to start than a virtual machine.  There is no looking back, as container technology is being embraced very quickly as part of a hybrid solution that meets business user needs fast!

dashDB Local: data warehousing delivered via Docker container

Coming full circle, the data warehouse is the foundation of all analytics and must be fast and agile to serve new analytics needs.  Software-defined environments make this easy to do, enabling deployment of the warehousing engine in minutes rather than hours or days.

IBM dashDB is the data warehousing technology that delivers high-speed insights through in-memory computing and in-database analytics at massively parallel processing (MPP) scale.  It has been available as a fully managed service on the IBM cloud.  Now, dashDB Local is available as an early-access client preview for private clouds and other software-defined infrastructures.  I hope you will test this new technology and provide us valuable feedback. Learn more, then request access: ibm.biz/dashDBLocal

About Mitesh,

Mitesh Shah is the product manager for the new dashDB data warehousing solution as a software-defined environment (SDE) that can be used on private clouds and other implementations that support Docker container technology. He has broad experience across various facets of software development revolving around database and data warehousing technologies.  Throughout his career, Mitesh has enjoyed a focus on helping clients address their data management and solution architecture needs.

Making faster decisions at the point of engagement with IBM PureData System for Operational Analytics

by Rahul Agarwal

The need for operational analytics
Today, businesses across the world face challenges dealing with the increasing cost and complexity of IT, as they cope with the growing volume, velocity and diversity of information. However, organizations realize that they must capitalize on this information through the smart use of analytics to meet emerging challenges and uncover new business opportunities.

… analytics needs to change from a predominantly back-office activity for a handful of experts to something that can provide pervasive, predictive, near-real-time information for front-line decision makers.

One thing that is increasingly becoming clear is that analytics is most valuable when it empowers individuals throughout the organization. Therefore, analytics needs to change from a predominantly back-office activity for a handful of experts to something that can provide pervasive, predictive, near-real-time information for front-line decision makers.

Low latency analytics on transactional data, or operational analytics, provide actionable insight at point of engagement, giving organizations the opportunity to deliver impactful and engaging services faster than their competition. So what should one look for in an operational analytics system?

Technical capabilities
A high percentage of queries to operational analytics systems – often up to 80% – are interactive lookups focused on data about a specific customer, account, or patient. To deliver the correct information as rapidly as possible, systems must be optimized for the right balance of analytics performance and operational query throughput.

… systems must be optimized for the right balance of analytics performance and operational query throughput.
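To illustrate the lookup-heavy pattern, here is a hypothetical sketch using the ibm_db driver against a DB2-based system: prepare the point query once, then execute it per request, so each lookup pays only for the indexed probe rather than repeated query compilation. The connection details and the CUSTOMERS table are invented.

```python
# Interactive point lookups dominate operational analytics, so
# parameterized queries should be prepared once and reused.
# Connection details and the CUSTOMERS table are placeholders.

import ibm_db

dsn = ("DATABASE=BLUDB;HOSTNAME=pda.example.com;PORT=50000;"
       "PROTOCOL=TCPIP;UID=analyst;PWD=secret;")
conn = ibm_db.connect(dsn, "", "")

# Prepare once, execute per request.
stmt = ibm_db.prepare(
    conn, "SELECT NAME, BALANCE FROM CUSTOMERS WHERE CUSTOMER_ID = ?")
ibm_db.execute(stmt, (42,))
print(ibm_db.fetch_assoc(stmt))
ibm_db.close(conn)
```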

IT requirements
To maximize the benefits of operational analytics, one needs a solution that quickly delivers value, performance, scale, and efficiency while reducing the need for IT experts to design, integrate, and maintain IT systems. In addition, one should look for a system that comes with deep levels of optimization to achieve the desired scale, performance, and service quality, since assembling the right skills to optimize these systems is a costly and often difficult endeavour.

Flexibility
The ideal system should provide analytic capabilities that deliver rapid and compelling return on investment now, and it must grow to meet new demands so that it remains as relevant and powerful in the future as it is today. In addition, the system should have the flexibility to meet these demands without disrupting the free flow of decision-support intelligence to the individuals and applications driving the business.

IBM PureData System for Operational Analytics
The IBM PureData System for Operational Analytics helps organizations meet these complex requirements with an expert integrated data system that is designed and optimized specifically for the demands of an operational analytics workload.
Built on IBM POWER Systems servers with IBM System Storage and powered by IBM DB2 software, the system is a complete solution for operational analytics that provides both the simplicity of an appliance and the flexibility of a custom solution. The system has recently been refreshed with the latest technology to help customers make faster, fact-based decisions, and now offers:

  • Accelerated performance with the help of new, more powerful servers that leverage POWER8 technology, and improved tiered storage that uses spinning disks for ‘cool’ data and IBM FlashSystem storage for ‘hot,’ frequently accessed data.
  • Enhanced scalability that allows the system to grow to peta-scale capacity. In addition, nodes of the refreshed system can be added to the previous generation of PureData System for Operational Analytics, providing better protection for your technology investment.
  • A reduced data center footprint as a result of increased hardware density.

So explore the benefits and use cases of PureData System for Operational Analytics by visiting our website, ibm.com/software/data/puredata/operationalanalytics, and by connecting with IBM experts.

About Rahul Agarwal

Rahul Agarwal is a member of the worldwide product marketing team at IBM that focuses on data warehouse and database technology. Rahul held a variety of business management, product marketing, and other roles at companies including HCL Technologies and HP before joining IBM.  Rahul studied at the Indian Institute of Management, Kozhikode and holds a bachelor of engineering (electronics) degree from the University of Pune, India. Rahul’s Twitter handle: @rahulag80



Is the Data Warehouse Dead? Is Hadoop trying to kill it?

By Dennis Duckworth

I attended the Strata + Hadoop World Conference in San Jose a few weeks ago, which I enjoyed immensely. I found that this conference had a slightly different “feel” than previous Hadoop conferences in terms of how Hadoop was being positioned. Since I am from the data warehouse world, I have been sensitive to Hadoop being promoted as a replacement for the data warehouse.

In previous conferences, sponsors and presenters seemed almost giddy in their prognostication that Hadoop would become the main data storage and analytics platform in the enterprise, taking more and more load from the data warehouse and eventually replacing it completely. This year, there didn’t seem to be much negative talk about data warehouses. Cloudera, for example, clearly showed its Hadoop-based “Enterprise Data Hub” as being complementary to the Enterprise Data Warehouse rather than as a replacement, reiterating the clarification of their positioning and strategy that they made last year. Maybe this was an indication that the Hadoop market was maturing even more, with companies having more Hadoop projects in production and, thus, having more real experience with what Hadoop did well and, as importantly, what it didn’t do well. Perhaps, too, the data warehouse escaped being the villain (or victim) because the “us against them” camp was distracted by the emergence and perceived threat of some other technologies like Spark and Mesos.

The conference was just another data point supporting my hypothesis that Hadoop and other Big Data technologies are complementing existing data warehouses in enterprises rather than replacing them. Another data point (actually a collection of many data points) can be seen in the survey results of The Information Difference Company as reported in the paper “Is the Data Warehouse Dead?”, sponsored by IBM. You can download a copy here.

Reading through this report, I found myself recalling many of the conversations I have had with customers and prospects over the last few years. If you have read some of my previous blogs, you will know that IBM is a big believer in the power of Big Data. We have solutions that help enterprises deal with the new challenges they are facing with the increasing size, speed and diversity of data. But we continue to offer and recommend relational database and data warehouse solutions because they are essential for deriving business value from data – they have done so in the past, and they continue to do so today.

We believe that they will continue doing so going forward. Structured data doesn’t go away, nor does the need for doing analytics (descriptive, predictive, or prescriptive) on the data. An analytics engine that was created and tuned for structured data will continue to be the best place to do such analytics. Sure, you can do some really neat data exploration and visualizations on all sorts of data in Hadoop, but you still need your daily/weekly/monthly reports and your executive dashboards, all needing to be produced within shrinking time windows, that are all fueled by structured data.

About Dennis Duckworth

Dennis Duckworth, Program Director of Product Marketing for Data Management & Data Warehousing, has been in the data game for quite a while, doing everything from Lisp programming in artificial intelligence to managing a sales territory for a database company. He has a passion for helping companies and people get real value out of cool technology. Dennis came to IBM through its acquisition of Netezza, where he was Director of Competitive and Market Intelligence. He holds a degree in Electrical Engineering from Stanford University but has spent most of his life on the East Coast. When not working, Dennis enjoys sailing off his backyard on Buzzards Bay, and he is relentless in his pursuit of wine enlightenment.

See also: New Fluid Query for PureData and Hadoop by Wendy Lucas