Turn Up The Power For Software-Defined Data Warehousing

by Mona Patel

Interview with Mukta Singh

As big data analytics technologies such as Spark and Hadoop continue their move into the mainstream, you might think that the traditional data warehouse is becoming less important.

Actually, nothing could be further from the truth.

To enable data of all types to be ingested, transformed, processed and analyzed efficiently, many companies are choosing to build hybrid analytics architectures that plug cloud and open source technologies such as Spark and Hadoop into on-premises environments. At the heart of these hybrid architectures lies the data warehouse – a highly reliable resource that provides a single source of truth for enterprise reporting and analytics.

This raises an important question: since the data warehouse is so central to the hybrid analytics architecture, how can we make sure it performs well and cost-effectively?

Traditional wisdom is that the infrastructure doesn’t matter – that running these vital systems of record on commodity hardware is perfectly adequate. But when you look at the numbers, you may begin to question that view.

To understand why the right hardware – in this case, IBM Power Systems – can make a real difference, I spoke with Mukta Singh, Director of Data Warehousing at IBM. In my conversation with Mukta, we take a deeper dive into why IBM’s software-defined data warehouse – IBM dashDB Local – on IBM Power Systems offers a better price/performance ratio compared to commodity hardware.

Mona Patel: Can you tell our readers a little bit about the Power Architecture? What is so unique about it?

Mukta Singh: IBM Power Systems is the dominant server platform in today’s Unix market, with over 50 percent market share. It has also become a leading platform for Linux systems, and we have seen tremendous growth in that area in recent years.

Unlike commodity servers, which typically use x86 processors, Power servers use IBM’s Power Architecture, a unique processor architecture that has been designed specifically for big data and analytics workloads.

Mona Patel: How does IBM dashDB Local integrate with Power Systems?

Mukta Singh: dashDB Local is a software-defined data warehouse offering that has been optimized for rapid deployment and ease of management. Essentially, the system runs in a Docker container, which means it can be flexibly deployed on different types of hardware either on-premises or in a private or public cloud environment.

One of the options today is to deploy your dashDB Local container on IBM Power Systems – it runs completely transparently, and it’s optimized to allow the dashDB engine to take advantage of the unique features of the Power Architecture.

If you want to move an existing dashDB Local environment from x86 to Power Systems, that’s easy too. The latest-generation POWER8 processors can operate in little-endian (LE) mode, which is the same byte order that x86 processors use. That means that you can move a dashDB container from one platform to the other without making any changes to your applications or data.
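
The byte-order point can be illustrated with a short sketch (a generic illustration using Python's standard library, not anything dashDB-specific): in little-endian mode the least-significant byte is stored first, exactly as on x86, so an LE POWER8 system lays out data the same way and no conversion is needed when a container moves.

```python
import struct

value = 0x01020304

# Little-endian layout: least-significant byte first (x86, and POWER8 in LE mode)
little = struct.pack("<I", value)

# Big-endian layout: most-significant byte first (the traditional Power byte order)
big = struct.pack(">I", value)

print(little.hex())  # 04030201
print(big.hex())     # 01020304
```

Because both platforms agree on the little-endian layout, the container's on-disk data moves over byte-for-byte unchanged.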

At a higher level, we have also ensured that running dashDB on Power Systems offers the same user experience as it does on x86, so the database and OS management, monitoring and integration aspects are exactly the same. The skills are completely transferable from one platform to another, so it’s a free choice and users don’t have to worry about being locked in.

Mona Patel: Can you tell us about the benefits that the Power Architecture provides for dashDB Local?

Mukta Singh: Well, for example, dashDB’s analytics engine is built on IBM BLU Acceleration – a columnar, in-memory technology that cuts query run-times from hours or minutes to just seconds.

BLU Acceleration is designed to take advantage of multi-threaded cores, and Power processors have more threads per core than most current x86 processors. In fact, an IBM POWER8 processor has four times as many threads per core as an Intel Broadwell EX. That means if you have a query that BLU can parallelize, you will get much better performance from Power Systems.

Similarly, because dashDB’s BLU Acceleration does all the processing in-memory, the bandwidth between the processor and the memory is very important. Again, Power Systems has a huge advantage here, with four times as much memory bandwidth as the x86 equivalent.

Finally, the processor’s cache size is important. BLU is engineered to do the majority of its processing in the CPU cache, which means it doesn’t need to repeatedly fetch data from RAM – usually a much slower process. Power processors offer four times as much cache as x86, which lowers latency and reduces the need to access RAM even further. So they play to the strengths of dashDB’s query engine.

Mona Patel: So how do those numbers translate in terms of performance and cost-efficiency?

Mukta Singh: We’ve done a benchmark with dashDB Local of a 24-core POWER8 server versus a 44-core x86 server.

The Power server was 1.2 times faster in terms of throughput, despite having 45 percent fewer cores. Or to look at it another way, each POWER8 core offered 2.2 times more throughput than the x86 equivalent. Leadership performance and competitive pricing for Power scale-out servers deliver a very compelling price-performance-optimized solution with dashDB Local.
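
Those figures are easy to sanity-check with a little arithmetic (using only the core counts and the throughput ratio quoted above):

```python
# Benchmark figures quoted above: a 24-core POWER8 server delivered
# 1.2x the throughput of a 44-core x86 server.
power_cores, x86_cores = 24, 44
server_throughput_ratio = 1.2  # POWER8 vs. x86, whole server

# Core-count difference: 24 cores is about 45 percent fewer than 44
fewer_cores_pct = (1 - power_cores / x86_cores) * 100

# Per-core advantage: normalize the server throughput ratio by the core counts
per_core_ratio = server_throughput_ratio * x86_cores / power_cores

print(round(fewer_cores_pct))    # 45
print(round(per_core_ratio, 1))  # 2.2
```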

Mona Patel: How do you see the market for dashDB Local on Power Systems? Is this something that customers have been asking for?

Mukta Singh: Even when we started bringing dashDB Local to market last year, there were Power clients who were interested. As I mentioned earlier, Power has a dominant share of the Unix market, and there are thousands of companies whose businesses are built on DB2 or Oracle databases running on Power Systems. For companies that rely on Power Systems already, the idea of running dashDB Local on their existing infrastructure is very attractive.

But the results of our benchmark suggest that this isn’t just a good idea for existing Power clients – it’s also an opportunity for new clients to start out running dashDB on a hardware platform that is tailor-made for high-performance analytics.

And for any client who currently runs dashDB on x86 servers, the message we’d like to get across is that it’s easy to move to Power Systems. It’s faster, it’s more cost-effective, and you still get all the ease of use and ease of management that you’re used to with your existing dashDB environment.

Mona Patel: OK, last question: where can our readers go to learn more about dashDB Local on Power? Can they try out dashDB Local on Power Systems before they buy?

Mukta Singh: Yes, we offer a free trial with a Docker ID – please visit dashDB.com to learn more and access the trial.

About Mona,

Mona Patel is currently the Portfolio Marketing Manager for IBM dashDB, the future of data warehousing.  With over 20 years of analyzing data at The Department of Water and Power, Air Touch Communications, Oracle, and MicroStrategy, Mona decided to grow her career at IBM, a leader in data warehousing and analytics.  Mona received her Bachelor of Science degree in Electrical Engineering from UCLA.

Start Small and Move Fast: The Hybrid Data Warehouse

by Mona Patel

In the world of cutting-edge big data analytics, the same obstacles to gaining meaningful insight still exist – the ease of getting data in and getting data out.  To address these long-standing issues, the utmost flexibility is needed, especially when layered with the agile needs of the business.

Why spend millions of dollars replacing your data and analytics environment with the latest technology that promises to address these issues, when you can leverage existing investments, resources, and skills to achieve the same, and sometimes better, insight?

Consider a hybrid data warehouse.  This approach allows you to start small and move fast. It provides the best of both worlds – flexibility and agility without breaking the bank.  You can RAPIDLY serve up quality data managed by your data warehouse, blended with newer data sources and data types in the cloud, and apply integrated analytics such as Spark or R – all without additional IT resources and expertise.  How is this possible?  IBM dashDB.

Read Aberdeen’s latest report on The Hybrid Data Warehouse.


Watch Aberdeen Group’s Webcast on The Hybrid Data Warehouse.

Let me give you an example.  We live in a digital world, with organizations now very interested in improving customer data capture across mobile, web, IoT, social media, and more for newer insights.  A telecommunications client was facing heavy competition and wanted to quickly deliver unique mobile services for an upcoming event in order to acquire new customers by collecting and analyzing mobile and social media data.  Taking a hybrid data warehouse approach, the client was able to start small and move fast, uncovering new mobile service options.

Customer information generated from these newer data sources was blended with existing customer data managed in the data warehouse to deliver new insights.  IBM dashDB provided a high-performing, public cloud data warehouse service that was up and running in minutes.  Automatic transformation of unstructured geospatial data into structured data, in-memory columnar processing, in-database geospatial analytics, integration with Tableau, and pricing were some of the key reasons IBM dashDB was chosen.

This brings me back to my first point – you don’t have to spend millions of dollars to capitalize on getting data in and getting data out.  For example, clients like the one described above took advantage of Cloudant JSON document store integration, enabling them to rapidly get data into IBM dashDB with ease – no ETL processing required.  Automatic schema discovery loads and replicates unstructured JSON documents that capture IoT, web and mobile-based data into a structured format.  Getting data or information out was simple, as IBM dashDB provides in-database analytics and the use of familiar, integrated SQL-based tools such as Cognos, Watson Analytics, Tableau, and MicroStrategy.  I can only conclude that IBM dashDB is a great example of how a highly compatible cloud database can extend or modernize your on-premises data warehouse into a hybrid one to meet time-sensitive business initiatives.
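
The idea behind automatic schema discovery can be sketched in a few lines (a conceptual illustration only – not IBM's actual schema discovery implementation, and the field names below are made up): scan the JSON documents, union their fields into a column list, and flatten each document into a fixed-width row.

```python
import json

def discover_schema(docs):
    """Union the top-level fields seen across all documents into a column list."""
    columns = []
    for doc in docs:
        for key in doc:
            if key not in columns:
                columns.append(key)
    return columns

def to_rows(docs, columns):
    """Flatten each document into a fixed-width row, padding missing fields with None."""
    return [tuple(doc.get(col) for col in columns) for doc in docs]

# Hypothetical mobile and social-media events, like those the telecom client collected
raw = [
    '{"user": "u1", "channel": "mobile", "lat": 51.5, "lon": -0.1}',
    '{"user": "u2", "channel": "social", "mentions": 3}',
]
docs = [json.loads(d) for d in raw]
cols = discover_schema(docs)
rows = to_rows(docs, cols)

print(cols)     # ['user', 'channel', 'lat', 'lon', 'mentions']
print(rows[1])  # ('u2', 'social', None, None, 3)
```

The resulting column list and rows are exactly what a relational warehouse can ingest, which is the essence of "no ETL required."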

What exactly is a hybrid data warehouse?  A hybrid data warehouse introduces technologies that extend the traditional data warehouse to provide key functionality required to meet new combinations of data, analytics and location, while addressing the following IT challenges:

  • Deliver new analytic services and data sets to meet time-sensitive business initiatives
  • Manage escalating costs due to massive growth in new data sources, analytic capabilities, and users
  • Achieve data warehouse elasticity and agility for ALL business data

Still not convinced of the power of a hybrid data warehouse?  Hear what Aberdeen Group’s expert Michael Lock has to say in this 30-minute webcast.

About Mona,

Mona Patel is currently the Portfolio Marketing Manager for IBM dashDB, the future of data warehousing.  With over 20 years of analyzing data at The Department of Water and Power, Air Touch Communications, Oracle, and MicroStrategy, Mona decided to grow her career at IBM, a leader in data warehousing and analytics.  Mona received her Bachelor of Science degree in Electrical Engineering from UCLA.

How dashDB Helps Media Channels Boost Revenues And Viewership

By Harsimran Singh Labana

Did you ever wonder how a media channel decides which ad airs at what time? Well, there is a science behind it: analytics.

Cable and broadcast networks pay studios large sums of money for the right to broadcast a specific show or movie at specific times on specific channels. To achieve a return on that investment, networks must design TV schedules and promotional campaigns to maximize viewership and boost advertising revenues.

RSG Media is an IBM dashDB managed service client that partners with cable and broadcast, entertainment, games and publishing firms to provide insights that help maximize revenue from content, advertising and marketing inventories. Shiv Sehgal, Solutions Architect at RSG Media, says, “We had the rights data, the scheduling data and the advertising revenues data. If we could combine this with viewership and social media data, we could give our clients a true 360-degree view of their operations and profitability, down to the level of individual broadcasts. The missing piece of the puzzle was to build a data and analytics capability that could bring all the data together and turn it into business insight – and that’s where IBM came in.”

RSG Media chose IBM because of its complete vision for cloud analytics. This includes an integrated set of solutions for building advanced analytics applications and coordinating them with all the relevant data services in the cloud.

RSG Media’s Big Knowledge Platform is built on the IBM® Cloudant® NoSQL document store and the IBM dashDB™ data warehouse service, orchestrated through the IBM Bluemix® cloud application development platform. Cloudant’s Schema Discovery Process (SDP) is used to ingest and translate semi-structured data from more than 50 sources, and structure that data into a schema that the dashDB relational data warehouse understands.

RSG Media is not stopping there – the team is also excited about Watson Analytics and its ability to predict customer behavior.  Learn more about RSG Media’s success using dashDB and Cloudant solutions on Bluemix.

About Harsimran,
Harsimran Singh Labana is the Portfolio Marketing Manager for IBM’s Data Warehousing team. Working in a worldwide role, he ensures marketing support for IBM’s solutions. He has been with IBM for close to five years, working in diverse roles such as sales and social media marketing. He lives in Bangalore, India with his wife and son.

How can an appliance the size of a pizza box be your new big data nucleus?

Are you on the verge of starting your first big data project? Are you still unsure which technology you should use because of the required skill sets? Do you have only a limited budget but need to address the most common big data challenges at once? If you answer these three questions with a “YES”, then this blog could be an eye-opener for you.

Big data is a challenge for every industry – no matter how big or how small a company may be. The challenges are always very similar: Volume, Variety, Velocity and Veracity. These are the four indicators of big data requirements. However, most of the time only a subset of these requirements may apply – at least at the beginning of the “big data journey”. Personally, I would add another “V”, which is often not so obvious from the beginning: Value. Value in terms of: what are the expected costs related to big data projects, and what is the most probable outcome? Nobody will invest huge amounts of money in new hardware and software if the outcome is very unpredictable.

That’s why most companies start with a “sandbox” big data project: experimenting with trial and open source software on virtual machines and existing hardware in order to keep the initial investment small. But sooner or later, important decisions need to be made: will this be the next-generation architecture for big data and analytics? How much will it cost to move from a sandbox to a mature production environment? What about enterprise support for the new big data platform?

PureData for Analytics N3001-001

IBM has acknowledged these challenges and the requirement for an entry-level big data platform. Have you heard of the new Ultra Lite PureData N3001-001? Introduced at the end of 2014, this big data appliance is an optimized, powerful combination of hardware and software that is the size of a family pizza box. It is able to process and store up to 16 Terabytes of structured data and can serve as the center and hub for other required big data products – thus covering the 4 or 5 “V’s” of big data.

The IBM PureData System for Analytics N3001-001 is a factory-configured, highly available big data appliance for the processing and storage of structured data at rest.  It is built on a shared-nothing Massively Parallel Processing (MPP) architecture consisting of:

  • A server
  • A database
  • Storage on standard, cost-efficient SATA self-encrypting drives (SED)
  • Networking fabric (10 Gbit)
  • Analytic software (Netezza technology)

PureData for Analytics comes with production licenses for a suite of other IBM big data products and integrates with these products through well-defined industry-standard interfaces (SQL, ODBC, JDBC, OLE DB) for maximum data throughput and reliability. So you get a factory-configured, highly available MPP platform for today’s big data analytic requirements.

But not even PureData for Analytics can deal with all the “V”s mentioned above. Big data analytics is a team game, and that’s the reason why it comes with production licenses for these additional IBM big data products:

  • IBM InfoSphere BigInsights: PureData refines the raw and unstructured data from IBM InfoSphere BigInsights with its ability to process huge amounts of data using its patented, industry-leading Netezza technology. PureData reads and writes data to and from Hadoop using state-of-the-art integration technology, as well as running MapReduce™ programs within its database.
  • IBM InfoSphere Information Server: Information Server pushes transformations down to PureData using its MPP architecture, so that transformations are processed in-database rather than on a separate server platform. This helps to reduce network traffic and data movement, as well as the cost of a more powerful server platform for Information Server. Information Server can use PureData’s analytic and transformational functions and utilize its shared-nothing architecture to process terabytes of structured data per hour.
  • IBM COGNOS: COGNOS is the Business Intelligence platform that is optimized to work with PureData. It supports in-database analytics, pushdown SQL, OLAP over relational and many more features, utilizing the shared-nothing MPP architecture of PureData. COGNOS adds in-memory features to the disk-based PureData architecture, making it able to analyze huge amounts of data.
  • IBM InfoSphere Streams: PureData integrates well with Streams and can be a data source as well as a data sink (target) for Streams. Since Streams is able to process and analyze huge amounts of data and events per second (millions of data packages per second), Streams needs a resourceful target to offload the analyzed data – one able to store the terabytes of data required for further, deeper analytics. This is a non-production single license for the Streams product.
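
The pushdown idea mentioned for Information Server can be sketched generically (the table and column names here are hypothetical, and this is not Information Server's actual SQL generation): instead of pulling every row out and transforming it on a separate server, the transformation is expressed as SQL and runs inside the MPP database, so only results cross the network.

```python
def pushdown_uppercase(table, column):
    """Build SQL that runs the transformation in-database on the appliance."""
    return f"SELECT UPPER({column}) AS {column}_norm FROM {table}"

def client_side_uppercase(rows):
    """The alternative: fetch every row first, then transform locally (more traffic)."""
    return [r.upper() for r in rows]

sql = pushdown_uppercase("customers", "city")
print(sql)  # SELECT UPPER(city) AS city_norm FROM customers

# Both approaches produce the same logical result; only where the work happens differs.
print(client_side_uppercase(["berlin", "paris"]))  # ['BERLIN', 'PARIS']
```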

Not included but highly recommended

With this big data nucleus you can start your journey with more confidence – with the right basis to grow and scale from the beginning. For an optimal user experience, I recommend the following optional products to maximize the results:

  • IBM SPSS: PureData is able to act as a powerful scoring platform for IBM SPSS, supporting data mining and predictive use-cases with built-in analytics functions and its massive parallel processing power. With PureData, SPSS does not need an extra scoring server and can even run programs written in R, C, C++, Fortran, Java, Python and NZ-LUA in the core database.
  • Watson Explorer: PureData is a supported metadata crawler source for Watson Explorer.  It supplies a big data inventory for all structured data stored within PureData’s 16 Terabyte capacity.

Conclusion

IBM has made it possible to start the big data journey with small investments, using highly mature, industry-leading software and an analytic big data appliance at its core. This helps you make a smooth transition from sandbox to production without disruption. Why not give it a try?

Connect with me on Twitter (@striple66) and meet me during CeBIT 2015 in Hanover, Germany.

 

About Ralf Goetz 
Ralf is an Expert Level Certified IT Specialist in the IBM Software Group. Ralf joined IBM through the Netezza acquisition in early 2011. For several years, he led the Informatica tech-sales team in the DACH region and the Mahindra Satyam BI competency team in Germany. He then became a technical pre-sales representative for Netezza and later for the PureData System for Analytics. Ralf is still focusing on PDA but also supports the technical sales of all IBM big data products. Ralf holds a Master’s degree in computer science.

IBM’s Point of View on Data Warehouse Modernization  

By Louis T. Cherian,

The world of Data Warehousing continues to evolve, with an unimaginable amount of data being produced each moment and advancing technologies that allow us to consume this data.  This provides new capabilities for organizations to make better-informed business decisions, faster.
To take advantage of this opportunity in today’s era of Big Data and the Internet of Things, our customers really need a solid Data Warehouse modernization strategy. Organizations should look to optimize with new technologies and capabilities like:

  • In-memory databases to speed analytics
  • Hadoop to analyze unstructured data and enhance existing analytics
  • Data warehouse appliances with improved capabilities and performance

To understand more about the importance of Data Warehouse Modernization and to get answers to questions like:

  • What is changing in the world of Data Warehousing?
  • Why should customers act now and what should they do?
  • What is the need for companies to modernize their Data Warehouse?
  • How are IBM Data Warehousing Solutions able to address the need of Data Warehouse Modernization?

Watch this video by the IBM Data Warehousing team to learn more about the breadth and depth of IBM Data Warehouse solutions. For more information, you can visit our website.

About Louis T. Cherian,

Louis T. Cherian is currently a member of the worldwide product marketing team at IBM that focuses on data warehouse and database technology. Prior to this role, Louis held a variety of product marketing roles within IBM, and in Tata Consultancy Services before joining IBM.  Louis holds a PGDBM from the Xavier Institute of Management and Entrepreneurship, and also has an engineering degree in computer science from VTU Bangalore.

 

dashDB is Here, and This Changes Everything for Data Warehousing on the Cloud

By Nancy Hensley

I remember the days of actually selling the idea of a data warehouse to organizations, trying to convince them to leverage data to provide some insight to their business. Back then, it was a nice-to-have and we would celebrate every new customer in the “terabyte club”.

Now, I can’t help but laugh at that, because the little backup device on my desk is a terabyte. The fact is, things have changed. We have generated more data in the last few years than ever before and this data can be gold for our businesses. Gone are the days of convincing organizations that leveraging data is important, today it’s critical.

Data warehousing has seen a lot of disruption over the last decade, but we did not get to the intersection of knowledge and opportunity fast enough. The architecture got too complex, we spent all our time managing performance, and our businesses were losing the race.

Yes it’s a race. We have to spot trends faster, capitalize on opportunities before the competition, grow faster, optimize more, offer more services, be easy to do business with and be the best in class. To get there, you need analytics and you need them faster than ever before. Sometimes you just need the data warehouse infrastructure out of your way.

Something had to change to support the new business climate.

The good news is that something has changed. IBM has announced our latest disruption to change data warehousing for the better…

Meet dashDB
dashDB is a data warehousing and analytics as a service offering on the cloud. dashDB offers robust analytics at incredibly fast processing speeds all in a cloud-easy format that lets you load and analyze data extremely quickly.

Yup it’s THAT EASY to win the race.

At the intersection of cloud technologies and analytics, dashDB represents the sweet spot for IT, line-of-business and data science professionals looking for a competitive edge in their data:

  • For IT professionals, dashDB helps quickly deliver solutions that the business needs without having to spend time managing the infrastructure to serve a new request. dashDB works as part of the data warehousing strategy – no matter your starting point, from extending your on-premises warehouse to starting something completely new.
  • For line of business professionals, dashDB offers something you really need…self-service. That’s right, be the master of your own data kingdom. dashDB lets you load your data and get started with analytics in a couple of hours, taking the infrastructure out of your way. Yup, that’s right, out of your way. Imagine the possibilities! Got an idea? Go get it. Want a deeper understanding of your customers? No problem! Want to better predict challenges before they happen? We give you the keys to your crystal ball with best in class predictive capabilities.
  • For data science professionals, you can load data and work with queries, models and other analytics techniques without the hassle of CAPEX and dependencies on staff. dashDB includes R support, comes with an ecosystem of partners who provide specialized analytics capabilities and is Watson Analytics ready.

In short, it is pretty cool. So what is the technology behind the scenes that makes these fast, easy analytics possible? dashDB combines robust in-database analytic processing with in-memory technology that delivers a performance boost and the enterprise-class SoftLayer cloud infrastructure. dashDB is available on the Bluemix platform to work with a wide variety of data types, and there is a specialized version that works with Cloudant JSON data stores.
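
The columnar idea behind that in-database engine can be pictured in miniature (a conceptual sketch, not the dashDB engine itself): storing each column contiguously means an aggregate only touches the bytes it actually needs.

```python
# Row-oriented storage: each record is a dict; an aggregate walks every full record.
rows = [
    {"order_id": 1, "region": "EU", "amount": 120.0},
    {"order_id": 2, "region": "US", "amount": 80.0},
    {"order_id": 3, "region": "EU", "amount": 40.0},
]
row_total = sum(r["amount"] for r in rows)

# Column-oriented storage: one contiguous list per column; the same aggregate
# scans only the 'amount' column and never touches order_id or region.
columns = {
    "order_id": [1, 2, 3],
    "region": ["EU", "US", "EU"],
    "amount": [120.0, 80.0, 40.0],
}
col_total = sum(columns["amount"])

print(row_total == col_total)  # True – same answer, far less data scanned
```

At warehouse scale, scanning one column instead of whole rows is what turns hours-long queries into seconds, especially when the column fits in memory or cache.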

What are you waiting for? Get growing with dashDB at www.dashDB.com. You can start using dashDB with a freemium plan. Experience the future of data analytics: deep analytics, fast processing and cloud-easy.

Advanced Security in an Insecure Data World

By Rich Hughes,

“Customer data may be at risk” is an all too familiar corporate acknowledgement these days, a communication event no enterprise wants to face.  Target lost personal information from at least 40,000,000 customers, stolen by thieves in late 2013.  This was followed and exceeded by Home Depot’s announcement last month that 56,000,000 customer bank cards used at the retailer’s 1,900 stores had been compromised.  Yes, Virginia, even your recent ice cream treat transaction at Dairy Queen has found its way into hackers’ hands. Most importantly, data breaches like these disrupt the trusted bond between a retailer and their customers, and as a consequence, top- and bottom-line numbers are negatively impacted.

Addressing security concerns for data warehouses, the IBM® PureData™ System for Analytics N3001 was announced for General Availability on October 17, 2014.  The N3001 appliance family brings advanced security to your data in this insecure world. Building on the appliance simplicity model, all data is stored on self-encrypting disk (SED) drives, providing security without impacting performance. The protection provided by the SED implementation supports the industries requiring the strictest security compliance – health care, government, and the financial sectors.  The system also utilizes strong authentication, based on the industry-standard Kerberos protocol, to prevent threats due to unauthorized access.

The N3001 Self Encrypting Drive protects your data-at-rest.  Both temporary data and user data tables are encrypted, and then this security level is bolstered by a key management scheme.

How does this work?  The SED disk drives are unlocked when the IBM® PureData™ System for Analytics N3001 ships to your data center.  And while the SED disk encryption is the first security level, an Advanced Encryption Standard (AES)-compliant 256-bit key needs to be created to cover all N3001 disks – both on the host and in the Snippet Processing Unit compartments.  This second security tier, the AES 256-bit key, can be initialized at any point after your data is loaded into the appliance.

The key management utility allows flexibility to update and rotate keys depending on the frequency of change dictated by your security policies.  This keyed approach is analogous to a password one uses to protect the disk data on a personal computer.  The Kerberos authentication, SED drives, and AES key management come as standard issue with the IBM® PureData™ System for Analytics N3001.
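
The versioned-key idea can be pictured in miniature (a toy sketch using Python's standard library – this is not the appliance's actual key-management utility): AES-256 keys are 32 bytes, and rotation simply issues a fresh key version under the same policy.

```python
import secrets

class KeyManager:
    """Toy key manager: tracks versioned 256-bit keys, the size AES-256 requires."""

    def __init__(self):
        self.versions = {}
        self.current = 0

    def rotate(self):
        """Generate a fresh random 256-bit key and make it the active version."""
        self.current += 1
        self.versions[self.current] = secrets.token_bytes(32)  # 32 bytes = 256 bits
        return self.current

km = KeyManager()
km.rotate()      # initial key, created after the data is loaded
v = km.rotate()  # a later rotation, per the site's security policy

print(v, len(km.versions[v]) * 8)  # 2 256
```

A real utility would also re-wrap the disk keys under the new version; the point here is just that rotation is a key-management operation, not a re-encryption of the data itself.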

IBM’s InfoSphere Data Privacy for Security for Data Warehousing is a separately priced option that organizations should consider when dealing with compliance challenges.  This package will enforce separation of duties, and will report incidents covering user behavior tracked by an audit trail.  Additionally, a business glossary provides the organization with the ability to define and document sensitive data, along with the agreed-upon access levels for the appropriate groups.  Data masking – making data fields anonymous, yet viewable by privileged user groups – is also important functionality that comes with the InfoSphere Data Privacy for Security for Data Warehousing package.

The IBM® PureData™ System for Analytics N3001 features advanced security based on hardware and software improvements.  When coupled with IBM’s InfoSphere Data Privacy for Security for Data Warehousing (which monitors data going in and out of your data warehouse), you can rest assured your corporation’s sensitive information is protected from unwanted intruders.

More information on the IBM® PureData™ System for Analytics N3001 family can be viewed at this LINK.  There are numerous sessions at the upcoming IBM Insights 2014 Conference (October 26-30) which highlight the speed, simplicity, and security message as seen in many successful data warehouses powered by Netezza technology.  The IBM® PureData™ System for Analytics N3001 is again changing the game for data warehouse appliances.

About Rich Hughes,

Rich Hughes is an IBM Marketing Program Manager for Data Warehousing.  Hughes has worked in a variety of Information Technology, Data Warehousing, and Big Data jobs, and has been with IBM since 2004.  Hughes earned a Bachelor’s degree from Kansas University, and a Master’s degree in Computer Science from Kansas State University.  Writing about the original Dream Team, Hughes authored a book on the 1936 US Olympic basketball team, a squad composed of oil refinery laborers and film industry stage hands. You can follow him on Twitter @rhughes134.