Brings Hadoop, Spark and SQL into one flexible, open analytics platform
by Andrea Braida
Today, we are pleased to announce that IBM BigInsights® 4.2 is generally available. BigInsights 4.2 is built on IBM Open Platform (IOP), IBM’s big data platform with Apache Spark and Apache Hadoop. IOP offers the ideal combination of Apache components to support big data applications. The BigInsights 4.2 release puts the full range of analytics for Hadoop, Spark and SQL into the hands of advanced analytics and data science teams on a single platform.
IBM has deep Hadoop expertise and, in the last year, has moved into a very strong Apache Spark leadership position as well. IBM is integrating and embedding Spark across its analytics portfolio, which means that customers get Spark in any way they want it. No one else in the market is doing this today. (BigInsights 4.2 also includes comprehensive machine learning support: Spark, SystemML and integration with H2O.)
If you’re interested in a recommended Hadoop distribution, the most significant release features, including Spark integration, are summarized below.
What’s new in BigInsights 4.2?
BigInsights 4.2 introduces a range of new capabilities that make it more open, flexible and powerful:
Integration with Apache Spark 1.6.1
Access the processing and analytics power of Spark: dramatically faster batch and ETL processing with Spark Core; near real-time analytics with Spark Streaming; built-in, highly extensible machine learning libraries with Spark MLlib; querying of unstructured data and more value from free-form text analytics with Spark SQL; and graph computation and graph analytics with Spark GraphX.
IBM Big SQL enhancements for RDBMS offload and consolidation
Big SQL now understands SQL dialects from other vendors and products, such as Oracle, IBM DB2® and IBM Netezza®, making it the ultimate platform for RDBMS offload and consolidation. It is faster and easier to offload old data from existing enterprise data warehouses or data marts to free up capacity while preserving most of the familiar SQL from those platforms. Big SQL is also the only SQL engine for Hadoop that exploits Hive, HBase, and Spark concurrently for best-in-class analytic capabilities.
New Apache components and currency updates to existing components
BigInsights 4.2 now includes Apache Ranger, Apache Phoenix and Apache Titan. BigInsights is currently the only Hadoop distribution with a graph database. Notable currency updates include updates to Ambari, Kafka, and Solr.
ODPi Runtime Certification
With V4.2, IOP is among the first Hadoop platforms to comply with the Open Data Platform initiative (ODPi) Runtime Certification. This means it is easier for independent software vendors to adopt IOP as a platform, and ensures platform openness for customers.
Introducing IBM Big Replicate
IBM Big Replicate provides continuous availability and data consistency via a patented active-transactional replication technology, which also provides streaming backup, hybrid cloud, and burst-to-cloud. It is an optimized data replication capability for uninterrupted migration from other distributions to IBM, and from cloud to on-premises and vice versa.
Why should you consider BigInsights 4.2?
Some key standout features for BigInsights 4.2 are Big SQL performance improvements, deeper analytics with Spark and Graph Database, and a more open and secure platform.
Big SQL performance improvements
Big SQL is the SQL query engine in BigInsights. New performance improvements make it super fast and super easy to install and manage. These enhancements to Big SQL 4.2 result in significant performance improvements:
- Built-in components improve performance with less tuning (auto-analyze)
- Improved memory management and operational stability
- High-performance transactional support is now included
- Apache Phoenix provides easier access to HBase with a SQL interface
- In Technology Preview, in-memory technology (BLU Acceleration) on Big SQL head nodes is now available for faster processing
These enhancements make BigInsights an ideal platform for RDBMS offload and consolidation, as well as a hybrid engine that can help you exploit fit-for-purpose Hadoop subsystems.
Deeper and improved analytics with Spark and Graph Database
- Easier and richer text analytics
  - New AQL Editor makes it easier to migrate existing AQL to V4.2
  - Web-based, drag-and-drop development
  - Powerful, expressive AQL language to get more done with less work
  - New run-on-cluster with Spark
  - Pre-built extractors: Named Entity, Financial, Sentiment, Machine Data
- Graph Database – Titan
  - IOP is the first Hadoop distribution to include a graph database
More open and more secure
- For security, BigInsights 4.2 is compliant with industry standards, and includes Apache Ranger, which provides centralized security management and auditing of users and the REST interface. Ranger supports HDFS, YARN, Hive, HBase, and Kafka, allowing users to spend more time analyzing data and less time worrying about security.
- BigInsights now enables easy product integration through ODPi Runtime Certification, which makes it easier for independent software vendors to adopt IOP as a platform and ensures platform openness for clients.
The BigInsights core – IBM Open Platform (IOP) – was designed with a focus on analytics, operational excellence, and security empowerment, and is certified by the Open Data Platform initiative (ODPi).
Get started free
BigInsights is available on-premises and on-cloud, and is integrated with other systems in use today, with enterprise-class support available. (Please note that BigQuality, BigIntegrate, Phoenix, Ranger, Solr, and Titan are available on BigInsights on-premises only, and are planned for the on-cloud offering.*)
BigInsights is also integrated with a broad and open ecosystem of data and analytics tools, allowing for a true hybrid architecture. BigInsights on Cloud was recently ranked as a leader in the Hadoop Cloud services market by Forrester, which I’ll share more about in my next blog.
Get started with a free version of the BigInsights core, IBM Open Platform (IOP).
For more information about the 4.2 release, please visit our release overview, the Big Replicate overview, or the Hadoop solutions page.
Andrea Braida is a Portfolio Marketing Manager at IBM for Big Data Analytics and Data Science offerings. A former start-up founder, she has extensive product management, product marketing, and data science marketing experience within both global technology giants and start-ups. Andrea is based in Seattle, Washington.
* The information contained in this presentation is provided for informational purposes only.
While efforts were made to verify the completeness and accuracy of the information contained in this presentation, it is provided “as is”, without warranty of any kind, express or implied. In addition, this information is based on IBM’s current product plans and strategy, which are subject to change by IBM without notice. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, this presentation or any other documentation. Nothing contained in this presentation is intended to, or shall have the effect of: 1) Creating any warranty or representation from IBM (or its affiliates or its or their suppliers and/or licensors); or 2) altering the terms and conditions of the applicable license agreement governing the use of IBM software.
Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multi-programming in the user’s job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.