Are you on the verge of starting your first big data project? Are you still unsure which technology you should use because of required skill sets? Do you have only a limited budget but need to address the most common big data challenges at once? If you answered “yes” to all three questions, then this blog post could be an eye-opener for you.
Big data is a challenge for every industry – no matter how big or how small a company may be. The challenges are always very similar: Volume, Variety, Velocity and Veracity. These are 4 indicators for big data requirements. However, most of the time only a subset of these requirements may apply – at least at the beginning of the “big data journey”. Personally, I would add another “V”, which is often not so obvious from the beginning: Value. Value in terms of: what are the expected costs related to big data projects and what is the most probable outcome? Nobody will invest huge amounts of money in new hardware and software if the outcome is very unpredictable.
That’s why most companies start with a “sandbox” big data project: experimenting with trial and open source software on virtual machines and existing hardware in order to keep the initial investment small. But sooner or later, important decisions need to be made: will this be the next-generation architecture for big data and analytics? How much will it cost to move from a sandbox to a mature production environment? What about enterprise support for the new big data platform?
IBM has acknowledged these challenges and the requirement for an entry-level big data platform. Have you heard of the new “Ultra Lite” PureData N3001-001? Introduced at the end of 2014, this big data appliance is an optimized, powerful combination of hardware and software that is about as big as a family-size pizza box. It can process and store up to 16 terabytes of structured data and can serve as the hub for other required big data products, thus covering the 4 or 5 “V’s” of big data.
The IBM PureData System for Analytics N3001-001 is a factory-configured, highly available big data appliance for the processing and storage of structured data at rest. It is built as a shared-nothing Massively Parallel Processing (MPP) system consisting of:
- A server
- A database
- Storage on standard, cost-efficient SATA self-encrypting drives (SED)
- Networking fabric (10 Gbit)
- Analytic software (Netezza technology)
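To make the term “shared-nothing MPP” a bit more tangible, here is a minimal Python sketch of the general idea: rows are spread across independent nodes by hashing a distribution key, every node aggregates only its own slice, and a coordinator merges the small partial results. This is a toy illustration, not the appliance's actual implementation. The function names (`distribute`, `local_sum`, `merge`), the sample data and the choice of four nodes are all assumptions made for this example.

```python
from collections import defaultdict
from zlib import crc32


def distribute(rows, key, n_nodes):
    """Assign each row to exactly one node by hashing its distribution key."""
    nodes = [[] for _ in range(n_nodes)]
    for row in rows:
        # crc32 is deterministic across runs, unlike Python's built-in hash() for str
        node_id = crc32(str(row[key]).encode()) % n_nodes
        nodes[node_id].append(row)
    return nodes


def local_sum(node_rows, group_col, value_col):
    """Aggregate one node's slice only; nodes never look at each other's rows."""
    partial = defaultdict(float)
    for row in node_rows:
        partial[row[group_col]] += row[value_col]
    return dict(partial)


def merge(partials):
    """Coordinator step: combine the per-node partial sums into the final result."""
    total = defaultdict(float)
    for partial in partials:
        for group, value in partial.items():
            total[group] += value
    return dict(total)


# Hypothetical sample data: 12 orders spread over three regions.
rows = [
    {"order_id": i, "region": ("EMEA", "APAC", "AMER")[i % 3], "amount": float(i)}
    for i in range(1, 13)
]

nodes = distribute(rows, key="order_id", n_nodes=4)
result = merge(local_sum(node, "region", "amount") for node in nodes)
print(result)
```

Because each node works only on its own rows, adding nodes adds processing capacity without the nodes competing for shared memory or disks. Only the small partial results travel to the coordinator.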
PureData for Analytics comes with production licenses for a suite of other IBM big data products and integrates with these products through well-defined industry-standard interfaces (SQL, ODBC, JDBC, OLE DB) for maximum data throughput and reliability. So you get a factory-configured, highly available MPP platform for today’s big data analytic requirements.
But not even PureData for Analytics can deal with all the “V”s mentioned above. Big data analytics is a team game, and that’s the reason why the appliance comes with production licenses for these additional IBM big data products:
- IBM InfoSphere BigInsights: PureData refines the raw and unstructured data from IBM InfoSphere BigInsights, using its patented and industry-leading Netezza technology to process huge amounts of data. PureData reads and writes data to and from Hadoop using state-of-the-art integration technology and can also run MapReduce™ programs within its database.
- IBM InfoSphere Information Server: Information Server pushes transformations down to PureData, using its MPP architecture so that transformations are processed in-database rather than on a separate server platform. This helps to reduce network traffic and data movement as well as the cost of a more powerful server platform for Information Server. Information Server can use PureData analytic and transformational functions and utilize its shared-nothing architecture to process terabytes of structured data per hour.
- IBM COGNOS: COGNOS is the Business Intelligence platform that is optimized to work with PureData. It supports in-database analytics, pushdown SQL, OLAP over relational and many more features, utilizing the shared-nothing MPP architecture of PureData. COGNOS adds in-memory features to the disk-based PureData architecture, making it able to analyze huge amounts of data.
- IBM InfoSphere Streams: PureData integrates well with Streams and can be a data source, as well as a data sink (target) for Streams. Since Streams is able to process and analyze huge amounts of data / events per second (millions of data packages per second), Streams needs a resourceful target to offload the analyzed data – able to store the terabytes of data required for further, deeper analytics. This is a non-production single license for the Streams product.
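The “pushdown” idea mentioned for Information Server and COGNOS can be sketched in a few lines of Python. The example below uses SQLite from the Python standard library purely as a stand-in database, so it says nothing about PureData's actual performance or SQL dialect. The `sales` table, its contents and both function names are hypothetical. It compares pulling every row to the client with letting the database do the aggregation and return only the result.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the analytic database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("EMEA", 10.0), ("EMEA", 5.0), ("APAC", 7.5), ("AMER", 2.5), ("APAC", 1.0)],
)


def totals_client_side(conn):
    """Pull every row over the connection and aggregate in the client."""
    rows = conn.execute("SELECT region, amount FROM sales").fetchall()
    totals = {}
    for region, amount in rows:
        totals[region] = totals.get(region, 0.0) + amount
    return totals, len(rows)


def totals_pushed_down(conn):
    """Let the database aggregate; only one row per region leaves the database."""
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region"
    ).fetchall()
    return dict(rows), len(rows)


client_totals, client_rows_moved = totals_client_side(conn)
pushed_totals, pushed_rows_moved = totals_pushed_down(conn)

print(client_totals == pushed_totals)        # same answer either way
print(client_rows_moved, pushed_rows_moved)  # rows transferred in each approach
```

Both approaches return the same totals, but the pushed-down query moves one row per region instead of one row per sale. With billions of rows, that difference is what reduces network traffic and the load on the separate server platform.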
Not included but highly recommended
With this big data nucleus you can start your journey with more confidence – with the right basis to grow and scale from the beginning. For an optimal user experience I recommend the following optional products to maximize the results:
- IBM SPSS: PureData is able to act as a powerful scoring platform for IBM SPSS, supporting data mining and predictive use cases with built-in analytics functions and its massively parallel processing power. With PureData, SPSS does not need an extra scoring server and can even run programs written in R, C, C++, Fortran, Java, Python and NZ-LUA in the core database.
- Watson Explorer: PureData is a supported metadata crawler source for Watson Explorer. It supplies a big data inventory for all structured data stored within PureData’s 16-terabyte capacity.
IBM has made it possible to start the big data journey with small investments, using highly mature, industry leading software and an analytic big data appliance as its core. This helps you make a smooth transition from sandbox to production without disruption. Why not give it a try?
Connect with me on Twitter (@striple66) and meet me during CeBIT 2015 in Hanover, Germany.
About Ralf Goetz
Ralf is an Expert Level Certified IT Specialist in the IBM Software Group. Ralf joined IBM through the Netezza acquisition in early 2011. For several years, he led the Informatica tech-sales team in the DACH region and the Mahindra Satyam BI competency team in Germany. He then became a technical pre-sales representative for Netezza and later for the PureData System for Analytics. Ralf is still focusing on PDA but is also supporting the technical sales of all IBM big data products. Ralf holds a Master’s degree in computer science.