The Big Data space is flooded with offerings built around Hadoop, the open source framework. Yahoo developers who had been heavily involved in the early development of Hadoop launched a spinoff called Hortonworks, which promotes the framework by providing data and operational services to users and vendors. Competition in the space is intense, with Cloudera, IBM, MapR, EMC and Amazon all offering their own Hadoop products.
Here is a look at the various distributions:

Proprietary
- Cloudera Distribution for Hadoop (CDH)
- Amazon Elastic MapReduce (EMR) on EC2
- MapR M3, M5 and M7
- EMC's Greenplum HD

Open source
- Hortonworks
- Pentaho

Latest entrants
- Pivotal HD from EMC
- Intel Distribution for Hadoop
- Microsoft's HDInsight distribution of Hadoop for Windows
Differentiating points of each distribution
IBM InfoSphere BigInsights
- Deepest Hadoop platform and application portfolio
- Powerful, very fast analytics engine for unstructured data
- Built-in, browser-based spreadsheet tool called BigSheets
- Adaptive real-time analytics enabled through integration with InfoSphere Streams
- App store with a number of reusable jobs and examples
- GPFS File Placement Optimizer
- Accelerators for social data and machine data analytics
- Software bundles (IBM InfoSphere Streams, IBM InfoSphere Data Explorer and Cognos Business Intelligence)

Cloudera Distribution for Hadoop (CDH)
- Hadoop pure play with the greatest adoption, thanks to its early entry
- Recently introduced the Cloudera Development Kit (CDK), a collection of libraries, tools and examples
- Recently introduced Impala, an open source interactive SQL query engine for analyzing data stored in a Hadoop cluster
- Adopted by Oracle as the distribution of choice for its Big Data Appliance

Hortonworks Data Platform
- Yahoo spinoff promoting Hadoop by providing data and operational services to users and vendors
- Scalable to meet custom demands
- 100% open source and free, with no proprietary license
- Widest range of deployment options: Linux, Windows and cloud

MapR
- Strong OEM business for its Hadoop distribution
- Its latest release, M7, provides a NoSQL solution alongside Hadoop
- Amazon Elastic MapReduce offers MapR as a distribution option
- Greenplum HD Enterprise Edition has so far used the MapR distribution for Hadoop; this may change after EMC's announcement of its own distribution, Pivotal HD

Amazon Elastic MapReduce (EMR) on EC2
- Most prominent Hadoop cloud service provider
- Usage-based pricing, so costs can be kept to a minimum
- Easy to set up, with extensive documentation

Intel Distribution for Hadoop
- New in the market (April 2013)
- Supports analytics on encrypted data
- Hadoop tuned to take advantage of Intel hardware (Xeon processors, solid-state drives and 10Gb Ethernet) for high-performance I/O and storage

EMC's Pivotal HD
- New in the market (April 2013)
- Takes a radical approach by replacing the underlying file system of the Greenplum RDBMS with HDFS, which means:
  - Hadoop operations can be performed using native SQL queries on the Greenplum MPP database, whose file system is now HDFS
  - It lowers the barrier to Big Data by letting enterprises extend their existing database environment into a Big Data environment
- On the downside, the scalability of the Greenplum MPP database can limit data capacity

Microsoft's HDInsight
- Brings Hadoop to the Windows Server platform
- Choice of deployment on the Windows Azure cloud, a virtual machine or a physical server
- Spreadsheet tooling (Data Explorer with Excel 2013) for data discovery, transformation and analysis

Apache Hadoop subprojects used in most distributions
| Function | Hadoop subprojects |
|---|---|
| Modeling & development | MapReduce, Pig, Mahout |
| Storage & data management | HDFS, HBase |
| Data warehousing & querying | Hive, Sqoop |
| Data collection and aggregation | Flume |
| Cluster management, job scheduling, workflow | ZooKeeper, Oozie, Ambari |