Saturday, May 2, 2015

Hadoop in a nutshell

Hi

Main features
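  • cheap storage for huge amounts of data
  • processes huge amounts of data quickly
  • can store unstructured data like text, images and video
  • scalable: capacity and processing power grow by adding nodes
  • data is protected in software against hardware failure (HDFS keeps replicated copies of each block on different nodes)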

Components included in the basic download 
  • HDFS: a Java-based distributed file system which can store all kinds of data without prior organization
  • MapReduce: a programming model for parallel computing
  • YARN: schedules and handles resource requests from distributed applications
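
To make the MapReduce model more concrete, here is a minimal word-count sketch in plain Java. It does not use the Hadoop API at all: the class name MapReduceSketch and its methods are made up for illustration, and everything runs in one JVM, so it only shows the shape of the model (map, group by key, reduce), not how Hadoop distributes the work across nodes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Illustrative only: imitates the MapReduce model inside a single JVM.
public class MapReduceSketch {

    // Map phase: turn one input line into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Reduce phase: sum all the counts collected for one key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Driver: map the lines in parallel, group the pairs by key
    // (the "shuffle" step), then reduce each group.
    static Map<String, Integer> wordCount(List<String> lines) {
        Map<String, List<Integer>> grouped = lines.parallelStream()
            .flatMap(line -> map(line).stream())
            .collect(Collectors.groupingBy(
                Map.Entry::getKey,
                TreeMap::new,
                Collectors.mapping(Map.Entry::getValue, Collectors.toList())));

        Map<String, Integer> result = new TreeMap<>();
        grouped.forEach((word, counts) -> result.put(word, reduce(word, counts)));
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("hadoop stores data", "hadoop processes data");
        // prints {data=2, hadoop=2, processes=1, stores=1}
        System.out.println(wordCount(lines));
    }
}
```

The point of the split is that map calls are independent of each other and reduce calls are independent per key, which is what lets a framework run them on many machines at once.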

Other components exist: Pig, Hive, HBase, ZooKeeper, Ambari, Flume, Sqoop, Oozie

How does data get into Hadoop:
  • you can load files into HDFS with simple commands from Hadoop's Java-based command-line tools, for example hdfs dfs -put
  • if you have many files, you can invoke a script that loads them in parallel
  • .......
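
The parallel-loading idea above can be sketched roughly like this. It is only an outline under assumptions: the class ParallelLoadSketch, the method loadAll and the uploader hook are hypothetical names, and the uploader in main just prints instead of touching a cluster. On a real cluster the uploader could, for example, call Hadoop's FileSystem.copyFromLocalFile, but that part is left out so the example runs without Hadoop installed.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Illustrative only: uploads a list of files with a fixed pool of worker threads.
public class ParallelLoadSketch {

    // Runs `uploader` once per file on `threads` workers and returns
    // how many uploads finished without throwing.
    // `uploader` is a hypothetical hook; plug in the real HDFS copy there.
    static int loadAll(List<Path> files, int threads, Consumer<Path> uploader) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<?>> pending = new ArrayList<>();
        for (Path file : files) {
            pending.add(pool.submit(() -> uploader.accept(file)));
        }
        pool.shutdown(); // no new tasks; already submitted ones still run

        int succeeded = 0;
        for (Future<?> task : pending) {
            try {
                task.get(); // wait for this upload to finish
                succeeded++;
            } catch (ExecutionException e) {
                System.err.println("upload failed: " + e.getCause());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt flag and stop waiting
                break;
            }
        }
        return succeeded;
    }

    public static void main(String[] args) {
        List<Path> files = List.of(
            Paths.get("a.log"), Paths.get("b.log"), Paths.get("c.log"));
        // Stand-in uploader: only prints what it would do.
        int loaded = loadAll(files, 2, f -> System.out.println("would upload " + f));
        System.out.println(loaded + " files loaded");
    }
}
```

A fixed-size pool is used so that the number of simultaneous uploads stays bounded no matter how many files are queued.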


Nathan
