Tuesday, August 25, 2015

Experimenting with Hadoop

Hi

I am looking for a fast setup (about 1 hour) so I can play with the Hadoop ecosystem.

1. Hortonworks sandbox + Azure 

http://hortonworks.com/blog/hortonworks-sandbox-azure/
Pros
  • you can start playing with it after about an hour
  • machine in the cloud - saves local CPU \ disk \ RAM
  • free for a month (Azure)
  • UI - web browser

Cons
  • limited to a month (Azure)
  • not the full Hadoop ecosystem, e.g. Spark is missing, and only Hive, Pig, HCatalog and Oozie are accessible via Hue

Remarks:
  • you need an account to log in to Azure



2. Hortonworks sandbox

http://hortonworks.com/products/hortonworks-sandbox/#install
Pros
  • you can start playing with it after a few hours
  • free
  • seems to include a major part of the Hadoop ecosystem

Cons
  • uses the local PC, which eats disk \ CPU \ RAM
  • requires a PC with 4-8 GB RAM
  • requires a virtual machine, which has to be installed
  • UI - shell

Remarks:
  • one node out of the box

3. Cloudera Live Read-Only Demo

http://blog.cloudera.com/blog/2014/04/cloudera-live/
After registering, use http://demo.gethue.com/

Pros
  • free
  • uses a cluster with 4 nodes (machines)
  • you can start playing with it immediately
  • machine in the cloud - saves local CPU \ disk \ RAM
  • seems to include a major part of the Hadoop ecosystem

Cons
  • samples not working
  • limited to 3 hours per session



4. Cloudera Deploy on AWS

http://www.cloudera.com/content/cloudera/en/products-and-services/cloudera-live/aws-documentation.html

Pros
  • you can start playing with it immediately
  • machine in the cloud - saves local CPU \ disk \ RAM
  • cluster with 4 nodes (machines)
  • seems to include a major part of the Hadoop ecosystem
  • includes several components:
    • Tutorial
    • Cloudera Manager
    • Cloudera Navigator
    • Hue UI

Cons
  • costs about $750 per month

Remarks:
  • you need an account to log in to AWS
  • if you are new to AWS and have the AWS Free Tier (valid for 12 months), you get the following for free every month:
    • EBS I/Os - 2,000,000
    • EBS volumes - 30 GB
    • EC2 Linux - 750 hours
    • KMS requests - 20,000
But you will nonetheless pay $0.312 per On-Demand RHEL m4.xlarge instance hour, for every hour each of the 4 instances is running. For a month that comes to 4 * 24 * 30 * $0.312 ~ $900, which is more than the $750 claimed by Cloudera.
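The monthly estimate above is simple arithmetic, and a minimal Python sketch makes it easy to re-run with other prices or uptimes. The price, the instance count and the 30-day month are the figures quoted in this post (2015), not current AWS pricing, and the function name `monthly_cost` is just for illustration.

```python
# Figures as quoted in this post (2015); check current AWS pricing before relying on them.
HOURLY_PRICE_USD = 0.312   # On-Demand RHEL m4.xlarge, per instance-hour
INSTANCES = 4              # cluster nodes
HOURS_PER_DAY = 24         # instances left running around the clock
DAYS_PER_MONTH = 30        # simplifying assumption


def monthly_cost(price=HOURLY_PRICE_USD, instances=INSTANCES,
                 hours_per_day=HOURS_PER_DAY, days=DAYS_PER_MONTH):
    """Instance-hour cost only: instances * hours/day * days * hourly price."""
    return instances * hours_per_day * days * price


if __name__ == "__main__":
    print(f"${monthly_cost():.2f} per month")  # prints "$898.56 per month"
```

For example, `monthly_cost(hours_per_day=8)` (running the cluster only 8 hours a day) gives 4 * 8 * 30 * $0.312 = $299.52 in instance-hours; storage and other charges are not included in either figure.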



5. Cloudera Deploy on VM

http://www.cloudera.com/content/cloudera/en/downloads/quickstart_vms/cdh-5-4-x.html

Remarks:
  • looks similar to "2. Hortonworks sandbox"
  • requires the host OS to be 64-bit
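Both local VM options (2 and 5) need enough RAM and, for the Cloudera VM, a 64-bit host. A minimal Python pre-flight check might look like the sketch below. It assumes the RAM figure in section 2 means gigabytes (4 GB minimum, 8 GB recommended), uses POSIX `os.sysconf` so RAM detection returns None on Windows, and treats a small, non-exhaustive set of `platform.machine()` strings as 64-bit. Note that `platform.machine()` reports the architecture as seen by the running Python, which is not guaranteed to match the host OS in every setup. The helper names `total_ram_bytes` and `check_host` are hypothetical.

```python
import os
import platform

# Assumed thresholds: the post's "4-8" RAM figure read as gigabytes.
MIN_RAM_GB = 4
RECOMMENDED_RAM_GB = 8
# Machine strings commonly reported on 64-bit hosts (an assumption, not exhaustive).
KNOWN_64BIT = {"x86_64", "amd64", "aarch64", "arm64"}


def total_ram_bytes():
    """Physical RAM in bytes via POSIX sysconf; None where unsupported (e.g. Windows)."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        return None


def check_host(ram_bytes, machine):
    """Return a list of warnings; an empty list means the host looks OK."""
    warnings = []
    if machine.lower() not in KNOWN_64BIT:
        warnings.append(f"host does not look 64-bit (machine={machine!r})")
    if ram_bytes is None:
        warnings.append("could not determine RAM")
    else:
        gb = ram_bytes / 1024**3
        if gb < MIN_RAM_GB:
            warnings.append(f"only {gb:.1f} GB RAM, below the {MIN_RAM_GB} GB minimum")
        elif gb < RECOMMENDED_RAM_GB:
            warnings.append(f"{gb:.1f} GB RAM may work, but {RECOMMENDED_RAM_GB} GB is recommended")
    return warnings


if __name__ == "__main__":
    for line in check_host(total_ram_bytes(), platform.machine()) or ["host looks OK"]:
        print(line)
```

Keeping `check_host` free of system calls means it can be tried with made-up values, e.g. `check_host(2 * 1024**3, "x86_64")` returns a single "below the 4 GB minimum" warning.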

6. HDInsight

Pros
  • you can start playing with it immediately
  • machine in the cloud - saves local CPU \ disk \ RAM
  • seems to include a major part of the Hadoop ecosystem
  • 1 month of free usage

Cons
  • cost unclear (pay as you go)



Tutorials


Nathan
