Deploy Hadoop Cluster

Originally posted on Data Science Central

Step by Step Tutorial to Deploy Hadoop Cluster (fully distributed mode):

Setting up Hadoop as a cluster requires multiple machines (nodes): one node acts as the master and all the others act as slaves.
If you want a quick introduction to Hadoop, please click here.
If you want to set up Hadoop in pseudo-distributed mode, please click here.
In this tutorial:
  • I am using 3 nodes: 1 master and 2 slaves
  • I am using the Cloudera distribution for Apache Hadoop, CDH3U3 (you can also use Apache Hadoop 0.20.X)
  • I am deploying Hadoop on Ubuntu (you can use another OS such as CentOS or Red Hat)
Install / Setup Hadoop on cluster
Install Hadoop on master:
1. Add entries for the master and slaves to the hosts file:
Edit the hosts file and add the following entries:
$ sudo pico /etc/hosts
MASTER-IP    master
SLAVE01-IP   slave01
SLAVE02-IP   slave02
(Replace MASTER-IP, SLAVE01-IP and SLAVE02-IP with the corresponding IP addresses.)
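
Before moving on, it can help to confirm that all three names actually made it into the hosts file. The following is a minimal sketch, not part of the original tutorial: the function name missing_hosts is hypothetical, and it only checks that each name appears as a hostname field on some non-comment line, not that the IP address is correct.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post):
# print every hostname that has no entry in the given hosts file.
# Usage: missing_hosts FILE NAME...
missing_hosts() {
    file=$1
    shift
    for name in "$@"; do
        # Drop comments, then look for the name in any field after the IP column.
        if ! sed 's/#.*//' "$file" | awk -v n="$name" '
            { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }'; then
            echo "$name"
        fi
    done
}
```

Running `missing_hosts /etc/hosts master slave01 slave02` on each node should print nothing once the entries are in place; any name it prints still needs a line in that node's hosts file.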

Prerequisite:
2. Install Java
Java 6 is recommended (either Sun JDK or OpenJDK).
Add the repository (the repository mentioned below is for Ubuntu 11.10; for other versions, please add the corresponding repository):
$ sudo apt-get install python-software-properties
$ sudo add-apt-repository ppa:ferramroberto/java
$ sudo apt-get update
$ sudo apt-get install sun-java6-jdk
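
After the install, you may want to check which Java version is on the PATH. This is a rough sketch under stated assumptions: the helper name java_version_string is hypothetical, and it assumes the first line of the `java -version` output contains the version in double quotes (for example `java version "1.6.0_26"`), which may differ between JDK builds.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post):
# read `java -version` output on stdin and print the quoted version string.
java_version_string() {
    sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}

# `java -version` writes to stderr, so redirect it into the pipe.
if command -v java >/dev/null 2>&1; then
    java -version 2>&1 | java_version_string
else
    echo "java not found on PATH" >&2
fi
```

For the Java 6 setup recommended here, the printed string would be expected to start with 1.6.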

To read the complete post, please visit:

http://www.technology-mania.com/2012/04/deploy-hadoop-cluster.html
