How to Install a Single-Node Hadoop Cluster on CentOS 6

What is Hadoop?
Hadoop is an open-source framework for storing and processing Big Data in a distributed environment. It consists of two
core modules: MapReduce and the Hadoop Distributed File System (HDFS).


•MapReduce: A parallel programming model for processing large amounts of structured, semi-structured, and
unstructured data on large clusters of commodity hardware.

•HDFS: The Hadoop Distributed File System is the storage layer of the Hadoop framework, used to store the datasets. It
provides a fault-tolerant file system that runs on commodity hardware.

Hostname:- server1.soumya.com
OS:- CentOS 6

Step 1:- Install Java
Download and extract the Java tarball (in this guide the JDK is placed under /u01, so the extracted directory is /u01/jdk1.8.0_101):

[root@server1 ~]# wget --no-cookies --no-check-certificate --header "Cookie: gpw_e24=http%3A%2F%2Fwww.oracle.com%2F; oraclelicense=accept-securebackup-cookie" "http://download.oracle.com/otn-pub/java/jdk/8u101-b13/jdk-8u101-linux-x64.tar.gz"
[root@server1 ~]# tar zxvf jdk-8u101-linux-x64.tar.gz

Step 2:- Install Java with Alternatives

After extracting the archive, use the alternatives command to register this JDK. The alternatives command is provided by
the chkconfig package.
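
If the alternatives command is missing (for example on a minimal install), installing the chkconfig package should provide it; this is an assumption about your package setup:

[root@server1 ~]# yum install -y chkconfig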

[root@server1 jdk1.8.0_101]# alternatives --install /usr/bin/java java /u01/jdk1.8.0_101/bin/java 2
[root@server1 jdk1.8.0_101]# alternatives --config java

There are 4 programs which provide 'java'.

  Selection    Command
-----------------------------------------------
   1           /usr/lib/jvm/jre-1.5.0-gcj/bin/java
*+ 2           /usr/lib/jvm/jre-1.7.0-openjdk.x86_64/bin/java
   3           /usr/lib/jvm/jre-1.6.0-openjdk.x86_64/bin/java
   4           /u01/jdk1.8.0_101/bin/java

Enter to keep the current selection[+], or type selection number: 4

Now Java 8 has been installed and selected. It is also recommended to set up the javac and jar command paths using alternatives:

[root@server1 jdk1.8.0_101]# alternatives --install /usr/bin/jar jar /u01/jdk1.8.0_101/bin/jar 4
[root@server1 jdk1.8.0_101]# alternatives --install /usr/bin/javac javac /u01/jdk1.8.0_101/bin/javac 4
[root@server1 jdk1.8.0_101]# alternatives --set jar /u01/jdk1.8.0_101/bin/jar
[root@server1 jdk1.8.0_101]# alternatives --set javac /u01/jdk1.8.0_101/bin/javac

Now check the Java version:-
[root@server1 alternatives]# java -version
java version "1.8.0_101"
Java(TM) SE Runtime Environment (build 1.8.0_101-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.101-b13, mixed mode)
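
Optionally, verify that the compiler alternative also resolves to the new JDK; it should report the same release:

[root@server1 alternatives]# javac -version
javac 1.8.0_101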

Step 3:-
Now configure the environment variables:-

Setup JAVA_HOME Variable
# export JAVA_HOME=/u01/jdk1.8.0_101
Setup JRE_HOME Variable
# export JRE_HOME=/u01/jdk1.8.0_101/jre
Setup PATH Variable
# export PATH=$PATH:/u01/jdk1.8.0_101/bin:/u01/jdk1.8.0_101/jre/bin

Add the same variables to the ~/.bash_profile file so they are loaded automatically at every login.

[root@server1] vi ~/.bash_profile

# Setup JAVA_HOME Variable
export JAVA_HOME=/u01/jdk1.8.0_101
# Setup JRE_HOME Variable
export JRE_HOME=/u01/jdk1.8.0_101/jre
# Setup PATH Variable
export PATH=$PATH:/u01/jdk1.8.0_101/bin:/u01/jdk1.8.0_101/jre/bin

:wq (save & exit)
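
After logging in again (or sourcing the file with . ~/.bash_profile), a quick check confirms the variables are in effect; the value below assumes the /u01 JDK location used in this guide:

[root@server1 ~]# echo $JAVA_HOME
/u01/jdk1.8.0_101
[root@server1 ~]# which java
/usr/bin/java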

Step 4:- Create Hadoop user

[root@server1 ~]# adduser hadoop
[root@server1 ~]# passwd hadoop

After the user account is created, generate an SSH key pair so the hadoop user can SSH into its own account without a password.

[root@server1 ~]# su - hadoop
[hadoop@server1 ~]$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
Created directory '/home/hadoop/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
c5:3e:25:c0:92:23:d0:17:fa:56:72:4c:79:72:4c:fe hadoop@server1.soumya.com
The key's randomart image is:
+--[ RSA 2048]----+
|  .o  .+o+.      |
|    o.=o+++      |
|    .o.o++= .    |
|     . + o +     |
|      o S o E    |
|     .     .     |
|                 |
|                 |
|                 |
+-----------------+
[hadoop@server1 ~]$
[hadoop@server1 ~]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
[hadoop@server1 ~]$ chmod 0600 ~/.ssh/authorized_keys

Check the connectivity:-
[hadoop@server1 ~]$ ssh localhost
The authenticity of host 'localhost (::1)' can't be established.
RSA key fingerprint is 0b:59:e4:8b:b1:e6:12:3a:38:4f:ba:74:ef:8a:ad:46.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
[hadoop@server1 ~]$ exit
logout
Connection to localhost closed.
[hadoop@server1 ~]$
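
With the key in place, further SSH connections to localhost should no longer prompt for a password; a quick non-interactive test:

[hadoop@server1 ~]$ ssh localhost hostname
server1.soumya.com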

Step 5:- Download Hadoop 2.6.0
[hadoop@server1 ~]$ wget http://apache.claz.org/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz
[hadoop@server1 ~]$ tar -zxvf hadoop-2.6.0.tar.gz
[hadoop@server1 ~]$ mv hadoop-2.6.0 /home/hadoop/hadoop

Step 6:- As the hadoop user, edit the .bash_profile file and add the following lines.
[hadoop@server1 ~]$ vi /home/hadoop/.bash_profile
export PATH
#Java Env Variables
export JAVA_HOME=/u01/jdk1.8.0_101
export JRE_HOME=/u01/jdk1.8.0_101/jre
export PATH=$PATH:/u01/jdk1.8.0_101/bin:/u01/jdk1.8.0_101/jre/bin


#Hadoop Env Variables
export HADOOP_HOME=/home/hadoop/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin

:wq (save & exit)

Now apply the changes to the current running environment:
[hadoop@server1 ~]$ . /home/hadoop/.bash_profile
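
At this point the hadoop command should be on the PATH; a quick sanity check should report the downloaded release:

[hadoop@server1 ~]$ hadoop version
Hadoop 2.6.0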

Now edit the following file and set the Java path:

[hadoop@server1 ~]$ vi $HADOOP_HOME/etc/hadoop/hadoop-env.sh

export JAVA_HOME=/u01/jdk1.8.0_101

:wq

Now edit the Hadoop configuration files and add the following properties.
[hadoop@server1 ~]$ cd $HADOOP_HOME/etc/hadoop

[hadoop@server1 hadoop]$ vi core-site.xml
<configuration>
<property>
  <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
</property>
</configuration>
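
Note: fs.default.name still works in Hadoop 2.6.0 but is deprecated; if you prefer the newer property name, the equivalent setting is:

<property>
  <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
</property>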

[hadoop@server1 hadoop]$ vi hdfs-site.xml
<configuration>
<property>
 <name>dfs.replication</name>
 <value>1</value>
</property>

<property>
  <name>dfs.name.dir</name>
    <value>file:///home/hadoop/hadoopdata/hdfs/namenode</value>
</property>

<property>
  <name>dfs.data.dir</name>
    <value>file:///home/hadoop/hadoopdata/hdfs/datanode</value>
</property>
</configuration>

:wq
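
Hadoop 2.6.0 ships only a template for the MapReduce configuration file; if mapred-site.xml does not exist yet, copy it from the template first:

[hadoop@server1 hadoop]$ cp mapred-site.xml.template mapred-site.xml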


[hadoop@server1 hadoop]$ vi mapred-site.xml
<configuration>
 <property>
  <name>mapreduce.framework.name</name>
   <value>yarn</value>
 </property>
</configuration>


:wq


[hadoop@server1 hadoop]$ vi yarn-site.xml
<configuration>
 <property>
  <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
 </property>
</configuration>

:wq

Step 7:- Now format the NameNode using the following command.
[hadoop@server1 ~]$ hdfs namenode -format

Sample output:

16/09/09 14:56:22 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = server1.soumya.com/192.168.2.12
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 2.6.0
...
...
16/09/09 14:56:25 INFO common.Storage: Storage directory /home/hadoop/hadoopdata/hdfs/namenode has been successfully formatted.
16/09/09 14:56:25 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
16/09/09 14:56:25 INFO util.ExitUtil: Exiting with status 0
16/09/09 14:56:25 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at server1.soumya.com/192.168.2.12
************************************************************/


Step 8:-
Now start the Hadoop cluster.
[hadoop@server1 ~]$ cd $HADOOP_HOME/sbin/
Now run the start-dfs.sh script:

[hadoop@server1 sbin]$ start-dfs.sh
16/09/09 15:07:54 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [localhost]
hadoop@localhost's password:
localhost: starting namenode, logging to /home/hadoop/hadoop/logs/hadoop-hadoop-namenode-server1.soumya.com.out
hadoop@localhost's password:
localhost: starting datanode, logging to /home/hadoop/hadoop/logs/hadoop-hadoop-datanode-server1.soumya.com.out
Starting secondary namenodes [0.0.0.0]
The authenticity of host '0.0.0.0 (0.0.0.0)' can't be established.
RSA key fingerprint is 0b:59:e4:8b:b1:e6:12:3a:38:4f:ba:74:ef:8a:ad:46.
Are you sure you want to continue connecting (yes/no)? yes
0.0.0.0: Warning: Permanently added '0.0.0.0' (RSA) to the list of known hosts.
hadoop@0.0.0.0's password:
0.0.0.0: starting secondarynamenode, logging to /home/hadoop/hadoop/logs/hadoop-hadoop-secondarynamenode-server1.soumya.com.out
16/09/09 15:08:24 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Now run the start-yarn.sh script:

[hadoop@server1 sbin]$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /home/hadoop/hadoop/logs/yarn-hadoop-resourcemanager-server1.soumya.com.out
hadoop@localhost's password:
localhost: starting nodemanager, logging to /home/hadoop/hadoop/logs/yarn-hadoop-nodemanager-server1.soumya.com.out
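
Once both scripts have finished, the running Hadoop daemons can be listed with jps (part of the JDK). On a healthy single-node setup the output should look roughly like the following; the process IDs below are only illustrative:

[hadoop@server1 sbin]$ jps
2547 NameNode
2675 DataNode
2846 SecondaryNameNode
3012 ResourceManager
3128 NodeManager
3260 Jps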

Step 9:-
Check the Hadoop services from a browser.

To access information about the NameNode:
http://server1.soumya.com:50070/

To access information about the cluster and all applications (ResourceManager):
http://server1.soumya.com:8088/

To get information about the secondary namenode:
http://server1.soumya.com:50090/
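
If these pages are not reachable from another machine, the CentOS 6 iptables firewall may be blocking the ports. One option (adjust to your own security policy; the rules below are only a sketch) is to open them explicitly:

[root@server1 ~]# iptables -I INPUT -p tcp --dport 50070 -j ACCEPT
[root@server1 ~]# iptables -I INPUT -p tcp --dport 8088 -j ACCEPT
[root@server1 ~]# iptables -I INPUT -p tcp --dport 50090 -j ACCEPT
[root@server1 ~]# service iptables save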


Step 10:- Test the Hadoop Single Node Setup

[hadoop@server1 sbin]$ hdfs dfs -mkdir /user
16/09/09 15:36:27 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[hadoop@server1 sbin]$ hdfs dfs -mkdir /user/soumya
16/09/09 15:36:37 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
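
To go a step further, you can copy a local file into HDFS and run the bundled word-count MapReduce example against it. The examples jar ships with the 2.6.0 tarball; the HDFS paths below are only illustrative:

[hadoop@server1 sbin]$ hdfs dfs -put /home/hadoop/hadoop/etc/hadoop/core-site.xml /user/soumya/
[hadoop@server1 sbin]$ hdfs dfs -ls /user/soumya
[hadoop@server1 sbin]$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount /user/soumya/core-site.xml /user/soumya/wordcount-out
[hadoop@server1 sbin]$ hdfs dfs -cat /user/soumya/wordcount-out/part-r-00000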
