What is Hadoop?
Hadoop is an open-source framework for storing and processing Big Data in a distributed environment. It has two core modules: MapReduce and the Hadoop Distributed File System (HDFS).
•MapReduce: a parallel programming model for processing large amounts of structured, semi-structured, and unstructured data on large clusters of commodity hardware.
•HDFS: the Hadoop Distributed File System is the storage layer of the Hadoop framework, used to store the datasets. It provides a fault-tolerant file system that runs on commodity hardware.
Hostname:- server1.soumya.com
OS:- CentOS 6
Step 1:- Install Java
Download the Java JDK 8u101 archive:
[root@server1 ~]# wget --no-cookies --no-check-certificate --header "Cookie: gpw_e24=http%3A%2F%2Fwww.oracle.com%2F; oraclelicense=accept-securebackup-cookie" "http://download.oracle.com/otn-pub/java/jdk/8u101-b13/jdk-8u101-linux-x64.tar.gz"
[root@server1 ~]# tar zxvf jdk-8u101-linux-x64.tar.gz
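The remaining steps assume the JDK lives under /u01 (this guide's chosen location; any path works as long as it is used consistently). Move the extracted directory there:
[root@server1 ~]# mkdir -p /u01
[root@server1 ~]# mv jdk1.8.0_101 /u01/
[root@server1 ~]# cd /u01/jdk1.8.0_101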
Step 2:- Install Java with Alternatives
After extracting the archive, use the alternatives command to register the new JDK. The alternatives command is provided by the chkconfig package.
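If the alternatives command is not available (it usually is on CentOS 6), install the chkconfig package first:
[root@server1 ~]# yum install -y chkconfig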
[root@server1 jdk1.8.0_101]# alternatives --install /usr/bin/java java /u01/jdk1.8.0_101/bin/java 2
[root@server1 jdk1.8.0_101]# alternatives --config java
There are 4 programs which provide 'java'.
Selection Command
-----------------------------------------------
1 /usr/lib/jvm/jre-1.5.0-gcj/bin/java
*+ 2 /usr/lib/jvm/jre-1.7.0-openjdk.x86_64/bin/java
3 /usr/lib/jvm/jre-1.6.0-openjdk.x86_64/bin/java
4 /u01/jdk1.8.0_101/bin/java
Enter to keep the current selection[+], or type selection number: 4
Select the entry that points to /u01/jdk1.8.0_101 (number 4 here); otherwise the old OpenJDK stays active.
Java 8 is now installed. It is also recommended to set up the javac and jar command paths using alternatives:
[root@server1 jdk1.8.0_101]# alternatives --install /usr/bin/jar jar /u01/jdk1.8.0_101/bin/jar 4
[root@server1 jdk1.8.0_101]# alternatives --install /usr/bin/javac javac /u01/jdk1.8.0_101/bin/javac 4
[root@server1 jdk1.8.0_101]# alternatives --set jar /u01/jdk1.8.0_101/bin/jar
[root@server1 jdk1.8.0_101]# alternatives --set javac /u01/jdk1.8.0_101/bin/javac
Now check the Java version:-
[root@server1 alternatives]# java -version
java version "1.8.0_101"
Java(TM) SE Runtime Environment (build 1.8.0_101-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.101-b13, mixed mode)
Step 3:- Configure Environment Variables
Set up the JAVA_HOME variable:
# export JAVA_HOME=/u01/jdk1.8.0_101
Set up the JRE_HOME variable:
# export JRE_HOME=/u01/jdk1.8.0_101/jre
Set up the PATH variable:
# export PATH=$PATH:/u01/jdk1.8.0_101/bin:/u01/jdk1.8.0_101/jre/bin
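A quick sanity check that the variables point where expected (the path below assumes the /u01 location used throughout this guide):
# echo $JAVA_HOME
/u01/jdk1.8.0_101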
Add the same exports to the ~/.bash_profile file so they are loaded automatically at every login. Note that inside the file the lines must not carry the shell prompt #, or they would be treated as comments.
[root@server1 ~]# vi ~/.bash_profile
export JAVA_HOME=/u01/jdk1.8.0_101
export JRE_HOME=/u01/jdk1.8.0_101/jre
export PATH=$PATH:/u01/jdk1.8.0_101/bin:/u01/jdk1.8.0_101/jre/bin
:wq (save & exit)
Step 4:- Create Hadoop user
[root@server1 ~]# adduser hadoop
[root@server1 ~]# passwd hadoop
After creating the user account, generate an ssh key so the hadoop user can log in to its own account without a password.
[root@server1 ~]# su - hadoop
[hadoop@server1 ~]$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
Created directory '/home/hadoop/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
c5:3e:25:c0:92:23:d0:17:fa:56:72:4c:79:72:4c:fe hadoop@server1.soumya.com
The key's randomart image is:
+--[ RSA 2048]----+
| .o .+o+. |
| o.=o+++ |
| .o.o++= . |
| . + o + |
| o S o E |
| . . |
| |
| |
| |
+-----------------+
[hadoop@server1 ~]$
[hadoop@server1 ~]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
[hadoop@server1 ~]$ chmod 0600 ~/.ssh/authorized_keys
Check the connectivity:-
[hadoop@server1 ~]$ ssh localhost
The authenticity of host 'localhost (::1)' can't be established.
RSA key fingerprint is 0b:59:e4:8b:b1:e6:12:3a:38:4f:ba:74:ef:8a:ad:46.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
[hadoop@server1 ~]$ exit
logout
Connection to localhost closed.
[hadoop@server1 ~]$
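If ssh localhost still prompts for a password, the usual cause on CentOS is over-permissive modes on the home directory or ~/.ssh; as a precaution:
[hadoop@server1 ~]$ chmod 700 ~/.ssh
[hadoop@server1 ~]$ chmod go-w ~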
Step 5:- Download Hadoop 2.6.0
[hadoop@server1 ~]$ wget http://apache.claz.org/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz
[hadoop@server1 ~]$ tar -zxvf hadoop-2.6.0.tar.gz
[hadoop@server1 ~]$ mv hadoop-2.6.0 /home/hadoop/hadoop
Step 6:- Edit the .bash_profile file of the hadoop user and add the following lines.
[hadoop@server1 ~]$ vi /home/hadoop/.bash_profile
export PATH
#Java Env Variables
export JAVA_HOME=/u01/jdk1.8.0_101
export JRE_HOME=/u01/jdk1.8.0_101/jre
export PATH=$PATH:/u01/jdk1.8.0_101/bin:/u01/jdk1.8.0_101/jre/bin
#Hadoop Env Variables
export HADOOP_HOME=/home/hadoop/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
:wq (save & exit)
Now apply the changes to the current running environment:
[hadoop@server1 ~]$ . /home/hadoop/.bash_profile
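Confirm that the Hadoop environment is in effect; the first line of output for this release should read as below:
[hadoop@server1 ~]$ hadoop version
Hadoop 2.6.0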
Now edit the following file and set the Java path:
[hadoop@server1 ~]$ vi $HADOOP_HOME/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/u01/jdk1.8.0_101
:wq
Now edit the Hadoop configuration files and add the following lines.
[hadoop@server1 ~]$ cd $HADOOP_HOME/etc/hadoop
[hadoop@server1 hadoop]$ vi core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
[hadoop@server1 hadoop]$ vi hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///home/hadoop/hadoopdata/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///home/hadoop/hadoopdata/hdfs/datanode</value>
</property>
</configuration>
:wq
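The NameNode format in Step 7 creates these directories, but pre-creating them rules out permission problems (the paths match the values configured above):
[hadoop@server1 ~]$ mkdir -p /home/hadoop/hadoopdata/hdfs/namenode /home/hadoop/hadoopdata/hdfs/datanode
Hadoop 2.6.0 ships only a template for the next file; copy it before editing (vi would otherwise create the file from scratch, which also works):
[hadoop@server1 hadoop]$ cp mapred-site.xml.template mapred-site.xml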
[hadoop@server1 hadoop]$ vi mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
:wq
[hadoop@server1 hadoop]$ vi yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
:wq
Step 7:- Format the NameNode using the following command.
[hadoop@server1 ~]$ hdfs namenode -format
Sample output:
16/09/09 14:56:22 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = server1.soumya.com/192.168.2.12
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 2.6.0
...
...
16/09/09 14:56:25 INFO common.Storage: Storage directory /home/hadoop/hadoopdata/hdfs/namenode has been successfully formatted.
16/09/09 14:56:25 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
16/09/09 14:56:25 INFO util.ExitUtil: Exiting with status 0
16/09/09 14:56:25 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at server1.soumya.com/192.168.2.12
************************************************************/
Step 8:- Start the Hadoop Cluster
[hadoop@server1 ~]$ cd $HADOOP_HOME/sbin/
Now run the start-dfs.sh script:
[hadoop@server1 sbin]$ start-dfs.sh
16/09/09 15:07:54 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [localhost]
hadoop@localhost's password:
localhost: starting namenode, logging to /home/hadoop/hadoop/logs/hadoop-hadoop-namenode-server1.soumya.com.out
hadoop@localhost's password:
localhost: starting datanode, logging to /home/hadoop/hadoop/logs/hadoop-hadoop-datanode-server1.soumya.com.out
Starting secondary namenodes [0.0.0.0]
The authenticity of host '0.0.0.0 (0.0.0.0)' can't be established.
RSA key fingerprint is 0b:59:e4:8b:b1:e6:12:3a:38:4f:ba:74:ef:8a:ad:46.
Are you sure you want to continue connecting (yes/no)? yes
0.0.0.0: Warning: Permanently added '0.0.0.0' (RSA) to the list of known hosts.
hadoop@0.0.0.0's password:
0.0.0.0: starting secondarynamenode, logging to /home/hadoop/hadoop/logs/hadoop-hadoop-secondarynamenode-server1.soumya.com.out
16/09/09 15:08:24 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Now run the start-yarn.sh script:
[hadoop@server1 sbin]$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /home/hadoop/hadoop/logs/yarn-hadoop-resourcemanager-server1.soumya.com.out
hadoop@localhost's password:
localhost: starting nodemanager, logging to /home/hadoop/hadoop/logs/yarn-hadoop-nodemanager-server1.soumya.com.out
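If the start scripts prompted for passwords as in the sample output above, key-based ssh is not being picked up; revisit the permissions note in Step 4. Once both scripts have run, the JDK's jps tool should list all five daemons (PIDs will differ):
[hadoop@server1 sbin]$ jps
2986 NameNode
3105 DataNode
3302 SecondaryNameNode
3455 ResourceManager
3571 NodeManager
3680 Jps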
Step 9:- Check the Hadoop services from a browser.
To get information about the NameNode and browse HDFS:
http://server1.soumya.com:50070/
To access information about the cluster and all applications:
http://server1.soumya.com:8088/
To get information about the secondary NameNode:
http://server1.soumya.com:50090/
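The same information is available from the command line; this is a quick check that the DataNode has registered with the NameNode:
[hadoop@server1 ~]$ hdfs dfsadmin -report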
Step 10:- Test Hadoop Single Node Setup
[hadoop@server1 sbin]$ hdfs dfs -mkdir /user
16/09/09 15:36:27 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[hadoop@server1 sbin]$ hdfs dfs -mkdir /user/soumya
16/09/09 15:36:37 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
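As an end-to-end test, copy some input into HDFS and run the bundled WordCount example (the examples jar path below is where the 2.6.0 tarball places it):
[hadoop@server1 ~]$ hdfs dfs -put $HADOOP_HOME/etc/hadoop /user/soumya/input
[hadoop@server1 ~]$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount /user/soumya/input /user/soumya/output
[hadoop@server1 ~]$ hdfs dfs -cat /user/soumya/output/part-r-00000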