Installing Hadoop on Ubuntu

Level 1) Download and install Hadoop

1. First, create a Hadoop system group and user with the following commands:

sudo addgroup hadoop_

sudo adduser --ingroup hadoop_ hduser_

When prompted, enter the password, full name, room number, and other details carefully.

Then add the new user to the sudo group:

sudo adduser hduser_ sudo
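
The result of the user and group commands above can be checked from a script. The following is a minimal sketch, not part of the original steps; the helper name user_in_group is an assumption made for illustration.

```shell
#!/usr/bin/env bash
# Sketch: confirm that a user exists and belongs to a given group.
# user_in_group is a hypothetical helper name, not a standard tool.
user_in_group() {
  local user="$1" group="$2"
  local groups g
  # id -nG prints the user's group names separated by spaces;
  # it exits non-zero when the user does not exist.
  groups="$(id -nG "$user" 2>/dev/null)" || return 1
  # Compare each group name exactly to avoid partial matches.
  for g in $groups; do
    [ "$g" = "$group" ] && return 0
  done
  return 1
}

# Example (run after the commands above):
#   user_in_group hduser_ hadoop_ && echo "hduser_ is in hadoop_"
#   user_in_group hduser_ sudo    && echo "hduser_ can use sudo"
```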

2. Configure SSH.

Hadoop requires SSH access to manage the nodes in a cluster.

First, switch to the new user with the following command:

su - hduser_

The following command creates a new RSA key with an empty passphrase.

ssh-keygen -t rsa -P ""

Now enable SSH access to the local machine with this key:

cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys

Now test the SSH setup by connecting to localhost as the 'hduser_' user:

ssh localhost
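
The key setup above can also be verified without an interactive login. This is a minimal sketch under the assumption that the key files sit in the default ~/.ssh location; key_is_authorized is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: check that a public key is listed in an authorized_keys file.
# key_is_authorized is a hypothetical helper name.
key_is_authorized() {
  local pub="$1" auth="$2"
  [ -r "$pub" ] && [ -r "$auth" ] || return 1
  # -F: fixed strings, -x: whole-line match, -f: read patterns from the key file.
  grep -qxFf "$pub" "$auth"
}

# Example, using the paths from the commands above:
#   key_is_authorized "$HOME/.ssh/id_rsa.pub" "$HOME/.ssh/authorized_keys" \
#     && echo "key is authorized"
#
# A non-interactive login test; BatchMode makes ssh fail instead of
# prompting for a password:
#   ssh -o BatchMode=yes -o ConnectTimeout=5 localhost true && echo "SSH login works"
```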

If the connection is refused, the SSH server may be missing or broken. In that case, purge SSH using the following command:

sudo apt-get purge openssh-server

Purging before a fresh installation is good practice.

Install SSH using the following command:

sudo apt-get install openssh-server

3. Now download Hadoop. This guide uses the hadoop-2.9.2.tar.gz release.

Click on the download link to fetch the archive.

Go to the directory containing the downloaded .tar.gz file and extract it with the following command:

sudo tar xzf hadoop-2.9.2.tar.gz

Now rename hadoop-2.9.2 to hadoop and make the Hadoop user its owner with the following commands:

sudo mv hadoop-2.9.2 hadoop
sudo chown -R hduser_:hadoop_ hadoop
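
The extract and rename steps above can be combined in one small function. This is only a sketch: extract_and_rename is a hypothetical helper, and it assumes the archive holds a single top-level directory. Ownership changes are left out because they need root.

```shell
#!/usr/bin/env bash
# Sketch: extract a .tar.gz archive and rename its top-level directory.
# extract_and_rename is a hypothetical helper; it assumes the archive
# contains exactly one top-level directory.
extract_and_rename() {
  local archive="$1" target="$2"
  [ -f "$archive" ] || { echo "missing archive: $archive" >&2; return 1; }
  [ ! -e "$target" ] || { echo "target already exists: $target" >&2; return 1; }
  # Take the first entry of the listing and keep only the part before the first slash.
  local top
  top="$(tar tzf "$archive" | sed -n '1{s|/.*||;p;}')"
  [ -n "$top" ] || return 1
  tar xzf "$archive" && mv "$top" "$target"
}

# Example:
#   extract_and_rename hadoop-2.9.2.tar.gz hadoop
```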

Level 2) Configuration of Hadoop

1. Modify the ~/.bashrc file

Add these lines at the end of ~/.bashrc:

#Set HADOOP_HOME
export HADOOP_HOME=<Installation Directory of Hadoop>

#Set JAVA_HOME
export JAVA_HOME=<Installation Directory of Java>

# Add bin/ location of Hadoop to PATH
export PATH=$PATH:$HADOOP_HOME/bin

Source this environment configuration with the command below.

. ~/.bashrc
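
If you run this setup more than once, the same export lines can pile up in ~/.bashrc. The sketch below shows one way to append a line only when it is not already present; append_once is a hypothetical helper name, and the sketch assumes the file ends with a newline.

```shell
#!/usr/bin/env bash
# Sketch: append a line to a file only if that exact line is not already there.
# append_once is a hypothetical helper name.
append_once() {
  local file="$1" line="$2"
  touch "$file" || return 1
  # -F: fixed string, -x: match the whole line, -q: no output.
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example:
#   append_once ~/.bashrc 'export PATH=$PATH:$HADOOP_HOME/bin'
```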

2. Configurations related to HDFS

Now set JAVA_HOME inside the file $HADOOP_HOME/etc/hadoop/hadoop-env.sh.
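
If you are unsure which value to use for JAVA_HOME, one common approach is to derive it from the location of the java binary. The following is a sketch only; java_home_from_bin is a hypothetical helper, and the resulting path should be checked before it is written into hadoop-env.sh.

```shell
#!/usr/bin/env bash
# Sketch: derive a JAVA_HOME candidate from the resolved path of the java binary.
# java_home_from_bin is a hypothetical helper name.
java_home_from_bin() {
  local java_bin="$1"
  case "$java_bin" in
    # Strip the trailing /bin/java to get the installation directory.
    */bin/java) printf '%s\n' "${java_bin%/bin/java}" ;;
    *) return 1 ;;
  esac
}

# Example: resolve symlinks such as /usr/bin/java first, then strip /bin/java.
#   java_home_from_bin "$(readlink -f "$(command -v java)")"
```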

$HADOOP_HOME/etc/hadoop/core-site.xml has two parameters that need to be set:
  • 'hadoop.tmp.dir' - Defines the directory that Hadoop will use to store its data files.
  • 'fs.defaultFS' (called 'fs.default.name' in older releases) - Specifies the default file system.

Open core-site.xml to set these parameters:

sudo gedit $HADOOP_HOME/etc/hadoop/core-site.xml

Copy and paste the lines below between the <configuration> and </configuration> tags:

<property>
 <name>hadoop.tmp.dir</name>
 <value>/app/hadoop/tmp</value>
 <description>Parent directory for other temporary directories.</description>
</property>
<property>
 <name>fs.defaultFS</name>
 <value>hdfs://localhost:54310</value>
 <description>The name of the default file system.</description>
</property>
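
After editing, it can be useful to read a property back from the file to confirm the value was saved. The sketch below is a simple line-based reader, not a real XML parser; get_property is a hypothetical helper, and it assumes each <name> and <value> element sits on its own line, as in the snippet above.

```shell
#!/usr/bin/env bash
# Sketch: print the value of one property from a Hadoop *-site.xml file.
# get_property is a hypothetical helper. It is line-based, not a full XML
# parser, and expects <name> and <value> on separate lines.
get_property() {
  local file="$1" name="$2"
  awk -v name="$name" '
    # Remember when the requested <name> line has been seen.
    index($0, "<name>" name "</name>") { found = 1; next }
    # Print the text of the first <value> element that follows it.
    found && match($0, /<value>[^<]*<\/value>/) {
      print substr($0, RSTART + 7, RLENGTH - 15)
      exit
    }
  ' "$file"
}

# Example:
#   get_property "$HADOOP_HOME/etc/hadoop/core-site.xml" fs.defaultFS
```

On a working installation, the command hdfs getconf -confKey fs.defaultFS reports the value that Hadoop itself resolves.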

Now, go to the directory $HADOOP_HOME/etc/hadoop.

Create the directory specified by hadoop.tmp.dir in core-site.xml using the following command:

sudo mkdir -p <Path of Directory used in above setting>

Grant the required ownership and permissions on the directory using the following commands:

sudo chown -R hduser_:hadoop_ <Path of Directory created in above step>

sudo chmod 750 <Path of Directory created in above step>
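
The create, chown, and chmod steps above can be wrapped in one function that also prints the resulting mode and owner. This is a sketch; make_private_dir is a hypothetical helper, and passing an owner normally requires running it as root.

```shell
#!/usr/bin/env bash
# Sketch: create a directory, restrict it to mode 750, and report the result.
# make_private_dir is a hypothetical helper; the optional second argument is
# an owner in user:group form, which normally requires root to apply.
make_private_dir() {
  local dir="$1" owner="${2:-}"
  mkdir -p "$dir" || return 1
  if [ -n "$owner" ]; then
    chown -R "$owner" "$dir" || return 1
  fi
  chmod 750 "$dir" || return 1
  # stat -c is the GNU coreutils form available on Ubuntu:
  # %a = octal mode, %U = owning user, %G = owning group, %n = path.
  stat -c '%a %U:%G %n' "$dir"
}

# Example (as root), using the hadoop.tmp.dir value from core-site.xml:
#   make_private_dir /app/hadoop/tmp hduser_:hadoop_
```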

3. Configuration of MapReduce.

Before starting these configurations, set the HADOOP_HOME path. Open the following file:

sudo gedit /etc/profile.d/hadoop.sh

and add the following line, replacing the path with your own Hadoop installation directory:

export HADOOP_HOME=/home/shivi/Downloads/Hadoop

Next, make the script executable:

sudo chmod +x /etc/profile.d/hadoop.sh

Now exit the terminal and restart it. Then run the following command to verify the path:

echo $HADOOP_HOME
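
Printing the variable shows that it is set, but not that it points at a real installation. The sketch below adds a basic sanity check; looks_like_hadoop_home is a hypothetical helper that only looks for the bin/hadoop launcher and the etc/hadoop configuration directory.

```shell
#!/usr/bin/env bash
# Sketch: sanity-check that a directory looks like a Hadoop installation.
# looks_like_hadoop_home is a hypothetical helper; it only checks for the
# bin/hadoop launcher and the etc/hadoop configuration directory.
looks_like_hadoop_home() {
  local home="$1"
  [ -n "$home" ] && [ -x "$home/bin/hadoop" ] && [ -d "$home/etc/hadoop" ]
}

# Example:
#   looks_like_hadoop_home "$HADOOP_HOME" \
#     || echo "HADOOP_HOME does not point at a Hadoop installation" >&2
```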

Copy the template file mapred-site.xml.template to mapred-site.xml with the following command:

sudo cp $HADOOP_HOME/etc/hadoop/mapred-site.xml.template $HADOOP_HOME/etc/hadoop/mapred-site.xml

Now open mapred-site.xml using the following command:

sudo gedit $HADOOP_HOME/etc/hadoop/mapred-site.xml

Copy and paste the lines below between the <configuration> and </configuration> tags:

<property>
 <name>mapreduce.jobtracker.address</name>
 <value>localhost:54311</value>
 <description>The host and port that the MapReduce job tracker runs at.</description>
</property>

Now open $HADOOP_HOME/etc/hadoop/hdfs-site.xml using the following command:

sudo gedit $HADOOP_HOME/etc/hadoop/hdfs-site.xml

Copy and paste the lines below between the <configuration> and </configuration> tags:

<property>
 <name>dfs.replication</name>
 <value>1</value>
 <description>Default block replication.</description>
</property>
<property>
 <name>dfs.datanode.data.dir</name>
 <value>/home/hduser_/hdfs</value>
</property>

Now create the directory named in the dfs.datanode.data.dir setting above, then set its ownership and permissions. The general form of each command is shown first, followed by the command for the path used in this guide.

sudo mkdir -p <Path of Directory used in above setting>
sudo mkdir -p /home/hduser_/hdfs

sudo chown -R hduser_:hadoop_ <Path of Directory created in above step>
sudo chown -R hduser_:hadoop_ /home/hduser_/hdfs

sudo chmod 750 <Path of Directory created in above step>
sudo chmod 750 /home/hduser_/hdfs

4. Before starting Hadoop for the first time, format HDFS using the command below.

$HADOOP_HOME/bin/hdfs namenode -format
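
Formatting an existing NameNode discards its file system metadata, so a script that automates this step may want to check first. The sketch below assumes the default layout, where NameNode metadata is kept under <hadoop.tmp.dir>/dfs/name and formatting creates a current/VERSION file there; namenode_is_formatted is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: detect whether a NameNode directory has already been formatted.
# namenode_is_formatted is a hypothetical helper. It assumes the default
# layout: metadata in <hadoop.tmp.dir>/dfs/name, with a current/VERSION
# file created by the format step.
namenode_is_formatted() {
  local tmp_dir="$1"
  [ -f "$tmp_dir/dfs/name/current/VERSION" ]
}

# Example with the hadoop.tmp.dir value from core-site.xml:
#   if namenode_is_formatted /app/hadoop/tmp; then
#     echo "NameNode already formatted, skipping" >&2
#   else
#     "$HADOOP_HOME/bin/hdfs" namenode -format
#   fi
```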

5. Now start the Hadoop single-node cluster using the commands given below.

$HADOOP_HOME/sbin/start-dfs.sh
$HADOOP_HOME/sbin/start-yarn.sh

If Hadoop has started successfully, the output of the jps command will list NameNode, NodeManager, ResourceManager, SecondaryNameNode, and DataNode.
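
Checking the jps listing by eye is easy to get wrong, since NameNode and SecondaryNameNode look alike. The following sketch reports any of the five daemons that are missing; missing_daemons is a hypothetical helper, and it assumes jps prints one "PID Name" pair per line.

```shell
#!/usr/bin/env bash
# Sketch: report which expected Hadoop daemons are absent from jps output.
# missing_daemons is a hypothetical helper; it reads "PID Name" lines on
# standard input and prints each missing daemon name on its own line.
missing_daemons() {
  awk '
    # Record every process name seen in the second column.
    { seen[$2] = 1 }
    END {
      n = split("NameNode DataNode SecondaryNameNode ResourceManager NodeManager", want, " ")
      for (i = 1; i <= n; i++)
        if (!(want[i] in seen)) print want[i]
    }
  '
}

# Example: an empty result means all five daemons are running.
#   jps | missing_daemons
```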

6. Stopping Hadoop.

$HADOOP_HOME/sbin/stop-dfs.sh
$HADOOP_HOME/sbin/stop-yarn.sh