Hadoop Interview Questions

1) What is Hadoop?

Hadoop is an open-source framework for storing and processing Big Data.

It is a distributed computing platform: data is spread across the machines of a cluster and analyzed in parallel.

2) What are the types of input formats defined in Hadoop?

Three commonly used input formats are:

  • TextInputFormat (the default input format)
  • KeyValueTextInputFormat
  • SequenceFileInputFormat
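
To make the difference between the two text-based formats concrete, here is a minimal Python sketch that imitates the records each one hands to a mapper. It is not Hadoop code: the function names and the sample data are hypothetical. The behavior it assumes is that TextInputFormat uses the byte offset of each line as the key and the line as the value, while the key/value text format (KeyValueTextInputFormat) splits each line at the first tab.

```python
def text_input_records(data: bytes):
    """Imitate TextInputFormat: key = byte offset of the line, value = line text."""
    offset = 0
    for raw in data.splitlines(keepends=True):
        yield offset, raw.rstrip(b"\r\n").decode("utf-8")
        offset += len(raw)  # the next line starts after this line and its newline


def key_value_records(data: bytes, separator: str = "\t"):
    """Imitate KeyValueTextInputFormat: split each line at the first separator."""
    for raw in data.splitlines():
        line = raw.decode("utf-8")
        # With no separator, the whole line becomes the key and the value is empty.
        key, _, value = line.partition(separator)
        yield key, value


sample = b"id1\tapple\nid2\tbanana\n"  # hypothetical input file contents
print(list(text_input_records(sample)))
print(list(key_value_records(sample)))
```

SequenceFileInputFormat is left out of the sketch because it reads Hadoop's binary sequence files rather than plain text.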

3) What is JobTracker in Hadoop?

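JobTracker is the service that accepts MapReduce jobs and schedules and monitors their tasks on the cluster. It belongs to Hadoop 1.x; from Hadoop 2 onward, YARN takes over this role.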

4) What is NameNode in Hadoop?

The NameNode is the node that stores the location information for all files in HDFS (Hadoop Distributed File System).

It also tracks where the file data is kept across the machines of the cluster.

5) What are the basic parameters of a Mapper?

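A Mapper has four basic parameters, which form two key/value pairs. In the usual word-count example they are:

  • LongWritable and Text (input key and value)
  • Text and IntWritable (output key and value)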
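
As a rough illustration of how these types line up in word count, the sketch below imitates a mapper in plain Python. It is not the Hadoop Mapper API: `word_count_map` is a hypothetical name, and Python's `int` and `str` stand in for LongWritable/IntWritable and Text.

```python
from typing import Iterator, Tuple


def word_count_map(key: int, value: str) -> Iterator[Tuple[str, int]]:
    # key   ~ LongWritable: byte offset of the line (not used by word count)
    # value ~ Text: the contents of the line
    for word in value.split():
        # each emitted pair ~ (Text, IntWritable)
        yield word, 1


pairs = list(word_count_map(0, "big data big cluster"))
print(pairs)
```

The reducer would then sum the 1s emitted for each word.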

6) What are the types of configuration files in Hadoop?

The three main configuration files are:

  • core-site.xml
  • hdfs-site.xml
  • mapred-site.xml
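
All three files share the same layout: a `configuration` element holding `property` entries, each with a `name` and a `value`. The following sketch reads such a file with Python's standard library. The host name is made up for the example, and `read_properties` is a hypothetical helper, not part of Hadoop.

```python
import xml.etree.ElementTree as ET

# Hypothetical core-site.xml contents; the host name is an assumption.
CORE_SITE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode.example.com:8020</value>
  </property>
</configuration>
"""


def read_properties(xml_text: str) -> dict:
    """Return the name -> value pairs of a Hadoop-style configuration file."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value") for p in root.iter("property")}


props = read_properties(CORE_SITE)
print(props["fs.defaultFS"])
```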

7) What are the core components of Flume?

The core components of Flume are:

  • Event
  • Source
  • Sink
  • Channel
  • Agent
  • Client

8) What does the ‘jps’ command do?

It is used to check whether the Hadoop daemons are running. jps is the JVM process status tool, so it lists the Java processes on the machine, which include any Hadoop daemons that are running.
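
A small sketch of how that check could be automated: the code below parses jps-style output (one "pid name" pair per line) and reports which expected daemons are absent. The sample output, the process IDs, and the list of expected daemons are assumptions for a Hadoop 1.x node; adjust them for the cluster at hand.

```python
# Hypothetical jps output; real PIDs and the set of daemons will differ.
SAMPLE_JPS_OUTPUT = """\
4821 NameNode
4967 DataNode
5234 JobTracker
5398 TaskTracker
5620 Jps
"""

# Assumed daemon set for a single-node Hadoop 1.x setup.
EXPECTED = {"NameNode", "DataNode", "JobTracker", "TaskTracker"}


def running_daemons(jps_output: str) -> set:
    """Collect the process names from lines of the form '<pid> <name>'."""
    names = set()
    for line in jps_output.splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) == 2:
            names.add(parts[1])
    return names


def missing_daemons(jps_output: str, expected=EXPECTED) -> set:
    """Return the expected daemons that do not appear in the jps output."""
    return set(expected) - running_daemons(jps_output)


print(missing_daemons(SAMPLE_JPS_OUTPUT))
```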

9) What are the basic differences between relational database and Hadoop?

The differences between a relational database (RDBMS) and Hadoop are:

RDBMS | Hadoop
It relies on structured data and a schema. | Any kind of data can be stored.
The schema is validated before the data is loaded (schema on write). | The schema is applied when the data is read (schema on read).
It is used for OLTP (Online Transactional Processing) systems. | It is used for data discovery.
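
The schema-on-write versus schema-on-read distinction can be sketched in a few lines of Python. This is only an analogy with made-up data and hypothetical function names: the first loader rejects a bad row before storing it, as an RDBMS would, while the second keeps raw lines as they are and imposes a structure only when they are read.

```python
import json

SCHEMA = {"id": int, "name": str}  # hypothetical two-column schema


def load_with_schema_on_write(rows):
    """RDBMS style: validate every row against the schema before storing it."""
    table = []
    for row in rows:
        for column, col_type in SCHEMA.items():
            if not isinstance(row.get(column), col_type):
                raise ValueError(f"row rejected at load time: {row!r}")
        table.append(row)
    return table


def read_with_schema_on_read(raw_lines):
    """Hadoop style: raw lines were stored untouched; project columns at read time."""
    for line in raw_lines:
        record = json.loads(line)
        yield {"id": record.get("id"), "name": record.get("name")}


raw = ['{"id": 1, "name": "a"}', '{"id": "2", "extra": true}']
print(list(read_with_schema_on_read(raw)))
```

Passing the same two records to `load_with_schema_on_write` raises a `ValueError` on the second one, because its `id` is a string and `name` is missing.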

10) What are the different relational operations in “Pig Latin” you worked with?

Relational Operators are:

  • join
  • limit
  • group
  • filter
  • foreach
  • order by
  • distinct

11) What is the default location where “Hive” stores table data?

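By default, Hive stores table data in HDFS under /user/hive/warehouse. The location is set by the hive.metastore.warehouse.dir property.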
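
As a quick sketch of where a given managed table ends up under that directory, the helper below builds the path. It assumes the usual layout, in which tables of the default database sit directly in the warehouse directory and tables of other databases sit in a `<database>.db` subdirectory. `managed_table_path` and the table names are hypothetical.

```python
import posixpath

WAREHOUSE_DIR = "/user/hive/warehouse"  # default warehouse location in HDFS


def managed_table_path(table: str, database: str = "default",
                       warehouse: str = WAREHOUSE_DIR) -> str:
    """Build the expected HDFS directory of a managed Hive table."""
    if database == "default":
        return posixpath.join(warehouse, table)
    # Non-default databases get their own '<database>.db' directory.
    return posixpath.join(warehouse, database + ".db", table)


print(managed_table_path("sales"))
print(managed_table_path("sales", database="retail"))
```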

12) What are the components of Apache HBase?

Three major components are:

  • HMaster
  • ZooKeeper
  • Region Server

13) What are the components of Region Server?

Components of Region Server are:

  • WAL
  • HFile
  • MemStore
  • Block Cache
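
How these four pieces cooperate can be shown with a toy model. The class below is a heavily simplified sketch, not HBase code: a write goes to the WAL first and then to the MemStore, a full MemStore is flushed to a new HFile, and a read checks the MemStore, then the Block Cache, then the HFiles. The class name, the flush threshold, and the row-level cache are all simplifying assumptions (real HBase caches blocks of HFile data, not single rows).

```python
class ToyRegionServer:
    def __init__(self, flush_threshold: int = 2):
        self.wal = []          # write-ahead log: every edit is appended here first
        self.memstore = {}     # in-memory write buffer
        self.hfiles = []       # flushed, immutable sorted files (newest last)
        self.block_cache = {}  # read cache for values fetched from HFiles
        self.flush_threshold = flush_threshold

    def put(self, row, value):
        self.wal.append((row, value))  # durability first
        self.memstore[row] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Write the MemStore out as a new sorted HFile and clear it."""
        hfile = dict(sorted(self.memstore.items()))
        self.hfiles.append(hfile)
        self.memstore = {}
        # Toy-only step: drop cached rows that the new file supersedes.
        for row in hfile:
            self.block_cache.pop(row, None)

    def get(self, row):
        if row in self.memstore:       # newest data lives in memory
            return self.memstore[row]
        if row in self.block_cache:    # recently read HFile data
            return self.block_cache[row]
        for hfile in reversed(self.hfiles):  # newest file wins
            if row in hfile:
                self.block_cache[row] = hfile[row]
                return hfile[row]
        return None


rs = ToyRegionServer(flush_threshold=2)
rs.put("r1", "a")
rs.put("r2", "b")    # reaches the threshold, so the MemStore is flushed
print(rs.get("r1"))  # served from the HFile, then kept in the cache
```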

14) Name some companies that use Hadoop?

Companies that use Hadoop are:

  • Adobe
  • eBay
  • Facebook
  • Netflix
  • Amazon

15) What is the port number for NameNode, Task Tracker and Job Tracker?

The default web UI ports (in Hadoop 1.x) are:

  • NameNode: 50070
  • Task Tracker: 50060
  • Job Tracker: 50030