1) What is Hadoop?
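Hadoop is an open-source framework used to store and process Big Data on clusters of commodity machines.
It is a distributed computing platform. HDFS stores the data across the cluster and MapReduce processes it in parallel, which makes Hadoop well suited to analyzing Big Data.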
2) What are the types of input formats defined in Hadoop?
The three most commonly used input formats are:
- TextInputFormat (the default input format)
- KeyValueTextInputFormat
- SequenceFileInputFormat
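The following sketch shows the records that the first two formats hand to a mapper. It is a minimal plain-Python model, not the Hadoop API. The function names are hypothetical, and the sketch assumes UTF-8 text with a tab as the key/value separator, which is the KeyValueTextInputFormat default. SequenceFileInputFormat reads binary sequence files and is not modeled here.

```python
def text_input_records(data: bytes):
    """Model of TextInputFormat: key = byte offset where the line starts, value = line text."""
    offset = 0
    for raw in data.splitlines(keepends=True):
        line = raw.rstrip(b"\r\n").decode("utf-8")
        yield offset, line
        offset += len(raw)  # advance by the raw length, including the line terminator


def key_value_records(data: bytes, separator: str = "\t"):
    """Model of KeyValueTextInputFormat: split each line at the FIRST separator.

    If a line has no separator, the whole line becomes the key and the value is empty.
    """
    for _, line in text_input_records(data):
        key, _, value = line.partition(separator)
        yield key, value
```

For the input `b"apple\t3\nbanana\t5\n"`, TextInputFormat yields keys that are byte offsets (0 and 8). KeyValueTextInputFormat yields `("apple", "3")` and `("banana", "5")`.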
3) What is JobTracker in Hadoop?
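It is the master service in Hadoop 1.x (MapReduce v1) that accepts MapReduce jobs from clients and schedules their map and reduce tasks on the TaskTracker nodes of the cluster.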
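The next sketch illustrates the scheduling idea with a simplified greedy model. Each map task goes to a TaskTracker that already holds its input block, if that node has a free slot. Otherwise it goes to any node with a free slot. This is an assumption-laden illustration, not the real JobTracker, which assigns work in response to TaskTracker heartbeats using a pluggable scheduler such as FIFO, Fair, or Capacity. All names are hypothetical.

```python
def assign_map_tasks(tasks, free_slots, block_hosts):
    """Greedy, data-locality-first assignment of map tasks to TaskTrackers.

    tasks:       list of task ids, in submission order
    free_slots:  hostname -> number of free map slots (not modified; a copy is used)
    block_hosts: task id -> set of hostnames that store the task's input block
    Returns task id -> hostname for every task that could be placed.
    """
    slots = dict(free_slots)
    assignment = {}
    for task in tasks:
        # Prefer a node that already holds the data (no network copy needed).
        local = [h for h in sorted(block_hosts.get(task, ())) if slots.get(h, 0) > 0]
        candidates = local or [h for h in sorted(slots) if slots[h] > 0]
        if not candidates:
            break  # no free slots left; the remaining tasks stay pending
        host = candidates[0]
        slots[host] -= 1
        assignment[task] = host
    return assignment
```

In the test below, `m1` runs data-local on `nodeB`. `m2` cannot run locally because `nodeB` is now full, so it falls back to `nodeA`. `m3` waits because no slots remain.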
4) What is NameNode in Hadoop?
It is the master node of HDFS (Hadoop Distributed File System). It keeps the file system metadata, including the directory tree and which DataNodes hold the blocks of each file.
It uses this metadata to track file data across the machines of the cluster.
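The NameNode's metadata amounts to two mappings: file to blocks, and block to DataNodes. The toy model below shows both. It assumes the common defaults of a 128 MB block size (`dfs.blocksize` in Hadoop 2 and later) and a replication factor of 3 (`dfs.replication`). It uses simple round-robin placement in place of HDFS's rack-aware policy. The class and method names are hypothetical.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # assumed default dfs.blocksize (128 MB)
REPLICATION = 3                 # assumed default dfs.replication


class NameNodeModel:
    """Toy model of NameNode metadata: file -> block ids, block id -> DataNodes."""

    def __init__(self, datanodes):
        self.datanodes = list(datanodes)
        self.files = {}            # path -> list of block ids
        self.block_locations = {}  # block id -> list of DataNode names
        self._next_block = 0

    def add_file(self, path, size_bytes):
        # Number of blocks = ceil(size / block size); an empty file has no blocks.
        n_blocks = (size_bytes + BLOCK_SIZE - 1) // BLOCK_SIZE
        blocks = []
        for _ in range(n_blocks):
            block_id = self._next_block
            self._next_block += 1
            # Round-robin replica placement (real HDFS uses a rack-aware policy).
            replicas = [self.datanodes[(block_id + i) % len(self.datanodes)]
                        for i in range(min(REPLICATION, len(self.datanodes)))]
            self.block_locations[block_id] = replicas
            blocks.append(block_id)
        self.files[path] = blocks
        return blocks

    def locate(self, path):
        """Return [(block id, [DataNodes holding it]), ...] for a file."""
        return [(b, self.block_locations[b]) for b in self.files[path]]
```

In this model, a 300 MB file splits into 3 blocks (128 + 128 + 44 MB), and each block is recorded on three DataNodes.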
5) What are the basic parameters of a Mapper?
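A Mapper is parameterized by four types: input key, input value, output key, and output value. The basic parameters, as in the classic word count example, are:
- LongWritable and Text (input key and value)
- Text and IntWritable (output key and value)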
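These four types correspond to a Java `Mapper<LongWritable, Text, Text, IntWritable>`. The plain-Python stand-in below shows the roles of the four types in a word count mapper. It mimics only the types' roles and does not use Hadoop. The function name is hypothetical.

```python
from typing import Iterator, Tuple


def word_count_map(key: int, value: str) -> Iterator[Tuple[str, int]]:
    """Stand-in for a word count Mapper<LongWritable, Text, Text, IntWritable>.

    key   ~ LongWritable: byte offset of the line in the input (unused here)
    value ~ Text: the line itself
    yields (word, 1) ~ (Text, IntWritable)
    """
    for word in value.split():
        yield word, 1
```

The framework then groups the emitted pairs by word, so a reducer receives each word together with all of its 1s.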
6) What are the types of configuration files in Hadoop?
The three main configuration files are (Hadoop 2 and later also use yarn-site.xml):
- core-site.xml
- hdfs-site.xml
- mapred-site.xml
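All of these files share the same layout: a `<configuration>` element containing `<property>` entries, each with a `<name>` and a `<value>`. Typical examples are `fs.defaultFS` in core-site.xml, `dfs.replication` in hdfs-site.xml, and `mapreduce.framework.name` in mapred-site.xml. The sketch below reads such a file with Python's standard library. The host name and port in the sample are hypothetical.

```python
import xml.etree.ElementTree as ET


def read_hadoop_conf(xml_text: str) -> dict:
    """Return {name: value} for every <property> in a Hadoop *-site.xml document."""
    root = ET.fromstring(xml_text)
    return {
        prop.findtext("name"): prop.findtext("value")
        for prop in root.findall("property")
    }


# Hypothetical core-site.xml contents (host and port are placeholders).
CORE_SITE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode-host:9000</value>
  </property>
</configuration>"""
```

Calling `read_hadoop_conf(CORE_SITE)` returns a one-entry dictionary that maps `fs.defaultFS` to the NameNode URI.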
7) What are the core components of Flume?
The core components of Flume are:
- Event
- Source
- Sink
- Channel
- Agent
- Client
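The components fit together as a pipeline. A client sends events to an agent. Inside the agent, a source puts each event on a channel, and a sink takes events off the channel and delivers them to a destination such as HDFS. The toy model below shows only that data flow. It is not Flume's API, and all class and method names are hypothetical.

```python
import queue
from dataclasses import dataclass, field


@dataclass
class Event:
    """Flume's unit of data: a byte payload (body) plus optional string headers."""
    body: bytes
    headers: dict = field(default_factory=dict)


class Agent:
    """Toy agent wiring one source -> one channel -> one sink in a single process."""

    def __init__(self, capacity: int = 100):
        self.channel = queue.Queue(maxsize=capacity)  # buffers events between source and sink
        self.delivered = []                           # stands in for the sink's destination

    def source_receive(self, event: Event) -> None:
        # Source: accept an event from a client and put it on the channel.
        self.channel.put(event)

    def sink_drain(self) -> int:
        # Sink: take every buffered event off the channel and "write" it out.
        count = 0
        while not self.channel.empty():
            self.delivered.append(self.channel.get())
            count += 1
        return count
```

Because the channel decouples the source from the sink, the two can run at different speeds. Events wait in the channel until the sink drains them.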
8) What does 'jps' command do?
It is used to check whether the Hadoop daemons are running. It lists the Java processes running on the machine, including Hadoop daemons such as NameNode and DataNode.
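Each line of `jps` output has the form `<pid> <MainClassName>`, so a health check only needs to parse the class names and compare them with the daemons you expect. The sketch below does that. The helper names and the sample output are hypothetical, and `read_jps` requires a JDK on the PATH.

```python
import subprocess


def read_jps() -> str:
    """Run the JDK's jps tool on this machine and return its output."""
    return subprocess.run(["jps"], capture_output=True, text=True, check=True).stdout


def running_daemons(jps_output: str) -> set:
    """Parse jps output ("<pid> <MainClass>" per line) into a set of class names."""
    names = set()
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            names.add(parts[1])
    return names


def missing_daemons(expected, jps_output: str) -> set:
    """Daemons we expected to see that jps did not list."""
    return set(expected) - running_daemons(jps_output)


# Hypothetical output from a node running HDFS daemons only.
SAMPLE_JPS_OUTPUT = """4821 NameNode
4990 DataNode
5203 SecondaryNameNode
5377 Jps
"""
```

On a live node you would pass `read_jps()` instead of the sample string. Note that `jps` also lists itself as `Jps`.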
9) What are the basic differences between relational database and Hadoop?
The differences between a relational database (RDBMS) and Hadoop are:
| RDBMS | Hadoop |
|---|---|
| Relies on structured data and a predefined schema | Can store any kind of data |
| Validates the schema before the data is loaded (schema on write) | Follows a schema-on-read policy |
| Used for OLTP (Online Transaction Processing) systems | Used for data discovery |
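The schema-on-write versus schema-on-read distinction can be shown in a few lines. In the sketch below, the RDBMS-style loader rejects the whole load when one row is malformed. The Hadoop-style reader keeps the raw text and applies the schema only at query time, skipping rows that do not fit. The two-column schema and all function names are hypothetical.

```python
import csv
import io

# Hypothetical two-column schema used for illustration.
SCHEMA = [("name", str), ("age", int)]


def apply_schema(row):
    """Convert a list of strings into a typed record; raise ValueError if it doesn't fit."""
    if len(row) != len(SCHEMA):
        raise ValueError(f"expected {len(SCHEMA)} columns, got {len(row)}")
    return {col: typ(val) for (col, typ), val in zip(SCHEMA, row)}


def load_schema_on_write(rows):
    """RDBMS style: every row is validated at load time; one bad row fails the load."""
    return [apply_schema(row) for row in rows]


def query_schema_on_read(raw_text):
    """Hadoop style: the raw text is stored as-is; the schema is applied while reading,
    and rows that do not fit are skipped by this query."""
    for row in csv.reader(io.StringIO(raw_text)):
        try:
            record = apply_schema(row)
        except ValueError:
            continue
        yield record
```

With the input `"alice,30\nbob,not-a-number\ncarol,41\n"`, the schema-on-read query returns the records for alice and carol. Loading the same rows with schema on write raises an error.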
10) What are the different relational operations in “Pig Latin” you worked with?
The relational operators are:
- join
- limit
- group
- filter
- foreach
- order by
- distinct
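As a rough guide to what each operator does, the snippet below applies plain-Python equivalents to small in-memory lists. The comment above each step shows the Pig Latin statement it mirrors. This is an illustrative analogy, not Pig execution. The relation and field names are hypothetical, and Pig would need a declared schema to refer to fields by name.

```python
from itertools import groupby

# Hypothetical relations: users(name, city, age) and orders(name, amount).
users = [("alice", "nyc", 30), ("bob", "sf", 17), ("carol", "nyc", 41), ("alice", "nyc", 30)]
orders = [("alice", 12.5), ("carol", 7.0)]

# DISTINCT users;  -> drop duplicate tuples
distinct_users = sorted(set(users))

# FILTER distinct_users BY age >= 18;
adults = [u for u in distinct_users if u[2] >= 18]

# FOREACH adults GENERATE name, city;
names_cities = [(name, city) for name, city, _ in adults]

# ORDER adults BY age DESC;
by_age = sorted(adults, key=lambda u: u[2], reverse=True)

# LIMIT by_age 1;
oldest = by_age[:1]

# GROUP adults BY city;  (itertools.groupby needs input sorted by the key)
by_city = {city: list(rows)
           for city, rows in groupby(sorted(adults, key=lambda u: u[1]), key=lambda u: u[1])}

# JOIN adults BY name, orders BY name;  (inner join)
joined = [(u, o) for u in adults for o in orders if u[0] == o[0]]
```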
11) What is the default location where "Hive" stores table data?
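By default, Hive stores table data in HDFS under /user/hive/warehouse. This location is set by the hive.metastore.warehouse.dir property.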
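Under the classic defaults, a managed table created without a `LOCATION` clause gets its own directory in the warehouse. Tables in the `default` database sit directly under the warehouse directory. Tables in any other database sit under a `<database>.db` subdirectory. The helper below computes that path. It is a hypothetical sketch that assumes the default warehouse setting and lowercase identifiers. Some distributions and newer Hive versions change these defaults.

```python
import posixpath

DEFAULT_WAREHOUSE = "/user/hive/warehouse"  # assumed hive.metastore.warehouse.dir default


def managed_table_location(table, database="default", warehouse=DEFAULT_WAREHOUSE):
    """Default HDFS directory for a managed Hive table (no LOCATION clause)."""
    table = table.lower()      # Hive identifiers are case-insensitive
    database = database.lower()
    if database == "default":
        return posixpath.join(warehouse, table)
    return posixpath.join(warehouse, database + ".db", table)
```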
12) What are the components of Apache HBase?
The three major components are:
- HMaster
- ZooKeeper
- Region Server
13) What are the components of Region Server?
The components of a Region Server are:
- WAL (Write-Ahead Log)
- HFile
- MemStore
- Block Cache
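These four components work together on the write and read paths. A write is first appended to the WAL for durability and then buffered in the MemStore. When the MemStore fills up, it is flushed to a new immutable, sorted HFile. A read checks the MemStore and then the HFiles, newest first, with the Block Cache saving repeated HFile reads. The toy model below captures that flow. It is a simplification: the entry-count flush threshold is hypothetical (HBase flushes by size), caching is per value rather than per block, and deletes, compactions, and WAL cleanup are not modeled.

```python
class RegionServerModel:
    """Toy model of a Region Server's write and read paths."""

    def __init__(self, flush_threshold=3):
        self.wal = []           # write-ahead log: every edit is appended here first
        self.memstore = {}      # in-memory buffer of recent writes
        self.hfiles = []        # immutable sorted "files", oldest first
        self.block_cache = {}   # (hfile index, row) -> value already read from that HFile
        self.flush_threshold = flush_threshold  # hypothetical: flush after N distinct rows

    def put(self, row, value):
        self.wal.append((row, value))   # 1. durability: log the edit
        self.memstore[row] = value      # 2. buffer it in memory
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write the MemStore out as a new sorted, immutable HFile and start a fresh MemStore.
        self.hfiles.append(dict(sorted(self.memstore.items())))
        self.memstore = {}

    def get(self, row):
        if row in self.memstore:                    # newest data lives in memory
            return self.memstore[row]
        for idx in range(len(self.hfiles) - 1, -1, -1):  # then newest HFile first
            if row in self.hfiles[idx]:
                key = (idx, row)
                if key not in self.block_cache:    # cache miss: "read from disk"
                    self.block_cache[key] = self.hfiles[idx][row]
                return self.block_cache[key]
        return None
```

HFiles never change after they are written, so caching their contents by file and row can never return stale data. A newer value always wins because the MemStore and newer HFiles are checked first.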
14) Name some companies that use Hadoop?
Companies that use Hadoop include:
- Adobe
- eBay
- Facebook
- Netflix
- Amazon
15) What are the port numbers for NameNode, TaskTracker and JobTracker?
The default web UI port numbers (Hadoop 1.x) are:
- NameNode: 50070 (Hadoop 3 moved it to 9870)
- TaskTracker: 50060
- JobTracker: 50030