Decision Tree in R Programming

A decision tree uses a branching structure to show every possible outcome for a given input. We can draw a decision tree by hand or create it with specialized software or a graphics program. In machine learning, a decision tree is a supervised learning technique that can be used for both regression and classification.

A decision tree diagram is useful for focusing discussion when a group has to make a decision. It works like a probability tree that helps the user reach a decision about a process. A decision tree consists of nodes, branches and leaf nodes. Each internal node represents a test on an attribute, each branch represents an outcome of that test, and each leaf node holds a class label.
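The node, branch and leaf structure described above can be sketched as nested conditions. This is a minimal illustration only: the function name `classify_health`, the attributes `bmi` and `smoker`, and the thresholds are all made up for the example and do not come from any dataset.

```r
# Hypothetical hand-written tree for the "healthy or unhealthy" example.
# The attributes and thresholds are invented purely for illustration.
classify_health <- function(bmi, smoker) {
  if (bmi < 25) {          # root node: test on the bmi attribute
    if (smoker) {          # internal node: test on the smoker attribute
      "unhealthy"          # leaf node: class label
    } else {
      "healthy"            # leaf node: class label
    }
  } else {
    "unhealthy"            # leaf node: class label
  }
}

classify_health(bmi = 22, smoker = FALSE)
classify_health(bmi = 30, smoker = FALSE)
```

Each `if` corresponds to a node, each of its two arms to a branch, and each returned string to a leaf.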

Examples of problems that a decision tree can address:

  • Choosing the best item to manufacture among all the items, i.e., item A, item B, item C or item D.
  • Choosing the best stock for long-term investing, i.e., stock A, stock B, stock C or stock D.
  • Checking whether an individual is healthy or unhealthy.

Working of Decision Tree:

  • PARTITIONING:
    • Partitioning is the process of splitting the data set into subsets.
    • Split criteria such as the chi-square test and the Gini index are used to score candidate splits, and the split that scores best is chosen.
  • PRUNING:
    • In this process, branch nodes are converted into leaf nodes, which shortens the branches of the tree.
    • Simpler trees are less prone to overfitting.
  • SELECTION OF TREE:
    • In this process, the main aim is to select the smallest tree that fits the data.
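The partitioning step can be sketched in base R using the Gini index as the split criterion. The helper names `gini`, `split_gini` and `best_split` and the toy data are assumptions made for this illustration. Note that `ctree()` from the party package, used later in this article, selects splits with statistical tests rather than the Gini index, so this is not a description of what `ctree()` does internally.

```r
# Gini impurity of a vector of class labels: 1 - sum(p_k^2).
gini <- function(labels) {
  p <- table(labels) / length(labels)
  1 - sum(p^2)
}

# Weighted Gini impurity of the two subsets produced by x < threshold.
split_gini <- function(x, labels, threshold) {
  left  <- labels[x <  threshold]
  right <- labels[x >= threshold]
  (length(left) * gini(left) + length(right) * gini(right)) / length(labels)
}

# Score the midpoints between consecutive distinct values of x and
# keep the threshold with the lowest weighted impurity.
# Assumes x has at least two distinct values.
best_split <- function(x, labels) {
  values <- sort(unique(x))
  candidates <- (head(values, -1) + tail(values, -1)) / 2
  scores <- sapply(candidates, function(t) split_gini(x, labels, t))
  list(threshold = candidates[which.min(scores)], gini = min(scores))
}

# Toy data, made up for illustration.
score <- c(20, 25, 30, 35, 40, 45, 50, 55)
label <- c("no", "no", "no", "no", "yes", "yes", "yes", "yes")

best_split(score, label)
```

A tree-growing procedure would apply such a search recursively to each resulting subset. Pruning then works in the opposite direction by collapsing splits that do not improve the fit enough.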

Important factors to consider while selecting the tree in R programming:

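  • Entropy: a measure of how mixed the class labels in a node are. A node containing a single class has entropy 0.
  • Information Gain: the reduction in entropy achieved by a split.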
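Both quantities can be computed in a few lines of base R. The sketch below is illustrative only: the function names `entropy` and `information_gain` and the toy vectors are assumptions for this example, not part of any package used in this article.

```r
# Shannon entropy (in bits) of a vector of class labels.
entropy <- function(labels) {
  p <- table(labels) / length(labels)
  p <- p[p > 0]                  # drop empty classes to avoid 0 * log2(0)
  -sum(p * log2(p))
}

# Information gain: entropy of the parent minus the size-weighted
# entropy of the subsets obtained by grouping the labels by `groups`.
information_gain <- function(labels, groups) {
  subsets <- split(labels, groups)
  weights <- sapply(subsets, length) / length(labels)
  entropy(labels) - sum(weights * sapply(subsets, entropy))
}

# Toy data, made up for illustration.
label   <- c("no", "no", "no", "no", "yes", "yes", "yes", "yes")
perfect <- c("a", "a", "a", "a", "b", "b", "b", "b")
useless <- c("a", "b", "a", "b", "a", "b", "a", "b")

entropy(label)                     # two equally likely classes: 1 bit
information_gain(label, perfect)   # split separates the classes fully: 1
information_gain(label, useless)   # split leaves the class mix unchanged: 0
```

A split with higher information gain produces purer subsets, which is why it is preferred when growing the tree.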

To better understand the decision tree concept, we will take an example.

EXAMPLE:

CODE 1:

# DECISION TREE
# Before building the decision tree, load the libraries used in this example.


library(datasets)
library(caTools)
library(party)      # provides ctree() and the readingSkills dataset
library(dplyr)
library(magrittr)

OUTPUT 1:

[Image: output of Code 1]

CODE 2:

# DECISION TREE
# Load the readingSkills dataset, which ships with the party package.
data("readingSkills", package = "party")


# head() prints the first rows of the dataset.
head(readingSkills)

OUTPUT 2:

[Image: output of Code 2]

CODE 3:

# DECISION TREE
# EXAMPLE


# Load the party package.
# Its dependencies are loaded automatically.
library(party)


# Create the input data frame from the first 105 rows of readingSkills.
input.data <- readingSkills[1:105, ]


# Fit the tree: predict nativeSpeaker from age, shoeSize and score.
output.tree <- ctree(
  nativeSpeaker ~ age + shoeSize + score,
  data = input.data)


# Finally, plot the fitted tree.
plot(output.tree)

OUTPUT 3:

[Image: plot of the decision tree produced by Code 3]

CONCLUSION:

From the decision tree above, we can conclude that anyone whose reading score is below 38.3 and whose age is above 6 is classified as not being a native speaker.
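The conclusion above is read off the plot. One possible way to check how the fitted tree behaves on rows it has not seen is to predict on the remaining part of the dataset and tabulate the results. The sketch below assumes the party package is installed. The names `train.data` and `test.data`, and the choice to treat the rows after the first 105 as a hold-out set, are assumptions made for this example and are not part of the original walkthrough.

```r
library(party)

data("readingSkills", package = "party")

# Same training rows as in CODE 3.
train.data <- readingSkills[1:105, ]
# Assumption: the remaining rows are used as a hold-out set.
test.data  <- readingSkills[106:nrow(readingSkills), ]

# Fit the same model as in CODE 3.
output.tree <- ctree(nativeSpeaker ~ age + shoeSize + score,
                     data = train.data)

# Predict the class label for each hold-out row.
predicted <- predict(output.tree, newdata = test.data)

# Cross-tabulate actual against predicted labels.
confusion <- table(actual = test.data$nativeSpeaker, predicted = predicted)
print(confusion)

# Share of hold-out rows classified correctly.
accuracy <- sum(diag(confusion)) / sum(confusion)
print(accuracy)
```

The diagonal of the table counts the correctly classified rows, so the accuracy gives a rough indication of how well the rule from the plot carries over to new data.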