Decision Tree Classification in Python

In this article, we will learn how to implement a decision tree with Sklearn, that is, the scikit-learn library for Python. First, we should understand what classification is.

What is Classification?

Classification is the process of dividing a dataset into different categories or groups by assigning a label to each item. In other words, it is a technique for categorizing observations into classes.

We are taking the data, analyzing it, and based on some conditions, dividing it into various categories.

Why Do We Classify Data?

We classify data so that we can perform predictive analysis on it. For example, when we receive an email, the machine predicts whether it is spam or not, and based on that prediction, it moves the spam mail to the appropriate folder.

In general, a classification algorithm handles questions like: does this data belong to category A or category B? Is this person male or female?

We can also use this kind of prediction to check whether a transaction is genuine. Suppose we normally use a credit card in India and then have to fly to America; if we use the credit card over there, we will get a notification alert about the transaction.

We would be asked to confirm the transaction. This is also a kind of predictive analysis, because the machine predicts that something is fishy about the transaction: 36 hours ago, we made a transaction with the credit card in India, and 24 hours later the same card was used for a payment in America. To confirm that the payment is genuine, it sends us a notification alert.

This is one of the use cases of classification. We can even use it to classify different items like fruits based on their color, size, and taste.

A machine that has been well trained with a classification algorithm can easily predict the class or type of a fruit whenever new data is given to it. It need not be a fruit; it can be any item, such as a car, a house, or a signboard.

Types of Classification

There are several ways to perform the same task. To predict whether a given person is male or female, a machine has to be trained first. There are multiple ways to train the machine, and we can choose any one of them. There are many different techniques for predictive analysis, and the decision tree is one of the most common of them all.

Algorithms usually listed under this heading are given below (note that linear regression predicts continuous values, so strictly speaking it is a regression method rather than a classifier):

  1. Decision tree
  2. Random forest
  3. Naive Bayes
  4. KNN algorithm
  5. Logistic regression
  6. Linear regression

Now, we will discuss the decision tree.

Decision Tree

The decision tree is a graphical representation of all the possible solutions to a decision. The decisions which are made can be explained very easily. For example, here is a task: should I go to a restaurant, or should I buy pasta? I am confused about that, so I will create a decision tree for it, starting with the root node. First, we check whether I am hungry or not. If I am not hungry, I go back to sleep. If I am hungry and I have 1000 rupees, I will go to the restaurant, and if I am hungry and do not have 1000 rupees, I will go and buy pasta. So, this is what a decision tree looks like.
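The restaurant-or-pasta tree described above can be sketched as a pair of nested conditions. This is only a minimal illustration: the function name `decide` and its parameters are made up for this example, and "having 1000 rupees" is assumed to mean having at least 1000 rupees.

```python
def decide(hungry: bool, rupees: int) -> str:
    """Walk the restaurant-or-pasta tree from the root node down to a leaf."""
    # Root node: am I hungry?
    if not hungry:
        return "go back to sleep"
    # Internal node: do I have 1000 rupees? (assumed to mean "at least 1000")
    if rupees >= 1000:
        return "go to the restaurant"
    return "buy pasta"


print(decide(hungry=False, rupees=5000))
print(decide(hungry=True, rupees=1500))
print(decide(hungry=True, rupees=200))
```

Each `return` corresponds to a leaf of the tree, and each `if` corresponds to a node where a question is asked.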

What is a decision tree?

A decision tree is very easy to read and understand. It is one of the few interpretable models, where we can understand exactly why the classifier made a particular decision. In general, we cannot say that one algorithm performs better than another. For example, we cannot say that a decision tree is better than Naive Bayes, or that Naive Bayes performs better than a decision tree; it depends on the dataset. We have to try all the algorithms by trial and error and then compare the results. The model that gives the best result is the one we should use to get better accuracy on our dataset.

"A decision tree is a graphical representation of all the possible solutions to a decision based on certain conditions."

Decision Tree Algorithm

Now that we have discussed the definition, we might wonder why it is called a decision tree. The reason is that it starts with a root and then branches off into several solutions, just like a real tree, which also starts from a root and grows more branches as it gets bigger. Similarly, a decision tree has a root and keeps growing as the number of decisions and conditions increases.

Let us look at a real-life scenario. Whenever I dial the toll-free number of my company, the call is redirected to an intelligent computerized assistant that says something like: press 1 for English, press 2 for Hindi, press 3 for Telugu, and so on. Suppose we select 1; it then redirects us to another set of questions of the same kind. This keeps repeating until we get to the right person.

Consider another example, a decision tree for the question "Should I accept a new job offer?" The root node asks whether the salary is at least 50,000 rupees. If YES, we check whether the commute is more than 1 hour; if it is, I will decline the offer. If the commute is less than 1 hour, I will go on to check whether the company offers free coffee: if it does not, I will decline the offer, and if it does, I will happily accept it. So, this is just one more example of a decision tree.
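The job-offer tree can be written as a short function. This is a sketch under two assumptions that the text does not spell out: an offer below 50,000 rupees is declined, and a commute of exactly 1 hour is treated as acceptable. The name `accept_offer` and its parameters are hypothetical.

```python
def accept_offer(salary: int, commute_hours: float, free_coffee: bool) -> bool:
    """Return True if the job offer is accepted, following the tree top-down."""
    # Root node: is the salary at least 50,000 rupees?
    if salary < 50_000:
        return False  # assumption: the text does not describe this branch
    # Second node: is the commute more than 1 hour?
    if commute_hours > 1:
        return False
    # Last node: does the company offer free coffee?
    return free_coffee


print(accept_offer(salary=60_000, commute_hours=0.5, free_coffee=True))
print(accept_offer(salary=60_000, commute_hours=2.0, free_coffee=True))
print(accept_offer(salary=60_000, commute_hours=0.5, free_coffee=False))
```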

Now, let us consider a sample dataset that we will use to explain the decision tree. Each row in this dataset is an example; the first two columns provide the features or attributes that describe the data, and the last column gives the label or class we want to predict. If we modify this data by adding more features and examples, our program will work the same way. The dataset is straightforward except for one thing: it is not perfectly separable. The second and fifth examples have the same features but different labels; both have yellow as their color and a diameter of three, but the labels are mango and lemon.

Let us see how a decision tree handles this case. To build the tree, we will use a decision tree algorithm called CART, which stands for Classification and Regression Trees.
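To make the "not perfectly separable" point concrete, here is a small sketch of a dataset in the shape described above (color, diameter, label). Only the second and fifth rows come from the description; the other three rows are hypothetical placeholders invented for this illustration, as is the helper name `conflicting_features`.

```python
from collections import defaultdict

# Each row is (color, diameter, label).
# Rows 2 and 5 follow the description; rows 1, 3 and 4 are made-up placeholders.
training_data = [
    ("Green", 3, "Mango"),
    ("Yellow", 3, "Mango"),
    ("Red", 1, "Grape"),
    ("Red", 1, "Grape"),
    ("Yellow", 3, "Lemon"),
]


def conflicting_features(rows):
    """Find feature combinations that appear with more than one label."""
    labels_by_features = defaultdict(set)
    for *features, label in rows:
        labels_by_features[tuple(features)].add(label)
    return {
        features: sorted(labels)
        for features, labels in labels_by_features.items()
        if len(labels) > 1
    }


conflicts = conflicting_features(training_data)
print(conflicts)
```

No condition on color and diameter alone can separate the two yellow rows of diameter 3, so a tree built on these features cannot classify both of them correctly.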

Terminology of a Decision Tree

Let us start with the root node. The root node is the base node of a tree; the entire tree starts from it. In other words, it is the first node of the tree. It represents the entire population or sample, which is further divided into two or more homogeneous sets.

Next, we will discuss the leaf node. A leaf node is a node at the end of the tree, that is, a node that cannot be divided any further.

Now we will discuss splitting. Splitting divides the root node, or any other node, into different sub-parts based on some condition.

Now comes the branch or sub-tree. A branch or sub-tree is formed when we split the tree. For example, when we split a root node, it gets divided into two branches or two sub-trees.

Next is the concept of pruning. It is just the opposite of splitting: we remove the sub-nodes of a node in the decision tree.

Next are the parent and child nodes. The root node is always a parent node. In general, a node that is divided into sub-nodes is the parent of those sub-nodes, and the sub-nodes derived from it are its child nodes.
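The terminology above maps naturally onto a small data structure. The following is a minimal sketch, not a full decision tree implementation; the class name `Node` and its methods are assumptions made for this illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)
    # The parent link is excluded from repr/compare to keep printing simple.
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)

    def split(self, *names: str) -> List["Node"]:
        """Splitting: divide this node into child nodes (a sub-tree)."""
        self.children = [Node(name, parent=self) for name in names]
        return self.children

    def prune(self) -> None:
        """Pruning: remove the sub-nodes, so this node becomes a leaf again."""
        self.children = []

    @property
    def is_leaf(self) -> bool:
        return not self.children


root = Node("hungry?")  # root node: it has no parent
sleep, money = root.split("go back to sleep", "have 1000 rupees?")
restaurant, pasta = money.split("go to the restaurant", "buy pasta")

print(root.parent is None)   # the root has no parent
print(money.parent is root)  # root is the parent, money is its child
print(pasta.is_leaf)         # a node without children is a leaf

money.prune()                # remove the sub-nodes under "have 1000 rupees?"
print(money.is_leaf)
```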

Let us now discuss the decision tree further.

A decision tree is a type of supervised learning algorithm that is used for both classification and regression problems. The algorithm uses training data to create rules that can be represented as a tree structure. Like a tree, it consists of a root node, internal nodes, and leaf nodes.

How does a Tree Decide Where to Split?

Before we split a tree, we should know some terminology. The first term is the Gini index. The Gini index is a measure of impurity (or purity) used when building a decision tree with the CART algorithm.
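As a rough illustration, the Gini index of a set of labels is commonly computed as 1 minus the sum of the squared class proportions. The helper name `gini` below is chosen for this sketch only.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum(p_i ** 2) over the class proportions p_i."""
    total = len(labels)
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


pure = gini(["Mango", "Mango", "Mango"])  # only one class
mixed = gini(["Mango", "Lemon"])          # two classes, evenly mixed
print(pure, mixed)
```

A node that contains only one class has a Gini index of 0, and the value grows as the classes in the node become more mixed.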

Information Gain

Information gain is the decrease in entropy after a dataset is split on an attribute. Constructing a decision tree is all about finding the attributes that return the highest information gain, so at each node we select the split that gives us the highest information gain.
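A minimal sketch of this calculation is given below. It assumes the usual definition, in which the entropy of the parent is compared with the size-weighted average entropy of the child nodes; the function names and the toy labels are made up for this example.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Entropy of a list of class labels, in bits."""
    total = len(labels)
    return -sum(
        (count / total) * log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(parent, children):
    """Entropy of the parent minus the weighted entropy of the children."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


parent = ["yes", "yes", "no", "no"]
# A split that separates the classes completely.
perfect = information_gain(parent, [["yes", "yes"], ["no", "no"]])
# A split that leaves both children as mixed as the parent.
useless = information_gain(parent, [["yes", "no"], ["yes", "no"]])
print(perfect, useless)
```

The first split removes all the impurity, so it has the higher gain and would be the one selected.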

Reduction in Variance

Reduction in variance is a splitting criterion used for continuous target variables (regression problems). The split with the lower variance is selected as the criterion to split the population.

Variance tells us how much our data varies. If our data is less impure, that is, purer, then the variance is smaller, because all the data points are almost similar. This is why variance can be used as a way of splitting a tree.
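The idea can be sketched as follows, assuming that the variance of each child node is weighted by its share of the samples. The function name `variance_reduction` and the target values are invented for this illustration.

```python
from statistics import pvariance


def variance_reduction(parent, children):
    """Variance of the parent minus the weighted variance of the children."""
    total = len(parent)
    weighted = sum(len(child) / total * pvariance(child) for child in children)
    return pvariance(parent) - weighted


targets = [10, 12, 11, 30, 32, 31]
# Split A groups similar target values together.
good = variance_reduction(targets, [[10, 12, 11], [30, 32, 31]])
# Split B mixes small and large target values in both children.
poor = variance_reduction(targets, [[10, 30, 11], [12, 32, 31]])
print(good > poor)
```

Split A leaves much less variance inside the child nodes, so it gives the larger reduction and would be preferred.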

Chi-Square

Chi-square is a statistical test used to determine the statistical significance of the differences between the sub-nodes and the parent node. The higher the chi-square value, the more significant the difference.
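A minimal sketch of this idea is shown below. It assumes the Pearson chi-square statistic, where the expected counts of each sub-node follow the class mix of the parent node; some tutorials use a slightly different variant. The function name chi_square_for_split and the counts are made up for this example.

```python
def chi_square_for_split(children):
    """Pearson chi-square statistic for a split.

    `children` holds the per-class counts of each sub-node, e.g. [[8, 2], [2, 8]].
    Expected counts assume each sub-node has the same class mix as the parent.
    """
    n_classes = len(children[0])
    parent_counts = [sum(child[k] for child in children) for k in range(n_classes)]
    parent_total = sum(parent_counts)

    statistic = 0.0
    for child in children:
        child_total = sum(child)
        for k in range(n_classes):
            expected = child_total * parent_counts[k] / parent_total
            if expected > 0:
                statistic += (child[k] - expected) ** 2 / expected
    return statistic


# [YES, NO] counts in the left and right sub-nodes.
informative = chi_square_for_split([[8, 2], [2, 8]])
uninformative = chi_square_for_split([[5, 5], [5, 5]])

print(informative > uninformative)  # True
```

A split whose sub-nodes look just like the parent gives a statistic of 0, so it tells us nothing new.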

First, we need to know how to decide on the best attribute. For this, we calculate the information gain of each attribute; the attribute with the highest information gain is considered the best. To define information gain properly, we first need to know the term "Entropy", because entropy is used to calculate the information gain.

Entropy

Entropy is a metric that measures the impurity of something. Calculating it is the first step in solving a decision tree problem.

Let us now understand what the term impurity means.

Suppose we have a basket full of apples and a bowl full of labels that all say Apple. If you are asked to pick one item from the basket and one label from the bowl, then the probability of getting an apple and the correct label is 1, so in this case, we can say that the impurity is 0.

Now, if there are four different fruits in the basket and four different labels in the bowl, then the probability of matching the fruit to the label is not 1. It is something less than that. For example, I might pick a banana from the basket and randomly pick a label from the bowl that says cherry; any combination is possible. So, in this case, I would say that the impurity is non-zero.

Entropy is the measure of impurity. From the graph given below, we can say that when the probability is 0 or 1, the sample is completely pure, so the value of entropy is 0, and when the probability is 0.5, the value of entropy is at its maximum. Impurity is the degree of randomness, that is, how random the data is.

We have a mathematical formula for entropy that is shown below:

Entropy(S) = -P(Yes) log2 P(Yes) - P(No) log2 P(No)

where

S is the total sample space, and Entropy(S) is written as E(S) for short

P(Yes) is the probability of YES

P(No) is the probability of NO

If number of YES = number of NO, i.e., P(Yes) = 0.5

1. Entropy(S) = 1

If it contains all YES or all NO, i.e., P(Yes) = 1 or 0

2. Entropy(S) = 0

When P(Yes) = P(No) = 0.5, i.e., YES + NO = Total Sample (S)

Here, the number of YES is equal to the number of NO.

When we put the values in the 1st formula, we get

E(S) = -0.5 log2 0.5 - 0.5 log2 0.5

E(S) = -0.5 (log2 0.5 + log2 0.5)

E(S) = -0.5 (-1 - 1)

E(S) = 1

From the above, we get the entropy of the total sample space as 1.

In the second case, all the samples are YES, so P(No) = 0 and the second term of the formula drops out. We would consider

E(S) = -P(Yes) log2 P(Yes)

When P(Yes) = 1, i.e., YES = Total Sample (S)

E(S) = -1 log2 1

Since log2 1 = 0, we get

E(S) = 0

Similarly, when all the samples are NO, the entropy of the total sample space is 0, as shown below.

E(S) = -P(No) log2 P(No)

When P(No) = 1, i.e., NO = Total Sample (S)

E(S) = -1 log2 1

E(S) = 0

Here, the entropy of the total sample space is 0.
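The two cases can be checked with a few lines of Python. This is a minimal sketch of the two-class entropy formula; the function name binary_entropy is made up for this example, and the term 0 * log2(0) is treated as 0 by convention.

```python
from math import log2


def binary_entropy(p_yes):
    """Entropy of a two-class sample, given the probability of YES."""
    entropy = 0.0
    for p in (p_yes, 1.0 - p_yes):
        if p > 0:  # skip the term for a class that never occurs
            entropy -= p * log2(p)
    return entropy


print(binary_entropy(0.5))  # equal number of YES and NO -> 1.0
print(binary_entropy(1.0))  # all YES -> 0.0
print(binary_entropy(0.0))  # all NO  -> 0.0
```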

What is Information Gain?

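Information gain measures the reduction in entropy and is used to decide which attribute should be selected as the decision node.

If "S" is our total collection and Sv is the subset of S in which the feature takes the value v,

Information Gain = Entropy(S) - Sum over v of [(|Sv| / |S|) * Entropy(Sv)]

The second term is the weighted average of the entropy of each subset that the feature produces.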

Let us build our own decision tree.

So, let us manually build a decision tree for our data set, which consists of 14 instances, of which 9 are YES and 5 are NO. We start with the formula for entropy.

We have 9 YES, so the probability of YES is 9/14.

The probability of NO is 5/14.

When we calculate the value of entropy, we get

E(S) = -P(Yes) log2 P(Yes) - P(No) log2 P(No)

E(S) = -(9/14) * log2 (9/14) - (5/14) * log2 (5/14)

E(S) = 0.41 + 0.53 = 0.94

So, the value of entropy is 0.94.

Out of those 5 nodes, we have to decide which node should be the root node, because the entire tree is built from it.

To decide, we must calculate the entropy and the information gain for all the nodes.

We start with "Outlook", which has three different values: sunny, overcast, and rainy. First, check how many YES and NO there are for each value.

Then, we calculate the entropy for each value. Here we are calculating it for "Outlook".

E(Outlook = Sunny) = -2/5 log2 2/5 - 3/5 log2 3/5 = 0.971

E(Outlook = Overcast) = -1 log2 1 - 0 log2 0 = 0

E(Outlook = Rainy) = -3/5 log2 3/5 - 2/5 log2 2/5 = 0.971

Information from Outlook,

I(Outlook) = 5/14 * 0.971 + 4/14 * 0 + 5/14 * 0.971 = 0.693

Information gained from Outlook is

Gain(Outlook) = E(S) - I(Outlook)

Gain(Outlook) = 0.94 - 0.693 = 0.247

In the same way, we can calculate the information gain for each feature in the above table.
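The manual calculation above can be reproduced with a short sketch. The [YES, NO] counts are taken from the worked example (9 YES and 5 NO overall, with 5 sunny, 4 overcast, and 5 rainy instances); the function names entropy and information_gain are made up for this example.

```python
from math import log2


def entropy(counts):
    """Entropy of a node given its per-class counts, e.g. [YES, NO]."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)


def information_gain(parent_counts, child_counts):
    """Parent entropy minus the weighted average entropy of the sub-nodes."""
    total = sum(parent_counts)
    weighted = sum(sum(child) / total * entropy(child) for child in child_counts)
    return entropy(parent_counts) - weighted


dataset = [9, 5]                     # [YES, NO] for all 14 instances
outlook = [[2, 3], [4, 0], [3, 2]]   # [YES, NO] for sunny, overcast, rainy

e_s = entropy(dataset)
gain_outlook = information_gain(dataset, outlook)

print(round(e_s, 2))           # 0.94
print(round(gain_outlook, 3))  # 0.247
```

Repeating the call with the counts of every other feature and picking the largest gain gives the root node.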

Importing Libraries

We should import the libraries that are required initially, such as NumPy, pandas, seaborn, and matplotlib.pyplot.

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Jupyter/IPython magic: show plots inside the notebook
%matplotlib inline

Importing the dataset

Here we will import the dataset from a CSV file into a pandas DataFrame.

# The file has no header row, so we supply the column names ourselves
col = ['Class Name', 'Left weight', 'Left distance', 'Right weight', 'Right distance']
df = pd.read_csv('balance-scale.data', names=col, sep=',')
df.head()



When we want information about the dataset, we use the "df.info()" method. Its output shows that the dataset has 625 records and 5 fields.

df.info()

Output


Plotting a Decision Tree

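We can also plot the decision tree as a tree structure by using the Graphviz library. We pass it several parameters, such as the classifier model, the target class names, and the feature names of our data.

# class_names must be in ascending order, the same order as the classifier's classes_
target = sorted(df['Class Name'].unique())
# Assumes the model was trained on every column except the target 'Class Name'
feature_names = list(df.columns[1:])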

Next, we write the code that plots the decision tree.

from sklearn import tree
import graphviz

# clf_model is the decision tree classifier that was trained earlier
dot_data = tree.export_graphviz(clf_model,
                                out_file=None,
                                feature_names=feature_names,
                                class_names=target,
                                filled=True, rounded=True,
                                special_characters=True)
graph = graphviz.Source(dot_data)
graph


Output


Also, we can get a textual representation of the tree by using the export_text function from the sklearn library.
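As a small illustration, the sketch below calls export_text from sklearn.tree on a toy classifier. The tiny dataset and the name toy_model are made up for this example; they stand in for the balance-scale data and the trained model used in this tutorial.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Made-up data: one feature and two classes that are easy to separate.
X = [[1], [2], [3], [10], [11], [12]]
y = ["L", "L", "L", "R", "R", "R"]

toy_model = DecisionTreeClassifier(random_state=0).fit(X, y)

# export_text returns the learned rules as an indented string.
text_tree = export_text(toy_model, feature_names=["Left weight"])
print(text_tree)
```

Each line of the printed text is either a test on a feature or a leaf that names the predicted class.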

When to Stop Splitting?

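  1. Splitting stops when a stopping condition is met.
  2. It is possible to keep splitting until there is 1 leaf for each observation (100% accuracy on the training data).
  3. Splitting that far leads to an overfitting problem.
  4. One remedy is to set constraints on the tree size.
  5. Another remedy is tree pruning.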
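Constraints on the tree size can be set when the tree is created. The sketch below assumes scikit-learn's DecisionTreeClassifier and its max_depth and min_samples_leaf parameters; the alternating labels are made-up data chosen so that an unconstrained tree has to memorize every observation.

```python
from sklearn.tree import DecisionTreeClassifier

# Made-up data: labels alternate, so there is no real pattern to learn.
X = [[i] for i in range(20)]
y = [i % 2 for i in range(20)]

# No constraints: the tree keeps splitting until every leaf is pure.
unconstrained = DecisionTreeClassifier(random_state=0).fit(X, y)

# Constraints: at most 2 levels of splits and at least 3 samples per leaf.
constrained = DecisionTreeClassifier(
    max_depth=2, min_samples_leaf=3, random_state=0
).fit(X, y)

print(unconstrained.get_n_leaves(), unconstrained.score(X, y))
print(constrained.get_n_leaves(), constrained.get_depth())
```

The unconstrained tree reaches 100% accuracy on the training data only by giving each observation its own leaf, which is the overfitting problem described above.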

Advantages of Decision Tree

  1. Easy to Understand: Decision tree output is very easy to understand, even for people from a non-analytical background.
  2. Useful in Data Exploration: A decision tree is one of the fastest ways to identify the most significant variables and the relation between two or more variables.
  3. Handles Outliers: To a fair degree, it is not influenced by outliers and missing values.
  4. The data type is not a constraint: It can handle both numerical and categorical variables.
  5. Non-Parametric Method: The decision tree is considered to be a non-parametric method. Decision trees make no assumptions about the space distribution and the classifier structure.

Disadvantages of Decision Tree

  1. Overfitting: Overfitting is one of the most practical difficulties for decision tree models. This problem is addressed by setting constraints on model parameters and by pruning.
  2. Limitation for Continuous Variables: While working with continuous numerical variables, the decision tree loses information when it discretizes the variables into different categories.

Some points on the decision tree

  • A decision tree is very simple to understand and interpret.
  • It plays a very important role in corporate, academic, and social settings, among others.
  • It can also be used together with other decision techniques.
  • New scenarios can be added to a decision tree.
  • Information or data is needed, and it can include expert opinions and preferences.
  • A decision tree does not require normalization of the data.
  • A decision tree can become unstable if there are any changes in the data or information.
  • Calculations become difficult, particularly when values are uncertain and when many outcomes are linked.