Python Tutorial

Introduction Python Features Python Applications System requirements for Python Python Installation Python Basics Python Variables Python Data Types Python IDE Python Keywords Python Operators Python Comments Python Pass Statement

Python Conditional Statements

Python if Statement Python elif Statement Python If-else statement Python Switch Case

Python Loops

Python for loop Python while loop Python Break Statement Python Continue Statement Python Goto Statement

Python Arrays

Python Array Python Matrix

Python Strings

Python Strings Python Regex

Python Built-in Data Structure

Python Lists Python Tuples Python Lists vs Tuples Python Dictionary Python Sets

Python Functions

Python Function Python min() function Python max() function Python User-define Functions Python Built-in Functions Anonymous/Lambda Function in Python

Python File Handling

Python File Handling Python Read CSV Python Write CSV Python Read Excel Python Write Excel Python Read Text File Python Write Text File Read JSON File in Python

Python Exception Handling

Python Exception Handling Python Errors and exceptions Python Assert

Python OOPs Concept

OOPs Concepts in Python Classes & Objects in Python Inheritance in Python Polymorphism in Python Python Encapsulation Python Constructor Static Variables in Python Abstraction in Python

Python Iterators

Iterators in Python Yield Statement In Python

Python Generators

Python Generator

Python Decorators

Python Decorator

Python Functions and Methods

Python Built-in Functions Python String Methods Python List Methods Python Dictionary Methods Python Tuple Methods Python Set Methods

Python Modules

Python Modules Python Datetime Module Python Calendar Module  

Python MySQL

Python MySQL Python MySQL Update Operation Python MySQL Delete Operation

Python MongoDB

Python MongoDB

Python Data Structure Implementation

Python Stack Python Queue Python Hash Table Python Graph

Python Advance Topics

Speech Recognition in Python Face Recognition in Python Python Rest API Python Command Line Arguments Python JSON Python Virtual Environment Type Casting in Python Collections in python Python Enumerate Python Debugger Python DefaultDict

Misc

Python PPTX Python Pickle Python Seaborn Python Coroutine Python EOL Python Infinity Python math.cos and math.acos function Python Project Ideas Based On Django Reverse a String in Python Reverse a Number in Python Python Word Tokenizer Python Trigonometric Functions Python try catch exception GUI Calculator in Python Implementing geometric shapes into the game in python Installing Packages in Python Python Try Except Python Sending Email Socket Programming in Python Python CGI Programming Python Data Structures Python abstract class Python Compiler Python K-Means Clustering List Comprehension in Python3 NSE Tools In Python Operator Module In Python Palindrome In Python Permutations in Python Pillow Python introduction and setup Python Functionalities of Pillow Module Python Argmin Python whois Python JSON Schema Python lock Return Statement In Python Reverse a sentence In Python tell() function in Python Why learn Python? Write Dictionary to CSV in Python Write a String in Python Binary Search Visualization using Pygame in Python Latest Project Ideas using Python 2022 Closest Pair of Points in Python ComboBox in Python Python vs R Python Ternary Operators Self in Python Python vs Java Python Modulo Python Packages Python Syntax Python Uses Python Logical Operators Python Multiprocessing Python History Difference between Input() and raw_input() functions in Python Conditional Statements in python Confusion Matrix Visualization Python Python Algorithms Python Modules List Difference between Python 2 and Python 3 Is Python Case Sensitive Method Overloading in Python Python Arithmetic Operators Design patterns in python Assignment Operators in Python Is Python Object Oriented Programming language Division in Python Python exit commands Continue And Pass Statements In Python Colors In Python Convert String Into Int In Python Convert String To Binary In Python Convert Uppercase To Lowercase In Python Convert XML To JSON In Python Converting Set To List In Python Covariance In Python CSV Module In Python Decision Tree In Python Difference Between Yield And Return In Python Dynamic Typing In Python Abstract design pattern in python Builder design pattern in python Prototype design pattern in Python Creational design patterns in Python

How to

How to convert integer to float in Python How to reverse a string in Python How to take input in Python How to install Python in Windows How to install Python in Ubuntu How to install PIP in Python How to call a function in Python How to download Python How to comment multiple lines in Python How to create a file in Python How to create a list in Python How to declare array in Python How to clear screen in Python How to convert string to list in Python How to take multiple inputs in Python How to write a program in Python How to compare two strings in Python How to create a dictionary in Python How to create an array in Python How to update Python How to compare two lists in Python How to concatenate two strings in Python How to print pattern in Python How to check data type in python How to slice a list in python How to implement classifiers in Python How To Print Colored Text in Python How to develop a game in python How to print in same line in python How to create a class in python How to find square root in python How to import numy in python How to import pandas in python How to uninstall python How to upgrade PIP in python How to append a string in python How to open a file in python

Sorting

Python Sort List Sort Dictionary in Python Python sort() function Python Bubble Sort

Programs

Factorial Program in Python Prime Number Program in Python Fibonacci Series Program in Python Leap Year Program in Python Palindrome Program in Python Check Palindrome In Python Calculator Program in Python Armstrong Number Program in Python Python Program to add two numbers Anagram Program in Python Even Odd Program in Python GCD Program in Python Python Exit Program Python Program to check Leap Year Operator Overloading in Python Pointers in Python Python Not Equal Operator Raise Exception in Python Salary of Python Developers in India What is a Script in Python Singleton design pattern in python

How to implement classifiers in Python

What is classification?

Classification is a type of supervised machine learning problem with a categorical target (response) variable. Given the known label in the training data, the classifier approximates a mapping function (f) from the input variables (X) to the output variables (Y) (Y).

Import Libraries and Load Dataset

To begin, we must import the following libraries: pandas (for dataset loading), numpy (for matrix manipulation), matplotlib and seaborn (for visualisation), and sklearn (for machine learning) (building classifiers). Before importing them, make sure they are already installed. To import the libraries and load the dataset, use the following code:

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from pandas.plotting import parallel_coordinates
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn import metrics
from sklearn.naive_bayes import GaussianNB
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression

To load the dataset, we can use the read_csv function from pandas (my code also includes the option of loading through URL).

data = pd.read_csv('data.csv')

After we load the data, we can take a look at the first couple of rows through the head function:

data.head(5)

Train-Test Split

We can now divide the dataset into two parts: training and testing. In general, we should have a validation set that is used to evaluate the performance of each classifier and fine-tune the model parameters in order to find the best model. The test set is primarily used for reporting. However, because this dataset is small, we can simplify the process by using the test set to serve the purpose of the validation set.

In addition, to estimate model accuracy, I used a stratified hold-out approach. Cross-validation is another method for reducing bias and variances.

train, test = train_test_split(data, test_size = 0.4, stratify = data[‘species’], random_state = 42)

Exploratory Data Analysis

We can now proceed to explore the training data after we have split the dataset. Matplotlib and Seaborn both have excellent plotting tools that we can use for visualisation.

Let's start by making some univariate plots with a histogram for each feature:

n_bins = 10
fig, axs = plt.subplots(2, 2)
axs[0,0].hist(train['sepal_length'], bins = n_bins);
axs[0,0].set_title('Sepal Length');
axs[0,1].hist(train['sepal_width'], bins = n_bins);
axs[0,1].set_title('Sepal Width');
axs[1,0].hist(train['petal_length'], bins = n_bins);
axs[1,0].set_title('Petal Length');
axs[1,1].hist(train['petal_width'], bins = n_bins);
axs[1,1].set_title('Petal Width');
# add some spacing between subplots
fig.tight_layout(pad=1.0);

It's worth noting that for both petal length and petal width, there appears to be a group of data points with lower values than the others, implying that there could be different groups in this data.

Let's try some side-by-side box plots next:

fig, axs = plt.subplots(2, 2)
fn = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
cn = ['setosa', 'versicolor', 'virginica']
sns.boxplot(x = 'species', y = 'sepal_length', data = train, order = cn, ax = axs[0,0]);
sns.boxplot(x = 'species', y = 'sepal_width', data = train, order = cn, ax = axs[0,1]);
sns.boxplot(x = 'species', y = 'petal_length', data = train, order = cn, ax = axs[1,0]);
sns.boxplot(x = 'species', y = 'petal_width', data = train,  order = cn, ax = axs[1,1]);
# add some spacing between subplots
fig.tight_layout(pad=1.0);

The two plots at the bottom imply that the setosas we saw earlier are setosas. Their petal measurements are smaller and more evenly distributed than those of the other two species. When compared to the other two species, versicolor has lower average values than virginica.

Another type of visualisation is the violin plot, which combines the advantages of both the histogram and the box plot:

sns.violinplot(x="species", y="petal_length", data=train, size=5, order = cn, palette = 'colorblind');

Gaussian Naive Bayes Classifier

Naive Bayes is a popular classification model. It contains the word "Naive" because it contains a key assumption of class-conditional independence, which means that given the class, each feature's value is assumed to be independent of any other feature's value (read more here).

We know that this is not the case here, as evidenced by the high correlation between the petal features. Let's look at the test accuracy using this model to see if this assumption is sound:

The accuracy of the Guassian Naive Bayes Classifier on test data is 0.933

What about the result if we only use the petal features:

The accuracy of the Guassian Naive Bayes Classifier with 2 predictors on test data is 0.950

Interestingly, using only two features results in more correctly classified points, implying that using all features may result in over-fitting. Our Naive Bayes classifier appears to have done a good job.

Linear Discriminant Analysis (LDA)

If we calculate the class conditional density using a multivariate Gaussian distribution rather than a product of univariate Gaussian distributions (as in Naive Bayes), we get an LDA model (read more here). The key assumption of LDA is that covariances between classes are equal. We can examine the test accuracy using both all and only petal features:

The accuracy of the LDA Classifier on test data is 0.983
The accuracy of the LDA Classifier with two predictors on test data is 0.933

The use of all features improves the test accuracy of our LDA model.

We can use our LDA model with only petals and plot the test data to visualise the decision boundary in 2D:

Three virginica and one versicolor test points are misclassified.



ADVERTISEMENT
ADVERTISEMENT