Python Logistic Regression with Sklearn & Scikit

In this article, we'll walk through a tutorial on using the Python scikit-learn package (imported as sklearn) to implement logistic regression. To refresh the concept, we'll start with a quick explanation of logistic regression. After that, we'll build a complete project on a dataset to demonstrate sklearn logistic regression using the LogisticRegression class.

Let's start with a basic introduction to logistic regression in Python.

Introduction to Logistic Regression in Python

Logistic regression is a statistical method for classification: it predicts which category an observation belongs to.

To understand logistic regression, we first need to understand what classification involves. Consider the examples below:

  • The tumor is categorized as benign or malignant by a doctor.
  • A bank transaction could be legitimate or fraudulent.

Logistic regression is one of the machine learning methods that address this kind of binary classification problem. Other machine learning methods exist for other kinds of problems.
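To make the idea concrete: a logistic regression model turns a weighted sum of the input features into a probability with the logistic (sigmoid) function and then applies a threshold. The following is a minimal sketch of that idea, not scikit-learn's implementation; the weights, the bias, the sample, and the 0.5 threshold are made-up values chosen purely for illustration.

```python
import math


def sigmoid(z):
    """Logistic function: squashes z into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(features, weights, bias):
    """Probability of the positive class for a single sample."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)


# Hypothetical model with two features (made-up parameters, not fitted to any data)
weights = [0.8, -0.4]
bias = -1.0

sample = [3.0, 1.0]
p = predict_proba(sample, weights, bias)   # z = 0.8*3.0 - 0.4*1.0 - 1.0 = 1.0
label = 1 if p >= 0.5 else 0               # threshold the probability at 0.5
print(f"P(positive) = {p:.3f}, predicted label = {label}")
# prints: P(positive) = 0.731, predicted label = 1
```

Training a real model means finding the weights and bias that best fit the labelled data, which is what the fit() method does in the examples that follow.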

Let's look at logistic regression in Python through the example below:

Example

# importing dataset and logistic regression model libraries
# iris dataset imported
from sklearn.datasets import load_iris
# logistic regression model imported
from sklearn.linear_model import LogisticRegression


# loading data set into x and y variables
X, y = load_iris(return_X_y=True)


# creating the object of LogisticRegression
log_model = LogisticRegression(random_state=0)


# training the model
log_model.fit(X, y)


# predicting the class of ten samples (rows 45 to 54, which were also used for training)
test_result = log_model.predict(X[45: 55, :])
print("output of the test input is:", test_result)


# checking the accuracy of the model on the training data
accuracy = log_model.score(X, y)
print("The accuracy of the model:", accuracy)

Output

output of the test input is: [0 0 0 0 0 1 1 1 1 1]
The accuracy of the model: 0.9733333333333334

The example above uses the iris data set both to train and to test the logistic regression model. First, we import the required modules. We then load the data into two variables, X (the features) and y (the class labels), and train the model with fit().

Finally, we check the model with the score() method. For a classifier, score() returns the mean accuracy (the fraction of correctly classified samples), not the precision. Because the score here is computed on the same data the model was trained on, it is an optimistic estimate.
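The value returned by score() can be reproduced by hand, which makes it clear that it measures accuracy. The sketch below refits the same model and compares score() with accuracy_score() and with a manual count of correct predictions. The max_iter=1000 setting is an assumption added here only to avoid possible convergence warnings; it is not part of the example above.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)

# max_iter raised above the default of 100 (assumption, see the note above)
log_model = LogisticRegression(random_state=0, max_iter=1000).fit(X, y)

predictions = log_model.predict(X)

score_value = log_model.score(X, y)               # what the tutorial calls score()
manual_accuracy = accuracy_score(y, predictions)  # same metric from sklearn.metrics
manual_fraction = (predictions == y).mean()       # fraction of correct predictions

print(score_value, manual_accuracy, manual_fraction)
```

All three numbers agree, because score() for a classifier is defined as the mean accuracy of predict(X) against y.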

There are other classification problems in which more than two classes are possible. For example, given a basket full of fruit, we may be asked to separate the different kinds of fruit from each other. This is called multiclass classification; the iris data set used above, with its three species, is one such problem.
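As a small illustration of the multiclass case, the sketch below fits a model on the three iris species and prints one probability per class for a few samples. The variable names and the max_iter=1000 setting are choices made for this sketch, not part of the original example.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris()
X, y = iris.data, iris.target

# max_iter raised above the default as a precaution against convergence warnings
model = LogisticRegression(max_iter=1000).fit(X, y)

# predict_proba returns one column per class; each row sums to 1
probabilities = model.predict_proba(X[:3])

print("class labels:", model.classes_)
print("class names: ", iris.target_names)
print(probabilities.round(3))
```

The predicted class for each sample is simply the column with the highest probability.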

Next, let's look at the syntax of LogisticRegression.

Syntax of Logistic Regression

The signature of the LogisticRegression class is given below:

class sklearn.linear_model.LogisticRegression(penalty='l2', *, dual=False, tol=0.0001, C=1.0, fit_intercept=True, intercept_scaling=1, class_weight=None, random_state=None, solver='lbfgs', max_iter=100, multi_class='auto', verbose=0, warm_start=False, n_jobs=None, l1_ratio=None)

Here we can see that Logistic regression has a lot of parameters. Let’s understand these parameters one by one.

Parameters

  1. penalty: the type of regularisation to apply. It may take the values "none", "l1", "l2", and "elasticnet". No penalty is applied when it is set to "none". With "l1" the L1 penalty is applied, and with "l2" the L2 penalty is applied. With "elasticnet", both the L1 and L2 penalties are applied.
    {"l1", "l2", "elasticnet", "none"}, default is "l2".
  2. dual: may be True or False. If True, the model is solved in its dual formulation, which is implemented only for the l2 penalty with the liblinear solver. If False, the model is solved in its primal formulation.
    Bool, the default is False.
  3. tol: the tolerance for the stopping criterion. The solver stops searching for the optimum once it is within this tolerance.
    Float, default is 1e-4.
  4. C: positive float; the inverse of regularisation strength. Smaller values specify stronger regularisation, just like in support vector machines.
    Float, default is 1.0.
  5. fit_intercept: specifies whether a constant (also known as a bias or intercept) should be added to the decision function.
    Bool, the default is True.
  6. intercept_scaling: useful only when the solver 'liblinear' is used and fit_intercept is set to True. In this case, x becomes [x, intercept_scaling], i.e. a "synthetic" feature with a constant value equal to intercept_scaling is appended to the instance vector. The intercept becomes intercept_scaling * synthetic_feature_weight.

    Note that, like all other features, the synthetic feature weight is subject to l1/l2 regularisation. To lessen the effect of regularisation on the synthetic feature weight (and hence on the intercept), intercept_scaling has to be increased.
    Float, default is 1.
  7. class_weight: weights associated with classes, given in the form {class_label: weight}. If not given, all classes are assumed to have weight one.
    The "balanced" mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data, as n_samples / (n_classes * np.bincount(y)).

    Note that these weights are multiplied by sample_weight (passed through the fit method) if sample_weight is specified.
    Dict or "balanced", default is None.
  8. random_state: used to shuffle the data when solver is "sag", "saga", or "liblinear".
    Int, RandomState instance, default is None.

  9. solver: the algorithm to use for the optimisation problem. "lbfgs" is the default. Consider the following points when choosing a solver:
    • "liblinear" is a good choice for small datasets, while "sag" and "saga" are faster for large ones;
    • Only "newton-cg", "sag", "saga", and "lbfgs" handle the multinomial loss in multiclass problems;
    • "liblinear" is limited to one-versus-rest schemes.

      {‘newton-cg’, ‘lbfgs’, ‘liblinear’, ‘sag’, ‘saga’}, default=’lbfgs’
  10. max_iter: the maximum number of iterations the solvers may take to converge.
    Int, default is 100.
  11. multi_class: if the option chosen is 'ovr', a binary problem is fit for each label. For 'multinomial', the loss minimised is the multinomial loss fit across the entire probability distribution, even when the data is binary. 'multinomial' is unavailable when solver='liblinear'. 'auto' selects 'ovr' if the data is binary or if solver='liblinear', and otherwise selects 'multinomial'.

    {‘auto’, ‘ovr’, ‘multinomial’}, default is ’auto’.
  12. verbose: for the liblinear and lbfgs solvers, set verbose to any positive number to turn on verbose output.
    Int, default is 0.
  13. warm_start: when set to True, the solution of the previous call to fit is reused as initialisation; otherwise, the previous solution is simply erased. It has no effect for the liblinear solver.
    Bool, the default is False.

  14. n_jobs: the number of CPU cores used when parallelising over classes if multi_class='ovr'. This parameter is ignored when the solver is set to 'liblinear', regardless of whether multi_class is specified. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors.
    Int, default is None.

  15. l1_ratio: the Elastic-Net mixing parameter, with 0 <= l1_ratio <= 1. It is used only when penalty='elasticnet'. Setting l1_ratio=0 is equivalent to using penalty='l2', while setting l1_ratio=1 is equivalent to using penalty='l1'. For 0 < l1_ratio < 1, the penalty is a combination of L1 and L2.
    Float, default is None.
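A few of these parameters can be seen in action with a short experiment. The sketch below is only an illustration under stated assumptions: the values C=0.01 and C=100, max_iter=1000, and the imbalanced label vector are arbitrary choices made here, and compute_class_weight from sklearn.utils.class_weight is used only to cross-check the "balanced" formula quoted above.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.utils.class_weight import compute_class_weight

X, y = load_iris(return_X_y=True)

# C is the inverse of regularisation strength: a smaller C shrinks the coefficients more.
strong_reg = LogisticRegression(C=0.01, solver="lbfgs", max_iter=1000, tol=1e-4).fit(X, y)
weak_reg = LogisticRegression(C=100.0, solver="lbfgs", max_iter=1000, tol=1e-4).fit(X, y)

strong_norm = np.linalg.norm(strong_reg.coef_)
weak_norm = np.linalg.norm(weak_reg.coef_)
print(f"coefficient norm with C=0.01: {strong_norm:.3f}")
print(f"coefficient norm with C=100:  {weak_norm:.3f}")

# class_weight="balanced" uses n_samples / (n_classes * np.bincount(y)).
# Hypothetical imbalanced labels: 90 samples of class 0 and 10 of class 1.
y_imbalanced = np.array([0] * 90 + [1] * 10)
n_classes = 2
manual_weights = len(y_imbalanced) / (n_classes * np.bincount(y_imbalanced))
sklearn_weights = compute_class_weight(
    "balanced", classes=np.unique(y_imbalanced), y=y_imbalanced
)
print(manual_weights)    # the rare class 1 gets the larger weight
print(sklearn_weights)
```

The model fitted with the smaller C ends up with a smaller coefficient norm, and the manually computed balanced weights match the ones scikit-learn computes.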


Once fitted, a LogisticRegression model also exposes several attributes. Let's go through them one by one.


Attributes

1. classes_: the class labels known to the classifier.

ndarray of shape (n_classes, )

2. coef_: the coefficients of the features in the decision function.

When the given problem is binary, coef_ has the shape (1, n_features). In particular, when multi_class='multinomial', coef_ corresponds to outcome 1 (True) and -coef_ corresponds to outcome 0 (False).

ndarray of shape (1, n_features) or (n_classes, n_features)

3. intercept_: the intercept (also known as bias) added to the decision function.

If fit_intercept is set to False, the intercept is set to zero. When the given problem is binary, intercept_ has the shape (1,). In particular, when multi_class='multinomial', intercept_ corresponds to outcome 1 (True) and -intercept_ corresponds to outcome 0 (False).

ndarray of shape (1,) or (n_classes,)

4. n_features_in_: the number of features seen during fit.

Added in version 0.24.

int

5. feature_names_in_: the names of the features seen during fit. It is defined only when X has feature names that are all strings.

Added in version 1.0.

ndarray of shape (n_features_in_,)

6. n_iter_: the actual number of iterations for all classes. If the problem is binary or multinomial, only one element is returned. For the liblinear solver, only the maximum number of iterations across all classes is given.

Changed in version 0.20: in SciPy versions up to 1.0.0, the number of lbfgs iterations may exceed max_iter; n_iter_ now reports at most max_iter.

ndarray of shape (n_classes,) or (1, )

Creating Logistic Regression Model

Step by step, let's see how to create a LogisticRegression model using sklearn in Python.

1st step

Let's start creating the LogisticRegression model with the first step.

In this step, we import all the required libraries, such as numpy, pandas, and seaborn.

# importing required libraries
# importing numpy and pandas for data structure
import numpy as np
import pandas as pd
# importing sklearn libraries required for training the model
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.metrics import roc_curve, roc_auc_score, classification_report, accuracy_score, confusion_matrix 
# importing seaborn and matplotlib.pyplot for visualisation
import seaborn as sns
import matplotlib.pyplot as plt
# iris dataset imported for training and testing the model
from sklearn.datasets import load_iris
# logistic regression model imported
from sklearn.linear_model import LogisticRegression

Scikit-learn is typically used for data analysis together with the following libraries of the SciPy stack:

  • NumPy - advanced linear algebra and array operations.
  • SciPy - modules for linear algebra, optimisation, and other essential data science operations.
  • IPython - a more interactive console.
  • Matplotlib - data visualisation and plotting in two or three dimensions.
  • SymPy - computer algebra and symbolic computation.
  • Pandas - data analysis and manipulation, primarily with dataframes and tables.

2nd step

After completing the first step, let's move on to the second step.

For the logistic regression model, we use one of the built-in datasets from the sklearn.datasets module, the iris dataset. As the output below shows, it is small: 150 samples with four numeric features, all of which we use in this tutorial.

In this step, we load the dataset.

Let's look at an example:

Example

# importing required libraries
# importing numpy and pandas for data structure
import numpy as np
import pandas as pd
# importing sklearn libraries required for training the model
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.metrics import roc_curve, roc_auc_score, classification_report, accuracy_score, confusion_matrix
# importing seaborn and matplotlib.pyplot for visualisation
import seaborn as sns
import matplotlib.pyplot as plt
# iris dataset imported for training and testing the model
from sklearn.datasets import load_iris
# logistic regression model imported
from sklearn.linear_model import LogisticRegression


# loading data set into x and y variables
X, y = load_iris(return_X_y=True)
x_df = pd.DataFrame(data=X, columns=load_iris().feature_names)
print(f'''table
{x_df.head()}


description
{x_df.describe()}
''')

Output:

table
   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)
0                5.1               3.5                1.4               0.2
1                4.9               3.0                1.4               0.2
2                4.7               3.2                1.3               0.2
3                4.6               3.1                1.5               0.2
4                5.0               3.6                1.4               0.2


description
       sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)
count         150.000000        150.000000         150.000000        150.000000
mean            5.843333          3.057333           3.758000          1.199333
std             0.828066          0.435866           1.765298          0.762238
min             4.300000          2.000000           1.000000          0.100000
25%             5.100000          2.800000           1.600000          0.300000
50%             5.800000          3.000000           4.350000          1.300000
75%             6.400000          3.300000           5.100000          1.800000
max             7.900000          4.400000           6.900000          2.500000

3rd step

After completing the second step, let's move on to the third step.

The dependent variable (the class labels, y) is kept separate from the independent variables (the features, X); load_iris(return_X_y=True) already returns them as two arrays.

The train_test_split() function is then used to divide the dataset into a training set and a test set.

Let's look at an example:

Example

# importing required libraries
# importing numpy and pandas for data structure
import numpy as np
import pandas as pd
# importing sklearn libraries required for training the model
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.metrics import roc_curve, roc_auc_score, classification_report, accuracy_score, confusion_matrix
# importing seaborn and matplotlib.pyplot for visualisation
import seaborn as sns
import matplotlib.pyplot as plt
# iris dataset imported for training and testing the model
from sklearn.datasets import load_iris
# logistic regression model imported
from sklearn.linear_model import LogisticRegression


# loading data set into x and y variables
X, y = load_iris(return_X_y=True)




# splitting our data into test data and training data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.25, random_state=0)
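To see what this split produces, the sketch below repeats the call and prints the sizes of the resulting sets. The second call adds stratify=y, which is an assumption introduced here (it is not used in the tutorial's code) to show how the class proportions can be kept similar in both sets.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Same call as in the tutorial: 25% of the 150 samples go to the test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.25, random_state=0)
print(X_train.shape, X_test.shape)   # (112, 4) (38, 4)
print(np.bincount(y_test))           # samples per class in the test set

# Optional variation (assumption, not in the tutorial): keep class proportions equal
Xs_train, Xs_test, ys_train, ys_test = train_test_split(
    X, y, test_size=.25, random_state=0, stratify=y
)
stratified_counts = np.bincount(ys_test)
print(stratified_counts)             # roughly the same count for each class
```

Fixing random_state makes the split reproducible, so the same rows end up in the training and test sets on every run.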

4th step

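After completing the 3rd step, let's move on to the 4th step.

StandardScaler carries out the task of standardisation. The features in our dataset are measured on different scales, so before training the model we rescale each column to have a mean of 0 and a standard deviation of 1. The scaler is fitted on the training data only (fit_transform()), and the same transformation is then applied to the test data (transform()), so that no information from the test set leaks into training.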

Let’s understand this by taking an example given below:

Example

# importing required libraries
# importing numpy and pandas for data structure
import numpy as np
import pandas as pd
# importing sklearn libraries required for training the model
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.metrics import roc_curve, roc_auc_score, classification_report, accuracy_score, confusion_matrix
# importing seaborn and matplotlib.pyplot for visualisation
import seaborn as sns
import matplotlib.pyplot as plt
# iris dataset imported for training and testing the model
from sklearn.datasets import load_iris
# logistic regression model imported
from sklearn.linear_model import LogisticRegression


# loading data set into x and y variables
X, y = load_iris(return_X_y=True)




# splitting our data into test data and training data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.25, random_state=0)


# scaling the x data set
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
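
To see what the scaler actually does, the following sketch checks that the scaled training columns have a mean of about 0 and a standard deviation of about 1, and reproduces transform() by hand from the fitted mean_ and scale_ attributes. The variable names X_train_scaled, X_test_scaled and manual are illustrative and not part of the example above.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

sc = StandardScaler()
X_train_scaled = sc.fit_transform(X_train)  # learns mean and std from the training data
X_test_scaled = sc.transform(X_test)        # reuses the training mean and std

# each scaled training column has mean ~0 and standard deviation ~1
print(np.allclose(X_train_scaled.mean(axis=0), 0.0))  # True
print(np.allclose(X_train_scaled.std(axis=0), 1.0))   # True

# transform() is equivalent to (x - training mean) / training std
manual = (X_test - sc.mean_) / sc.scale_
print(np.allclose(manual, X_test_scaled))  # True
```

Note that the scaled test columns will generally not have a mean of exactly 0, because they are scaled with the statistics of the training set.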

5th step

After completing the 4th step, let's move on to the 5th step.

Here we create the logistic regression model and train it on our training data. To create the model, we make an instance of LogisticRegression() and then train it with the fit() method. Since we pass no arguments to LogisticRegression(), it uses the default parameter values.

Let’s understand this by taking an example given below:

Example

# importing required libraries
# importing numpy and pandas for data structure
import numpy as np
import pandas as pd
# importing sklearn libraries required for training the model
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.metrics import roc_curve, roc_auc_score, classification_report, accuracy_score, confusion_matrix
# importing seaborn and matplotlib.pyplot for visualisation
import seaborn as sns
import matplotlib.pyplot as plt
# iris dataset imported for training and testing the model
from sklearn.datasets import load_iris
# logistic regression model imported
from sklearn.linear_model import LogisticRegression


# loading data set into x and y variables
X, y = load_iris(return_X_y=True)




# splitting our data into test data and training data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.25, random_state=0)


# scaling the x data set
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)


# creating the model and training the model
logistic_model = LogisticRegression()
# training the model with the train data, i.e. X_train, y_train
logistic_model.fit(X_train, y_train)
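
Once the model is fitted, its parameters can be inspected. The sketch below is a minimal illustration, assuming the same data preparation as above: it prints one of the default settings (the regularisation strength C) and the shape of the learned coefficients, which hold one row per class and one column per feature.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

logistic_model = LogisticRegression()
logistic_model.fit(X_train, y_train)

# default inverse regularisation strength
print(logistic_model.get_params()["C"])  # 1.0

# class labels seen during fit
print(logistic_model.classes_)           # [0 1 2]

# one row of coefficients per class, one column per feature
print(logistic_model.coef_.shape)        # (3, 4)
print(logistic_model.intercept_.shape)   # (3,)
```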

6th step

After completing the 5th step, let's move on to the 6th step.

In this step, we predict the classes of the test data and compare the predictions with the original labels. We also check the accuracy of the model, that is, the fraction of samples whose predicted class matches the true class, and use predict_proba() to see the probability the model assigns to each class.

Let’s understand this by taking an example given below:

Example

# importing required libraries
# importing numpy and pandas for data structure
import numpy as np
import pandas as pd
# importing sklearn libraries required for training the model
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.metrics import roc_curve, roc_auc_score, classification_report, accuracy_score, confusion_matrix
# importing seaborn and matplotlib.pyplot for visualisation
import seaborn as sns
import matplotlib.pyplot as plt
# iris dataset imported for training and testing the model
from sklearn.datasets import load_iris
# logistic regression model imported
from sklearn.linear_model import LogisticRegression


# loading data set into x and y variables
X, y = load_iris( return_X_y=True )




# splitting our data into test data and training data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.25, random_state=0)


# scaling the x data set
sc = StandardScaler()
X_train = sc.fit_transform( X_train )
X_test = sc.transform( X_test )


# creating model
logistic_model = LogisticRegression()
logistic_model.fit( X_train, y_train )


# accuracy of the model on the train dataset
# (score() returns the mean accuracy, not the precision)
train_accuracy = logistic_model.score( X_train, y_train )


# accuracy of the model on the test dataset
test_accuracy = logistic_model.score( X_test, y_test )


# prediction by the model
y_pred = logistic_model.predict( X_test )


# class probabilities for each test sample, converted to per cent
probability = logistic_model.predict_proba( X_test )
percent_setosa = list( map( lambda x: round(x[0]*100, 2), probability ))
percent_versicolor = list( map( lambda x: round(x[1]*100, 2), probability ))
percent_virginica = list( map( lambda x: round(x[2]*100, 2), probability ))
# look up the class names once instead of reloading the dataset for every row
target_names = load_iris().target_names
pred_table = pd.DataFrame(data={
    "original data": target_names[y_test],
    "prediction": target_names[y_pred],
    "setosa(%)": percent_setosa,
    "versicolor(%)": percent_versicolor,
    "virginica(%)": percent_virginica
})


print(f'''accuracy of the model for train dataset: {train_accuracy}
accuracy of the model for test dataset: {test_accuracy}


prediction table
{pred_table.head( 10 )}
''')

Output

accuracy of the model for train dataset: 0.9732142857142857
accuracy of the model for test dataset: 0.9736842105263158


prediction table
  original data  prediction  setosa(%)  versicolor(%)  virginica(%)
0     virginica   virginica       0.01           3.10         96.88
1    versicolor  versicolor       0.61          95.18          4.21
2        setosa      setosa      99.58           0.42          0.00
3     virginica   virginica       0.00           8.17         91.83
4        setosa      setosa      97.61           2.39          0.00
5     virginica   virginica       0.00           1.00         98.99
6        setosa      setosa      98.34           1.66          0.00
7    versicolor  versicolor       0.72          71.49         27.79
8    versicolor  versicolor       0.24          72.88         26.88
9    versicolor  versicolor       2.31          89.39          8.29
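
The value returned by score() is the share of test samples whose prediction matches the true label. The sketch below, which assumes the same pipeline as above, computes that share by hand, compares it with score() and accuracy_score(), and checks that each predicted class is the one with the highest probability in predict_proba(). The names manual_accuracy and most_probable are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

logistic_model = LogisticRegression().fit(X_train, y_train)
y_pred = logistic_model.predict(X_test)
probability = logistic_model.predict_proba(X_test)

# accuracy = fraction of predictions that equal the true label
manual_accuracy = np.mean(y_pred == y_test)
print(np.isclose(manual_accuracy, logistic_model.score(X_test, y_test)))  # True
print(np.isclose(manual_accuracy, accuracy_score(y_test, y_pred)))        # True

# the probabilities of each row sum to 1 ...
print(np.allclose(probability.sum(axis=1), 1.0))  # True

# ... and the predicted class is the most probable one
most_probable = logistic_model.classes_[probability.argmax(axis=1)]
print(np.array_equal(most_probable, y_pred))  # True
```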

7th step

After completing the 6th step, let's move on to the 7th step.

For more detail, let's use the classification_report() function to get the model's precision, recall, and f1-score for each class of the test dataset. Precision is the share of samples predicted as a class that truly belong to it, recall is the share of samples of a class that the model identified, and the f1-score is the harmonic mean of the two. The report lists these values for the classes 0, 1, and 2, along with the overall accuracy, the macro average, and the weighted average.

Let’s understand this by taking an example given below:

Example

# importing required libraries
# importing numpy and pandas for data structure
import numpy as np
import pandas as pd
# importing sklearn libraries required for training the model
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.metrics import roc_curve, roc_auc_score, classification_report, accuracy_score, confusion_matrix
# importing seaborn and matplotlib.pyplot for visualisation
import seaborn as sns
import matplotlib.pyplot as plt
# iris dataset imported for training and testing the model
from sklearn.datasets import load_iris
# logistic regression model imported
from sklearn.linear_model import LogisticRegression


# loading data set into x and y variables
X, y = load_iris( return_X_y=True )




# splitting our data into test data and training data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.25, random_state=0)


# scaling the x data set
sc = StandardScaler()
X_train = sc.fit_transform( X_train )
X_test = sc.transform( X_test )


# creating model
logistic_model = LogisticRegression()
logistic_model.fit( X_train, y_train )


# accuracy of the model on the train dataset (score() returns mean accuracy)
train_accuracy = logistic_model.score( X_train, y_train )


# accuracy of the model on the test dataset
test_accuracy = logistic_model.score( X_test, y_test )


# prediction by the model
y_pred = logistic_model.predict( X_test )


probability = logistic_model.predict_proba( X_test )
percent_setosa = list( map( lambda x: round(x[0]*100, 2), probability ))
percent_versicolor = list (map( lambda x: round(x[1]*100, 2), probability ))
percent_virginica = list( map( lambda x: round(x[2]*100, 2), probability ))
pred_table = pd.DataFrame(data={
    "original data": list( map( lambda i: load_iris().target_names[i], y_test )),
    "prediction": list( map( lambda i: load_iris().target_names[i], y_pred )),
    "setosa(%)": percent_setosa,
    "versicolor(%)": percent_versicolor,
    "virginica(%)": percent_virginica
})


# getting the report of classification in detail
report = classification_report( y_test, y_pred )
print(f'''detailed report of our model for the test dataset
{report}''')

Output:

detailed report of our model for the test dataset
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        13
           1       1.00      0.94      0.97        16
           2       0.90      1.00      0.95         9

    accuracy                           0.97        38
   macro avg       0.97      0.98      0.97        38
weighted avg       0.98      0.97      0.97        38
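
The report can also be requested as a dictionary, which makes the individual numbers easier to work with. The sketch below is one possible variation, assuming the same pipeline as above: it passes target_names so that the rows show the species names instead of 0, 1, and 2, uses output_dict=True, and recomputes each f1-score as the harmonic mean of precision and recall. The names report_dict and manual_f1 are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=0
)
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

logistic_model = LogisticRegression().fit(X_train, y_train)
y_pred = logistic_model.predict(X_test)

# same report, but as a nested dictionary keyed by class name
report_dict = classification_report(
    y_test, y_pred, target_names=iris.target_names, output_dict=True
)

# f1-score = harmonic mean of precision and recall
manual_f1 = {}
for name in iris.target_names:
    row = report_dict[name]
    p, r = row["precision"], row["recall"]
    manual_f1[name] = 2 * p * r / (p + r)
    print(name, round(p, 2), round(r, 2), round(manual_f1[name], 2))
```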

8th step

After completing the 7th step, let's move on to the 8th step.

In this step, we will make a confusion matrix. A confusion matrix is a table that summarises the performance of a classification model: each row corresponds to a true class, each column to a predicted class, and each cell counts the test samples with that combination. In sklearn, the sklearn.metrics module provides a built-in function for this, confusion_matrix(y_test, y_pred).

Let’s understand this by taking an example given below:

Example

# importing required libraries
# importing numpy and pandas for data structure
import numpy as np
import pandas as pd
# importing sklearn libraries required for training the model
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.metrics import roc_curve, roc_auc_score, classification_report, accuracy_score, confusion_matrix
# importing seaborn and matplotlib.pyplot for visualisation
import seaborn as sns
import matplotlib.pyplot as plt
# iris dataset imported for training and testing the model
from sklearn.datasets import load_iris
# logistic regression model imported
from sklearn.linear_model import LogisticRegression


# loading data set into x and y variables
X, y = load_iris( return_X_y=True )




# splitting our data into test data and training data
X_train, X_test, y_train, y_test = train_test_split( X, y, test_size=.25, random_state=0)


# scaling the x data set
sc = StandardScaler()
X_train = sc.fit_transform( X_train )
X_test = sc.transform( X_test )


# creating model
logistic_model = LogisticRegression()
logistic_model.fit( X_train, y_train )


# accuracy of the model on the train dataset (score() returns mean accuracy)
train_accuracy = logistic_model.score( X_train, y_train )


# accuracy of the model on the test dataset
test_accuracy = logistic_model.score( X_test, y_test )


# prediction by the model
y_pred = logistic_model.predict( X_test )


probability = logistic_model.predict_proba( X_test )
percent_setosa = list( map( lambda x: round(x[0]*100, 2), probability ))
percent_versicolor = list( map( lambda x: round(x[1]*100, 2), probability ))
percent_virginica = list( map( lambda x: round(x[2]*100, 2), probability ))
pred_table = pd.DataFrame(data={
    "original data": list( map( lambda i: load_iris().target_names[i], y_test )),
    "prediction": list( map( lambda i: load_iris().target_names[i], y_pred )),
    "setosa(%)": percent_setosa,
    "versicolor(%)": percent_versicolor,
    "virginica(%)": percent_virginica
})


# getting the report of classification in detail
report = classification_report( y_test, y_pred )
# print(f'''detailed report of our model for the test dataset
#{report}''')


# creating confusion matrix by using confusion_matrix() function
c_mat = confusion_matrix( y_test, y_pred )
print(f'''Confusion matrix:
{c_mat}
''')

Output

Confusion matrix:
[[13  0  0]
 [ 0 15  1]
 [ 0  0  9]]
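
The figures in the classification report can be read directly off this matrix. As a minimal sketch, the code below copies the matrix printed above and derives the accuracy, the per-class recall (correct predictions divided by the row totals), and the per-class precision (correct predictions divided by the column totals).

```python
import numpy as np

# confusion matrix copied from the output above
# rows = true class, columns = predicted class
c_mat = np.array([
    [13,  0, 0],
    [ 0, 15, 1],
    [ 0,  0, 9],
])

# correctly classified samples sit on the diagonal
correct = np.diag(c_mat)

# accuracy: 37 correct out of 38 test samples, about 0.9737
accuracy = correct.sum() / c_mat.sum()

# recall per class: correct / number of true samples of that class (row sums)
recall = correct / c_mat.sum(axis=1)      # 1.0, 0.9375, 1.0

# precision per class: correct / number of samples predicted as that class (column sums)
precision = correct / c_mat.sum(axis=0)   # 1.0, 1.0, 0.9

print(round(accuracy, 4))
print(recall.round(2))
print(precision.round(2))
```

The single off-diagonal entry shows that one versicolor sample (class 1) was predicted as virginica (class 2), which is why the recall of class 1 and the precision of class 2 are below 1.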

9th step

After completing the 8th step, let's move on to the 9th step.

In this step, we will visualise the confusion matrix we created in the previous step. We will use seaborn's sns.heatmap() function to draw the matrix and matplotlib's pyplot to add a title and display the plot.

Let’s understand this by taking an example given below:

Example

# importing required libraries
# importing numpy and pandas for data structure
import numpy as np
import pandas as pd
# importing sklearn libraries required for training the model
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.metrics import roc_curve, roc_auc_score, classification_report, accuracy_score, confusion_matrix
# importing seaborn and matplotlib.pyplot for visualisation
import seaborn as sns
import matplotlib.pyplot as plt
# iris dataset imported for training and testing the model
from sklearn.datasets import load_iris
# logistic regression model imported
from sklearn.linear_model import LogisticRegression


# loading data set into x and y variables
X, y = load_iris( return_X_y=True )




# splitting our data into test data and training data
X_train, X_test, y_train, y_test = train_test_split( X, y, test_size=.25, random_state=0)


# scaling the x data set
sc = StandardScaler()
X_train = sc.fit_transform( X_train )
X_test = sc.transform( X_test )


# creating model
logistic_model = LogisticRegression()
logistic_model.fit( X_train, y_train )


# accuracy of the model on the train dataset (score() returns mean accuracy)
train_accuracy = logistic_model.score( X_train, y_train )


# accuracy of the model on the test dataset
test_accuracy = logistic_model.score( X_test, y_test )


# prediction by the model
y_pred = logistic_model.predict( X_test )


probability = logistic_model.predict_proba( X_test )
percent_setosa = list( map( lambda x: round(x[0]*100, 2), probability ))
percent_versicolor = list( map( lambda x: round(x[1]*100, 2), probability ))
percent_virginica = list( map( lambda x: round(x[2]*100, 2), probability ))
pred_table = pd.DataFrame( data={
    "original data": list( map( lambda i: load_iris().target_names[i], y_test )),
    "prediction": list( map( lambda i: load_iris().target_names[i], y_pred )),
    "setosa(%)": percent_setosa,
    "versicolor(%)": percent_versicolor,
    "virginica(%)": percent_virginica
})


# getting the report of classification in detail
report = classification_report(y_test, y_pred)


# creating confusion matrix by using confusion_matrix() function
c_mat = confusion_matrix(y_test, y_pred)


sns.heatmap(c_mat, annot=True, cmap="Reds")
plt.title("confusion matrix")
plt.show()

Output

[Figure: heatmap of the confusion matrix produced by sns.heatmap()]
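
The heatmap is easier to read when the axes are labelled. The following sketch is one possible variation rather than the article's own code: it reuses the confusion matrix values from step 8, labels the ticks with the species names, and names the axes. The output file name confusion_matrix.png is a hypothetical choice; in an interactive session plt.show() can be used instead.

```python
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# confusion matrix values taken from the output of step 8
c_mat = np.array([
    [13,  0, 0],
    [ 0, 15, 1],
    [ 0,  0, 9],
])
class_names = ["setosa", "versicolor", "virginica"]

fig, ax = plt.subplots()
# fmt="d" prints the counts as integers inside the cells
sns.heatmap(
    c_mat,
    annot=True,
    fmt="d",
    cmap="Reds",
    xticklabels=class_names,
    yticklabels=class_names,
    ax=ax,
)
ax.set_xlabel("predicted label")
ax.set_ylabel("true label")
ax.set_title("confusion matrix")

# hypothetical file name; replace with plt.show() to open a window instead
fig.savefig("confusion_matrix.png")
```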

Conclusion

We hope you enjoyed this article and now know how to use Sklearn (Scikit-Learn) to create a logistic regression model in Python, here implemented on the Iris dataset. We gave you a step-by-step example of how to load a dataset, split and scale it, and use the Sklearn LogisticRegression() class to build a model for a prediction task. The tutorial also looked beyond the accuracy score, using the classification report and the confusion matrix, since accuracy alone can be misleading when a dataset is imbalanced.