Accuracy_score Function in Sklearn
A crucial step in data science is measuring a model's performance with an appropriate metric. In this article, we will examine two ways to calculate the accuracy of your predictions: manually and with Python's sklearn library.
This Python lesson covers the scikit-learn accuracy_score function in general, along with a variety of examples connected to it.
Before starting the article Accuracy Score Sklearn in Python, we should know the meaning of “Accuracy”.
Accuracy is one of the most widely used criteria for assessing the effectiveness of classification models. It indicates the percentage of labels that our model classified correctly. For instance, if our model correctly identified 30 of 100 labels, its accuracy would be 0.30.
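That arithmetic can be written out in a line of Python; the 30-of-100 numbers below are just the hypothetical example from the text:

```python
# accuracy = correctly classified labels / total labels
correct_labels = 30
total_labels = 100
accuracy = correct_labels / total_labels
print(accuracy)  # 0.3
```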
Accuracy_score Function in Sklearn
In Python scikit-learn, the accuracy_score function determines the fraction or count of correct predictions.
Mathematically, it denotes the ratio of predictions that came true, both positively and negatively, to the total number of predictions.
When classifying objects with multiple labels, the function returns the subset accuracy: a sample counts as correct only if its full set of predicted labels exactly matches the true set of labels, so each sample scores 1.0 on an exact match and 0.0 otherwise.
Syntax of Accuracy_score Function in Sklearn
The syntax of the accuracy_score() function is given below:
sklearn.metrics.accuracy_score( y_true, y_pred, * , normalize=True, sample_weight=None )
Parameters
The accuracy_score() function has 4 parameters. Each parameter is described below.
- y_true: the ground-truth labels for a given X, passed as a 1d array, label indicator array, or sparse matrix. This is a required parameter.
- y_pred: the labels predicted by the model for the same X, in the same formats as y_true. This is also a required parameter.
- normalize: a boolean, True by default. If True, the function returns the fraction of correctly predicted items; if False, it returns the number of correctly predicted items.
- sample_weight: an optional “ array-like of shape (n_samples,) ” giving a weight for each sample. The default value of this parameter is None.
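The sample_weight parameter is not exercised in the later examples, so here is a minimal sketch of what it does; the tiny arrays and weights are made up purely for illustration:

```python
from sklearn.metrics import accuracy_score

y_true = [0, 1, 1]
y_pred = [0, 1, 0]  # first two predictions correct, last one wrong

# unweighted: 2 correct out of 3 samples
plain = accuracy_score(y_true, y_pred)

# weight the misclassified third sample twice as heavily:
# (1*1 + 1*1 + 2*0) / (1 + 1 + 2) = 0.5
weighted = accuracy_score(y_true, y_pred, sample_weight=[1, 1, 2])

print(plain, weighted)  # 0.666... 0.5
```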
Returns
The accuracy_score() function returns a number.
If the value of the “ normalize ” parameter is True, the return value is a float: the fraction of correctly classified samples.
But if the value of the “ normalize ” parameter is False, the return value is the number of correctly classified samples.
For multi-label classification, the function returns the subset accuracy: a sample scores 1.0 only if every predicted label exactly corresponds to the actual label set, and 0.0 otherwise.
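A quick sketch of the multi-label case: with label indicator arrays (one row per sample), only rows that match exactly count as correct. The 2x3 arrays below are invented for illustration:

```python
import numpy as np
from sklearn.metrics import accuracy_score

# each row is one sample's set of labels
y_true = np.array([[1, 0, 1],
                   [0, 1, 0]])
y_pred = np.array([[1, 0, 1],   # exact match -> scores 1.0
                   [0, 1, 1]])  # one extra label -> scores 0.0

subset_accuracy = accuracy_score(y_true, y_pred)
print(subset_accuracy)  # 0.5, since only the first of the two samples matches
```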
Mathematical Representation of Accuracy_score Function
The accuracy_score function is the ratio of correct classification samples to the total samples when the value of the “ normalize ” parameter is equal to True.
accuracy_score for normalize = True
accuracy_score = correct classification samples / total samples in classification
If the value of the “ normalize ” parameter is equal to False, then the accuracy_score is equal to the correct classification samples.
accuracy_score for normalize = False
accuracy_score = total correct classification samples
Let’s understand it with an example:
Suppose we have a data array y_true = [ 0, 2, 5, 1, 6, 3, 1 ]
And we have the predicted data array y_pred = [ 0, 3, 5, 0, 2, 3, 1 ]
Here we can see that the total number of samples is 7.
In the y_pred array, the values at index numbers 0, 2, 5, and 6 match y_true. So the value of “ correct classification samples ” is 4.
correct classification samples = 4
If the value of the “ normalize ” parameter is equal to False, then the value of accuracy_score is equal to the number of correct classification samples.
accuracy_score = correct classification samples = 4
But if the value of the “ normalize ” parameter is equal to True, then the value of accuracy_score is equal to correct classification samples / total samples.
accuracy_score = 4/7 ≈ 0.571
Examples of Accuracy_score()
The scikit-learn library concentrates on data modelling rather than data loading and manipulation, and measuring the accuracy of predictions is exactly what its accuracy_score function is for.
Let’s understand how to use the accuracy_score() function to calculate the accuracy of a prediction with the examples given below:
Example 1
import numpy as np
from sklearn.metrics import accuracy_score
y_true = np.array( [ 0, 2, 5, 1, 6, 3, 1 ] )
y_pred = np.array( [ 0, 3, 5, 0, 2, 3, 1 ] )
accuracy = accuracy_score( y_true, y_pred, normalize=True )
print( "The accuracy of the y_pred array is:", accuracy )
Output
The accuracy of the y_pred array is: 0.5714285714285714
In the above example, we have two arrays. One is the y_true array that represents the original data; the second is y_pred, which represents the predicted data. To check the predicted data's accuracy, we use the accuracy_score() function. Here the value of the “ normalize ” parameter is True, so the accuracy_score function returns the fraction of correctly classified data.
In the above example, if the value of the “ normalize ” parameter were False, the accuracy_score function would return the number of correctly classified data points.
import numpy as np
from sklearn.metrics import accuracy_score
y_true = np.array( [ 0, 2, 5, 1, 6, 3, 1 ] )
y_pred = np.array( [ 0, 3, 5, 0, 2, 3, 1 ] )
accuracy = accuracy_score( y_true, y_pred, normalize=False )
print( "The number of correct data in the y_pred array is:", accuracy )
Output
The number of correct data in the y_pred array is: 4
Let’s create a real classification model and check the accuracy of the model.
Example 2
# importing sklearn libraries required for training the model
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score
# iris dataset imported for training and testing the model
from sklearn.datasets import load_iris
# logistic regression model imported
from sklearn.linear_model import LogisticRegression
# loading data set into x and y variables
X, y = load_iris( return_X_y=True )
# scaling of the dataset
sc = StandardScaler()
X = sc.fit_transform( X )
# splitting the data into training and testing dataset
X_train, X_test, y_train, y_test = train_test_split( X, y, test_size=.25 )
# creating the classification model
classification_model = LogisticRegression()
classification_model.fit( X_train, y_train )
# prediction for the train dataset
y_train_pred = classification_model.predict( X_train )
# prediction for the test dataset
y_test_pred = classification_model.predict( X_test )
# checking the accuracy of the training dataset
train_accuracy = accuracy_score( y_train, y_train_pred, normalize=True )
# checking the accuracy of the test dataset
test_accuracy = accuracy_score( y_test, y_test_pred, normalize=True )
print( "the accuracy for the training dataset", train_accuracy )
print( "the accuracy for the test dataset", test_accuracy )
Output
the accuracy for the training dataset 0.9732142857142857
the accuracy for the test dataset 0.9210526315789473
We have created a classification model for the iris dataset in the above example. After creating the model, we check its accuracy on the training dataset and the testing dataset using the accuracy_score() function. The value of the “ normalize ” parameter is True, so the function returns the fraction of correct classifications. Note that train_test_split is called without a random_state, so the exact numbers will vary from run to run.
If we want to see how many data points are correct, we have to set the value of the “ normalize ” parameter to False.
# importing sklearn libraries required for training the model
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score
# iris dataset imported for training and testing the model
from sklearn.datasets import load_iris
# logistic regression model imported
from sklearn.linear_model import LogisticRegression
# loading data set into x and y variables
X, y = load_iris(return_X_y=True)
# scaling of the dataset
sc = StandardScaler()
X = sc.fit_transform(X)
# splitting the data into training and testing dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.25)
# creating the classification model
classification_model = LogisticRegression()
classification_model.fit(X_train, y_train)
# prediction for the train dataset
y_train_pred = classification_model.predict(X_train)
# prediction for the test dataset
y_test_pred = classification_model.predict(X_test)
# checking the accuracy of the training dataset
train_accuracy = accuracy_score(y_train, y_train_pred, normalize=False)
# checking the accuracy of the test dataset
test_accuracy = accuracy_score(y_test, y_test_pred, normalize=False)
print("the number of correct data for the training dataset", train_accuracy)
print("the number of correct data for the test dataset", test_accuracy)
Output
the number of correct data for the training dataset 109
the number of correct data for the test dataset 36
Working of Accuracy_score Function
Now that we know what accuracy_score is and how to use it in code, let’s understand how the accuracy_score function works.
- accuracy_score takes two compulsory parameters: y_true, the original data for a particular X, and y_pred, the data that the model predicts.
- y_true and y_pred are compared index by index, counting how many items in the two arrays are the same.
- If the value of the “ normalize ” parameter is True, the function returns the count of correct data / total count of data.
- If the value of the “ normalize ” parameter is False, the function returns the count of correct data.
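The steps above can be sketched as a small hand-rolled version of the function. This is a simplified sketch for plain 1d label sequences, not the real scikit-learn implementation:

```python
from sklearn.metrics import accuracy_score

def manual_accuracy_score(y_true, y_pred, normalize=True):
    # compare the two sequences index by index and count the matches
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    # fraction of correct items when normalize=True, raw count otherwise
    return correct / len(y_true) if normalize else correct

y_true = [0, 2, 5, 1, 6, 3, 1]
y_pred = [0, 3, 5, 0, 2, 3, 1]

print(manual_accuracy_score(y_true, y_pred))                   # 0.5714...
print(manual_accuracy_score(y_true, y_pred, normalize=False))  # 4
print(accuracy_score(y_true, y_pred))                          # same fraction
```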
We can also find the predicted data's accuracy with the help of a confusion matrix.
The scikit-learn confusion matrix is a method for measuring classification performance.
It also summarises the outcome of a classification problem, since each diagonal entry counts the correct predictions for one label.
Let’s understand the working of the accuracy_score function with the help of the confusion matrix.
Example
# importing all important libraries
# importing numpy
import numpy as np
# importing accuracy_score and confusion_matrix from the sklearn.metrics
from sklearn.metrics import accuracy_score, confusion_matrix
# counting the correct classification data
def count_correct_items( matrix ):
    count = 0
    for i in range( len( matrix ) ):
        count += matrix[i][i]
    return count
y_true = np.array( [ 0, 2, 5, 1, 6, 3, 1 ] )
y_pred = np.array( [ 0, 3, 5, 0, 2, 3, 1 ] )
# creating the confusion matrix
con_matrix = confusion_matrix( y_true, y_pred)
correct_count = count_correct_items( con_matrix )
total_items = y_true.size
# accuracy of the predicted data by the confusion matrix
accuracy_by_matrix = correct_count / total_items
# accuracy of the predicted data by the accuracy_score function
accuracy_by_function = accuracy_score( y_true, y_pred, normalize=True )
print( f"the accuracy of data by the matrix: { accuracy_by_matrix }" )
print( f"the accuracy of data by the accuracy_score function: { accuracy_by_function }" )
Output
the accuracy of data by the matrix: 0.5714285714285714
the accuracy of data by the accuracy_score function: 0.5714285714285714
In the above example, we calculate accuracy using the confusion matrix. First, we import the required libraries, numpy and sklearn.metrics. We take an array y_true that holds the original data and manually create another array y_pred that holds the predicted data. The diagonal of the confusion matrix counts the correct predictions per label, so summing it gives the number of correctly predicted items. Dividing that by the size of the array, which is the total number of items, gives the accuracy ( correct data / total items ). Calling the accuracy_score function confirms that we get the same value.
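As an aside, the diagonal-summing loop in count_correct_items can be replaced by numpy's np.trace, which sums the main diagonal of a matrix; here is a short sketch of the same computation:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([0, 2, 5, 1, 6, 3, 1])
y_pred = np.array([0, 3, 5, 0, 2, 3, 1])

con_matrix = confusion_matrix(y_true, y_pred)
# the diagonal holds the correctly classified counts, so the trace is their sum
correct_count = np.trace(con_matrix)
accuracy = correct_count / y_true.size
print(correct_count, accuracy)  # 4 0.5714...
```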
In the above case, if we only want the number of correct data points, i.e. the count returned by the accuracy_score() function when the value of the “ normalize ” parameter is False, then there is no need to divide the correct count by the size of the array, as shown below.
Example
# importing all important libraries
# importing numpy
import numpy as np
# importing accuracy_score and confusion_matrix from the sklearn.metrics
from sklearn.metrics import accuracy_score, confusion_matrix
# counting the correct classification data
def count_correct_items( matrix ):
    count = 0
    for i in range( len( matrix ) ):
        count += matrix[i][i]
    return count
y_true = np.array( [ 0, 2, 5, 1, 6, 3, 1 ] )
y_pred = np.array( [ 0, 3, 5, 0, 2, 3, 1 ] )
# creating the confusion matrix
con_matrix = confusion_matrix( y_true, y_pred)
correct_count = count_correct_items( con_matrix )
# accuracy of the predicted data by the confusion matrix
accuracy_by_matrix = correct_count
# accuracy of the predicted data by the accuracy_score function
accuracy_by_function = accuracy_score( y_true, y_pred, normalize=False )
print( f"the correct number of the data by the matrix: { accuracy_by_matrix }" )
print( f"the correct number of the data by the accuracy_score function: { accuracy_by_function }" )
Output
the correct number of the data by the matrix: 4
the correct number of the data by the accuracy_score function: 4
Let’s calculate the accuracy of a real-world example with the iris dataset.
Example
# importing sklearn libraries required for training the model
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, confusion_matrix
# iris dataset imported for training and testing the model
from sklearn.datasets import load_iris
# logistic regression model imported
from sklearn.linear_model import LogisticRegression
# counting the correct classification data
def count_correct_items( matrix ):
    count = 0
    for i in range( len( matrix ) ):
        count += matrix[i][i]
    return count
# loading data set into x and y variables
X, y = load_iris(return_X_y=True)
# scaling of dataset
sc = StandardScaler()
X = sc.fit_transform(X)
# splitting the data into training and testing dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.25)
# creating the classification model
classification_model = LogisticRegression()
classification_model.fit(X_train, y_train)
# prediction for the train dataset
y_train_pred = classification_model.predict(X_train)
# prediction for the test dataset
y_test_pred = classification_model.predict(X_test)
# creating confusion matrix of training and testing dataset
train_matrix = confusion_matrix(y_train, y_train_pred)
test_matrix = confusion_matrix(y_test, y_test_pred)
correct_train_data = count_correct_items(train_matrix)
correct_test_data = count_correct_items(test_matrix)
train_data_size = len(y_train)
test_data_size = len(y_test)
# checking the accuracy for the training dataset by the matrix
train_accuracy_matrix = correct_train_data / train_data_size
# checking the accuracy for the test dataset by the matrix
test_accuracy_matrix = correct_test_data / test_data_size
# checking the accuracy for the training dataset by the function
train_accuracy_function = accuracy_score(y_train, y_train_pred, normalize=True)
# checking the accuracy for the test dataset by the function
test_accuracy_function = accuracy_score(y_test, y_test_pred, normalize=True)
print("the accuracy for the training dataset by the matrix", train_accuracy_matrix)
print("the accuracy for the test dataset by the matrix", test_accuracy_matrix)
print("the accuracy for the training dataset by the function", train_accuracy_function)
print("the accuracy for the test dataset by the function", test_accuracy_function)
Output
the accuracy for the training dataset by the matrix 0.9642857142857143
the accuracy for the test dataset by the matrix 0.9473684210526315
the accuracy for the training dataset by the function 0.9642857142857143
the accuracy for the test dataset by the function 0.9473684210526315
In the above example, we have created a classification model for the iris dataset, and then we check the accuracy of that model using the confusion matrix.
But if we want to see the number of correct data points, then we have to use the formula below.
“ accuracy_score = total correct classification samples ”
This formula is used when the value of the “ normalize ” parameter of accuracy_score is False.
Example
# importing sklearn libraries required for training the model
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, confusion_matrix
# iris dataset imported for training and testing the model
from sklearn.datasets import load_iris
# logistic regression model imported
from sklearn.linear_model import LogisticRegression
# counting the correct classification data
def count_correct_items( matrix ):
    count = 0
    for i in range( len( matrix ) ):
        count += matrix[i][i]
    return count
# loading data set into x and y variables
X, y = load_iris( return_X_y=True )
# scaling of the dataset
sc = StandardScaler()
X = sc.fit_transform( X )
# splitting the data into training and testing dataset
X_train, X_test, y_train, y_test = train_test_split( X, y, test_size=.25 )
# creating the classification model
classification_model = LogisticRegression()
classification_model.fit( X_train, y_train )
# prediction for the train dataset
y_train_pred = classification_model.predict( X_train )
# prediction for the test dataset
y_test_pred = classification_model.predict( X_test )
# creating confusion matrix of training and testing dataset
train_matrix = confusion_matrix( y_train, y_train_pred )
test_matrix = confusion_matrix( y_test, y_test_pred )
correct_train_data = count_correct_items( train_matrix )
correct_test_data = count_correct_items( test_matrix )
# checking the accuracy for the training dataset by the function
train_accuracy_function = accuracy_score( y_train, y_train_pred, normalize=False )
# checking the accuracy for the test dataset by the function
test_accuracy_function = accuracy_score( y_test, y_test_pred, normalize=False )
print( "the number of correct data for the training dataset by the matrix", correct_train_data )
print( "the number of correct data for the test dataset by the matrix", correct_test_data )
print( "the number of correct data for the training dataset by the function", train_accuracy_function )
print( "the number of correct data for the test dataset by the function", test_accuracy_function )
Output
the number of correct data for the training dataset by the matrix 107
the number of correct data for the test dataset by the matrix 37
the number of correct data for the training dataset by the function 107
the number of correct data for the test dataset by the function 37
Conclusion
Scikit-learn is one of the most popular libraries for creating machine learning models, and many newcomers rely on it as the first machine-learning-focused library in their initial learning process; even experienced practitioners reach for it to rapidly test out an idea.
As we have seen, its accuracy_score function returns the fraction of correct predictions by default and the raw count when normalize is set to False, and the same numbers can be reproduced by hand or from a confusion matrix. That consistency makes it a quick, reliable sanity check for any classification model you build.