SKLearn Model Selection
- The model_selection module of scikit-learn provides many functions we can work with.
- It has functions to cross-validate a model, and it also provides validation and learning curves.
- It is used to tune the hyperparameters of estimators.
- Let us consider a list of functions the module provides.
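The `ms.` prefix used throughout the list below assumes the module has been imported under an alias, for example:

```python
# Assumed alias: "ms" stands for sklearn.model_selection, so that
# ms.KFold below means sklearn.model_selection.KFold, and so on.
from sklearn import model_selection as ms

# The splitter classes and search utilities are all attributes of the module.
print(hasattr(ms, "KFold"))        # True
print(hasattr(ms, "GridSearchCV")) # True
```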
Splitter Classes
- ms.GroupKFold([ n_splits ]): A K-fold variant in which the folds are formed from non-overlapping groups, so the same group never appears in both the training and test sets.
- ms.GroupShuffleSplit([ ... ]): A shuffle-split cross-validator that holds out whole groups, producing randomized train/test splits with non-overlapping groups.
- ms.KFold([ n_splits, shuffle, ... ]): The standard K-fold cross-validator; it splits the dataset into k consecutive folds (optionally shuffled first).
- ms.LeaveOneGroupOut( ): A cross-validator that leaves one group out as the test set on each split.
- ms.LeavePGroupsOut( n_groups ): A cross-validator that leaves p groups out as the test set on each split.
- ms.LeaveOneOut( ): A cross-validator that leaves a single sample out as the test set on each split.
- ms.LeavePOut( p ): A cross-validator that leaves p samples out as the test set on each split.
- ms.PredefinedSplit( test_fold ): A cross-validator that uses a predefined, user-supplied test_fold array to decide which samples belong to which test fold.
- ms.RepeatedKFold( *[, n_splits, ... ] ): Repeats K-fold cross-validation multiple times, with different randomization in each repetition.
- ms.RepeatedStratifiedKFold( *[, ... ] ): Repeats stratified K-fold cross-validation multiple times, with different randomization in each repetition.
- ms.ShuffleSplit([ n_splits, ... ]): A cross-validator that generates randomized train/test splits using random permutations of the dataset.
- ms.StratifiedKFold([ n_splits, ... ]): A K-fold cross-validator whose folds preserve the percentage of samples from each class.
- ms.StratifiedShuffleSplit([ ... ]): A shuffle-split cross-validator whose randomized splits preserve the percentage of samples from each class.
- ms.StratifiedGroupKFold([ ... ]): A K-fold cross-validator with non-overlapping groups whose folds also attempt to preserve the class distribution.
- ms.TimeSeriesSplit([ n_splits, ... ]): A cross-validator for time-series data; each training set consists only of samples that precede the corresponding test set in time.
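A minimal sketch (toy arrays and assumed group labels, chosen here for illustration) showing how two of the splitter classes above yield train/test index pairs via their .split() method:

```python
import numpy as np
from sklearn.model_selection import KFold, GroupKFold

X = np.arange(12).reshape(6, 2)        # 6 samples, 2 features
y = np.array([0, 0, 0, 1, 1, 1])
groups = np.array([1, 1, 2, 2, 3, 3])  # assumed grouping: 3 groups of 2 samples

# Plain K-fold: consecutive folds of equal size.
kf = KFold(n_splits=3)
for train_idx, test_idx in kf.split(X):
    print("KFold test fold:     ", test_idx)

# GroupKFold: each group lands entirely inside one fold (non-overlapping groups).
gkf = GroupKFold(n_splits=3)
for train_idx, test_idx in gkf.split(X, y, groups):
    print("GroupKFold test fold:", test_idx)
```

Every GroupKFold test fold above contains whole groups only, which is exactly the guarantee plain KFold does not make.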
Splitter Functions
- ms.check_cv([ cv, y, classifier ]): An input-checking utility that builds and returns a cross-validator object from the given cv argument.
- ms.train_test_split( *arrays[, ...] ): Splits the given arrays or matrices into random train and test subsets; it is the usual way to set data aside before cross-validation or final evaluation.
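A short sketch of train_test_split on toy arrays (the test size and random seed below are arbitrary choices):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)  # 10 samples, 2 features
y = np.arange(10)

# Hold out 30% of the samples as a test set; random_state fixes the shuffle.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

print(len(X_train), len(X_test))  # 7 3
```

Each returned pair (X_train with y_train, X_test with y_test) keeps the rows aligned, so labels still match their samples after the shuffle.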
Hyperparameter Optimizers
- ms.GridSearchCV( estimator, ... ): Performs an exhaustive search over the specified parameter values for an estimator.
- ms.HalvingGridSearchCV( ...[, ...] ): Searches over the specified parameter values using successive halving.
- ms.ParameterGrid( param_grid ): Represents a grid of parameters, each with a discrete range of values; iterating over it yields every combination.
- ms.ParameterSampler( ...[, ...] ): A generator that yields parameter settings sampled from the given distributions.
- ms.RandomizedSearchCV( ...[, ...] ): Performs a randomized search over hyperparameters, sampling a fixed number of parameter settings.
- ms.HalvingRandomSearchCV( ...[, ...] ): Performs a randomized search over hyperparameters using successive halving.
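A minimal GridSearchCV sketch on the iris dataset; the SVC estimator and the parameter grid below are arbitrary choices for illustration, not recommended tuning ranges:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Every combination of these values is tried: 3 x 2 = 6 candidates,
# each evaluated with 5-fold cross-validation.
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print("Best CV score  :", round(search.best_score_, 3))
```

After fitting, best_params_ holds the winning combination and best_estimator_ is a model refit on the full dataset with those parameters.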
Program:
# Python program on k-fold cross-validation test
# Import the library required
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, KFold
# Loading the dataset
i = load_iris()
# Getting dependent and independent features
A = i.data
B = i.target
print("The size of the Dataset is : ", len( A ))
# Creating an object of logistic regression
log = LogisticRegression(max_iter=200)
# Fit once on the whole dataset (cross_val_score below refits on each fold)
log.fit( A, B )
# Performing K-fold cross-validation test
k = KFold(n_splits = 5)
s = cross_val_score(log, A, B, cv=k)
# Printing accuracy of the scores
print("Validation scores ofK-fold Cross :", s)
print("Validation score of Mean Cross : ", s.mean())
Output:
The size of the Dataset is :  150
Validation scores of K-fold Cross : [1.         1.         0.86333667 0.91111133 0.89999999]
Validation score of Mean Cross :  0.934889598
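As a follow-up to the program above, swapping KFold for StratifiedKFold keeps the per-class proportions equal in every fold, which plain KFold does not guarantee on ordered data like iris. A minimal sketch:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, StratifiedKFold

X, y = load_iris(return_X_y=True)

# StratifiedKFold preserves the class distribution inside each fold.
skf = StratifiedKFold(n_splits=5)
scores = cross_val_score(LogisticRegression(max_iter=200), X, y, cv=skf)

print("Stratified K-fold scores:", scores)
print("Mean score:", scores.mean())
```

Because iris is stored sorted by class, stratification matters here: an unshuffled plain K-fold produces test folds dominated by one or two classes, while each stratified fold tests on all three.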