Cross Validation in Sklearn

Data scientists can benefit from cross-validation in machine learning in two key ways: it can assist in minimising the amount of data needed and ensure the artificial intelligence model is reliable enough. Cross-validation accomplishes that at the expense of resource use; thus, it's critical to comprehend how it operates before deciding to use it.

The k-fold cross-validation approach estimates how well a machine learning model performs when making predictions on data that was not used during training.

This procedure can be used both to compare and select a model for a dataset and to optimise a model's hyperparameters. When the same cross-validation procedure and dataset are used both to tune and to select a model, the estimate of its performance is likely to be optimistically biased.

One way to address this bias is to nest the hyperparameter optimisation procedure inside the model selection procedure. This approach to evaluating and comparing tuned machine learning models is known as nested cross-validation or double cross-validation.
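The nesting described above can be sketched as follows. This is a minimal illustration rather than a recommended setup: the iris dataset, the SVC estimator, the candidate values for C and the fold counts are all assumptions chosen for the example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.svm import SVC

# Example data (an assumption for illustration).
X, y = load_iris(return_X_y=True)

# Inner loop: tunes the hyperparameter C on each outer training split.
inner_cv = KFold(n_splits=3, shuffle=True, random_state=0)
# Outer loop: estimates how well the tuned model generalises.
outer_cv = KFold(n_splits=5, shuffle=True, random_state=1)

tuned_model = GridSearchCV(
    SVC(kernel="rbf"),
    param_grid={"C": [0.1, 1, 10]},  # hypothetical search grid
    cv=inner_cv,
)

# Each outer fold refits the whole grid search on its own training part.
nested_scores = cross_val_score(tuned_model, X, y, cv=outer_cv)
print("nested CV accuracy per outer fold:", nested_scores)
print("mean nested CV accuracy:", nested_scores.mean())
```

Because the outer validation folds are never seen by the inner search, the mean of nested_scores is not influenced by the tuning step.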

This tutorial briefly discusses the benefits of cross-validation and then demonstrates its use in detail with a range of techniques from Sklearn, the popular Python library.

The benefits of cross-validation are given below:

1. The first benefit of Cross-validation

Cross-validation changes how the data is split, which reduces the amount of data required. We can call this the data size reduction benefit of cross-validation in Sklearn.

The data is often divided into three sets: training, testing and validation.
Let's look at these three sets one by one.

  • Training: The model is trained on this set, and its hyperparameters are tuned.
  • Testing: This set is used to check that the tuned model generalises well and performs well on data it has not seen.
  • Validation: Because parameters are chosen against the test set during optimisation, some knowledge of the test set leaks into the model, so a final check on completely unseen data is needed.

Because cross-validation lets you train and test on the same data, adding it to the workflow reduces the need for a separate validation set.

Note: A subset of the training set is used for testing in the most common cross-validation method. After multiple repetitions, each data point appears once in the test set.

2. The second benefit of Cross-validation

Even when the target variable's distribution is kept the same in both the train and test sets, for example by passing stratify=y to Sklearn's train_test_split function, it is still possible to train by mistake on a subset that isn't representative of the real world.

Consider determining a person's gender based on height and weight. It stands to reason that taller and heavier people are more likely to be male, but if you're extremely unlucky, your train data might contain only very short men and very tall women. Cross-validation allows you to do several train-test splits, and while one fold may produce excellent results, another might not.

When one of the splits yields unexpected results, your data contains an anomaly. We can call this the robust process benefit of cross-validation in Sklearn.

Note: If your cross-validation split doesn't produce a comparable score, you may have overlooked some crucial information in the data.
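One way to act on the note above is to look at how far the per-fold scores spread. The sketch below assumes the iris dataset and a LogisticRegression model purely for illustration. What counts as a "comparable" score is a judgment call that depends on the problem.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Example data and model (assumptions for illustration).
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# One accuracy value per fold; a large spread hints at an unrepresentative split.
fold_scores = cross_val_score(model, X, y, cv=5)
print("per-fold accuracy:", fold_scores)
print("mean:", fold_scores.mean(), "standard deviation:", fold_scores.std())
```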

Now that we have seen the benefits of cross-validation, let's look in more depth at what cross-validation is:

Cross-validation

A statistical technique called cross-validation is used to gauge how well machine learning models work. It is a technique for determining if the outcomes of a statistical analysis will transfer to a different data set.

Learning the parameters of a prediction function and evaluating it on the same data set is a methodological error. A model that simply repeats the labels of the samples it has just seen would score well but be unable to make predictions about data that has not yet been seen. This situation is called overfitting. To avoid it, it is customary to reserve a portion of the available data as a test set (X_test, y_test) when conducting a (supervised) machine learning experiment. Note that the term "experiment" does not refer only to academic use, because machine learning often starts out experimentally in commercial settings as well.
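The gap between scoring on seen and unseen data can be illustrated with a short sketch. The iris dataset, the 25% test size and the 1-nearest-neighbour classifier are assumptions chosen because such a model effectively memorises its training samples.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Example data (an assumption for illustration).
X, y = load_iris(return_X_y=True)

# Reserve a quarter of the data as the test set (X_test, y_test).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# A 1-nearest-neighbour model repeats the label of the closest training sample.
model = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)

train_score = model.score(X_train, y_train)  # evaluated on data it has seen
test_score = model.score(X_test, y_test)     # evaluated on held-out data
print("train accuracy:", train_score)
print("test accuracy:", test_score)
```

Only the test accuracy says anything about how the model behaves on new data.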

Syntax of Cross-Validation in Sklearn:

sklearn.model_selection.cross_validate(estimator, X, y=None, *, groups=None, scoring=None, cv=None, n_jobs=None, verbose=0, fit_params=None, pre_dispatch='2*n_jobs', return_train_score=False, return_estimator=False, error_score=nan)

Let's go through its parameters one by one:

  • estimator: The object used to fit the data.
    It must be an estimator object that implements 'fit'.

  • X: The data to fit. It can be an array or a list, for instance.
    Array-like with the shape (n_samples, n_features).

  • y: The target variable that supervised learning attempts to predict.
    Array-like with the shape (n_samples,) or (n_samples, n_outputs), and the default is None.

  • groups: Group labels for the samples, used while splitting the dataset into train and test sets. Only used in conjunction with a "Group" cv instance (for example, GroupKFold).
    Array-like with the shape (n_samples,), and the default is None.

  • scoring: The strategy used to evaluate the performance of the cross-validated model on the test set.
    It can be a str, callable, list, tuple, or dict, and the default is None.

    If scoring represents a single score, one can use:
  1. a single string (see "The scoring parameter: defining model evaluation rules" in the Sklearn documentation);
  2. a callable that returns a single value (see "Defining your scoring strategy from metric functions").

    If scoring represents multiple scores, one can use:

  1. a list or tuple of unique strings;
  2. a callable that returns a dictionary with the metric names as keys and the metric scores as values;
  3. a dictionary with metric names as keys and callables as values.
  • cv: int, cross-validation generator or an iterable, and the default is None.
    It determines the cross-validation splitting strategy. Possible inputs for cv are:

    None, to use the default 5-fold cross-validation; an int, to specify the number of folds in a (Stratified)KFold; a CV splitter; or an iterable that yields (train, test) splits as arrays of indices.

    For int/None inputs, StratifiedKFold is used if the estimator is a classifier and y is either binary or multiclass. In all other cases, KFold is used. These splitters are instantiated with shuffle=False, so the splits will be the same across calls.

NOTE: Changed in version 0.22: the default value of cv if None changed from 3-fold to 5-fold.

  • n_jobs: The number of jobs to run in parallel. Training the estimator and computing the score are parallelised over the cross-validation splits. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors.
    int, and the default is None.

  • verbose: The verbosity level.
    int, and the default is 0.

  • fit_params: Parameters to pass to the fit method of the estimator.
    dict, and the default is None.

  • pre_dispatch: str or int, and the default is '2*n_jobs'.
    Controls the number of jobs that get dispatched during parallel execution. Reducing this number can be useful to avoid a spike in memory consumption when more jobs get dispatched than CPUs can process. This parameter can be:

    None, in which case all the jobs are immediately created and spawned. Use this for lightweight and fast-running jobs, to avoid delays due to on-demand spawning of the jobs.

    An int, giving the exact number of total jobs that are spawned.

    A str, giving an expression as a function of n_jobs, as in '2*n_jobs'.

  • return_train_score: Whether to include train scores. Training scores give insight into how different parameter settings affect the overfitting/underfitting trade-off. However, computing the scores on the training set can be computationally expensive and is not strictly required to select the parameters that yield the best generalisation performance.

    New in version 0.19.
    bool, and the default is False.

Note: Changed in version 0.21: the default value was changed from True to False.

  • return_estimator: Whether to return the estimators fitted on each split.
    New in version 0.20.
    bool, and the default is False.
  • error_score: The value to assign to the score if an error occurs in estimator fitting. If set to 'raise', the error is raised. If a numeric value is given, FitFailedWarning is raised.
    New in version 0.20.
    'raise' or numeric, default=np.nan
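A few of these parameters can be seen together in a short sketch. The dataset, the LogisticRegression estimator and the two metric names are assumptions chosen for illustration; only the cross_validate call itself follows the signature shown above.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate

# Example data and estimator (assumptions for illustration).
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# An explicit splitter instead of an int, so that shuffling can be controlled.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

results = cross_validate(
    model,
    X,
    y,
    scoring=["accuracy", "f1_macro"],  # multiple scores as a list of unique strings
    cv=cv,
)

# With several metrics, the result keys are named test_<metric>.
for key in sorted(results):
    print(key, results[key])
```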


Let's look at what it returns:

  • scores: A dict of float arrays of shape (n_splits,), holding the scores of the estimator for each run of the cross-validation.
    A dict of arrays containing the score and time arrays for each scorer is returned. The possible keys for this dict are explained one by one:
  • test_score
    The score array for the test set of each cv split. If the scoring parameter contains multiple scoring metrics, the suffix _score in test_score changes to a specific metric, such as test_r2 or test_auc.
  • train_score
    The score array for the train set of each cv split. If the scoring parameter contains multiple scoring metrics, the suffix _score in train_score changes to a specific metric, such as train_r2 or train_auc. This is only available when the return_train_score argument is set to True.
  • score_time
    The time needed to score the estimator on the test set for each cv split. (Keep in mind that the time spent scoring on the train set is not included, even if return_train_score is set to True.)
  • fit_time
    The time needed to fit the estimator on the train set for each cv split.
  • estimator
    The estimator objects for each cv split. This is only available when the return_estimator argument is set to True.
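The returned dict can be inspected directly. The following sketch assumes the iris dataset and a LogisticRegression model for illustration, and switches on both optional outputs so that every key listed above appears.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

# Example data and estimator (assumptions for illustration).
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

scores = cross_validate(
    model,
    X,
    y,
    cv=5,
    return_train_score=True,  # adds the train_score key
    return_estimator=True,    # adds the estimator key
)

print(sorted(scores))  # the available keys
print("test scores:", scores["test_score"])
print("train scores:", scores["train_score"])
print("fit times (seconds):", scores["fit_time"])
```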


How can Cross-validation Address the Overfitting Issue?

During cross-validation, we create several small train-test splits from our initial training data and use these splits to fine-tune the model. For instance, in standard k-fold cross-validation we divide the data into k subsets. The algorithm is then trained in turn on k-1 of the subsets while the remaining subset is held out as the test set. In this way, we can test our model on data it has not seen. You can learn about the seven most popular cross-validation approaches in this post, along with their benefits and drawbacks. The code samples for each technique are also included.

The following is a list of Python cross-validation methods:

1. Hold Out Cross-Validation

The entire dataset is randomly divided into a training set and validation set in this cross-validation procedure. As a general rule, 30% of the total dataset is utilised as the validation set, and the remaining 70% is used as the training set.

Advantages of Hold Out Cross-Validation:-

The advantages of Hold Out Cross-Validation are given below:

Because the dataset is split into training and validation sets only once, the model is built only once on the training set, so Hold Out cross-validation runs quickly.

Disadvantages of Hold Out Cross-Validation:-

The disadvantages of Hold Out Cross-Validation are given below:

  1. Consider an unbalanced dataset with classes "0" and "1". Let's assume that 80% of the data falls under class "0", and the remaining 20% falls under class "1". With a train-test split where the train set makes up 80% of the dataset and the test set makes up 20%, the training set might contain all of the class "0" data, whereas the test set might contain all of the class "1" data. Since our model has never encountered class "1" data before, it will not generalise well to our test data.
  2. If the dataset is small, the portion set aside for testing may contain important information that the model misses because it was never trained on it.
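A hold-out split with the 70/30 proportions mentioned above can be sketched with train_test_split. The iris dataset and the LogisticRegression model are assumptions for illustration. Passing stratify=y is one way to reduce the class imbalance problem described in the first disadvantage.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Example data (an assumption for illustration).
X, y = load_iris(return_X_y=True)

# 70% training set, 30% validation set; stratify=y keeps class proportions equal.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# The model is built only once, on the training set.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

print("training samples:", len(X_train), "validation samples:", len(X_val))
print("validation class counts:", np.bincount(y_val))
print("validation accuracy:", model.score(X_val, y_val))
```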

2. K-Fold Cross-Validation

The entire dataset is divided into K equal-sized pieces using the K-Fold cross-validation procedure. Each division is referred to as a "Fold." We refer to it as K-Folds because there are K pieces. One fold is used as the validation set, while the remaining K-1 folds are used as the training set.

The procedure is repeated K times, until every fold has been used once as the validation set while the remaining folds form the training set.

The final accuracy of the model is computed as the mean of the validation accuracies of the K models.

Advantages of K-Fold Cross-Validation:-

The advantages of K-Fold Cross-Validation are given below:

The full dataset is used for both training and validation.

Disadvantages of K-Fold Cross-Validation:-

The disadvantages of K-Fold Cross-Validation are given below:

  1. As discussed for Hold Out cross-validation, the training set may contain samples from class "0" only and none from class "1", while the validation set contains samples from class "1".

  2. The order of the samples matters for Time Series data. K-Fold Cross-Validation, by contrast, assigns samples to folds without regard to their order (and at random when shuffling is enabled).
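The K-Fold procedure can be written as an explicit loop. In this sketch, the iris dataset, the LogisticRegression model and K=5 are assumptions for illustration. shuffle=True is used because iris is sorted by class, which would otherwise produce exactly the one-class folds described in the first disadvantage.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

# Example data (an assumption for illustration).
X, y = load_iris(return_X_y=True)

kf = KFold(n_splits=5, shuffle=True, random_state=0)

fold_accuracies = []
for train_idx, val_idx in kf.split(X):
    # K-1 folds form the training set, the remaining fold the validation set.
    model = LogisticRegression(max_iter=1000)
    model.fit(X[train_idx], y[train_idx])
    fold_accuracies.append(model.score(X[val_idx], y[val_idx]))

# The final accuracy is the mean over the K validation folds.
mean_accuracy = np.mean(fold_accuracies)
print("accuracy per fold:", fold_accuracies)
print("mean accuracy:", mean_accuracy)
```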
             

3. Stratified K-Fold Cross-Validation

Stratified K-Fold is an improved version of K-Fold cross-validation that is typically applied to unbalanced datasets. The entire dataset is split into K folds of equal size, just like K-Fold.

However, in this method, each fold will have the same proportion of target variable occurrences as in the entire dataset.

Advantages of Stratified K-Fold Cross-Validation:-

The advantages of Stratified K-Fold Cross-Validation are given below:

In stratified cross-validation, every class is represented in each fold in the same proportion as in the entire dataset.

Disadvantages of Stratified K-Fold Cross-Validation:-

The disadvantages of Stratified K-Fold Cross-Validation are given below:

The order of the samples matters for Time Series data. Stratified Cross-Validation, however, assigns samples to folds without regard to their order.
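The effect of stratification is easy to check on a small synthetic example. The labels below reuse the 80%/20% class split from the Hold Out discussion, and the feature matrix is a placeholder (an assumption), since only the labels influence the split.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Synthetic unbalanced labels: 80 samples of class 0 and 20 of class 1.
y = np.array([0] * 80 + [1] * 20)
X = np.zeros((100, 1))  # placeholder features; irrelevant for the split itself

skf = StratifiedKFold(n_splits=5)

# Count how many samples of each class land in every validation fold.
fold_class_counts = [
    np.bincount(y[val_idx]).tolist() for _, val_idx in skf.split(X, y)
]
for counts in fold_class_counts:
    print("class 0:", counts[0], "class 1:", counts[1])
```

Every validation fold keeps the 80/20 ratio of the full dataset.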

4. Leave P Out Cross-Validation

LeavePOut cross-validation is an exhaustive technique that uses p samples as the validation set and the remaining n-p samples as the training set.

Assume that the dataset contains 100 samples. If we choose p=10, then 10 samples are used as the validation set in each iteration, while the remaining 90 samples constitute the training set. This procedure is repeated for every possible combination of p validation samples that can be drawn from the n samples.
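To get a feel for how many iterations this involves, here is a small sketch using scikit-learn's `LeavePOut` on a hypothetical dataset of only 6 samples with p=2. The number of splits for the 100-sample example is computed with `math.comb` rather than enumerated.

```python
from math import comb

import numpy as np
from sklearn.model_selection import LeavePOut

# Hypothetical tiny dataset: 6 samples, leave 2 out each time
X = np.arange(6).reshape(-1, 1)
lpo = LeavePOut(p=2)

n_splits = lpo.get_n_splits(X)
splits = list(lpo.split(X))

print("number of train/validation splits:", n_splits)
print("first split -> train:", splits[0][0], "validation:", splits[0][1])

# For n=100 and p=10 the number of combinations is far too large to enumerate
print("splits needed for n=100, p=10:", comb(100, 10))
```

The count grows combinatorially with n and p, which is why this technique is usually practical only for small datasets or small values of p.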

Advantages of Leave P Out Cross-Validation:-

These are the advantages of Leave P Out Cross-Validation given below:

Every data sample is utilised as a training sample and a validation sample.

Disadvantages of Leave P Out Cross-Validation:-

These are the disadvantages of Leave P Out Cross-Validation given below:

  1. This method takes much longer to compute because it is repeated until every possible combination of p samples has been used as a validation set.

  2. Similar to K-Fold Cross-validation, our model cannot generalise for the validation set if the training set contains samples from only one class.

5. Leave One Out Cross-Validation

LeaveOneOut cross-validation is an exhaustive technique that uses 1 sample as the validation set and the remaining n-1 samples as the training set. Assume that the dataset contains 100 samples. In each iteration, one sample is used as the validation set, while the remaining 99 samples serve as the training set. The procedure is repeated until each sample in the dataset has served as the validation point.

It is equivalent to LeavePOut cross-validation with p=1.
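The equivalence with p=1 can be checked with a short sketch, assuming scikit-learn's `LeaveOneOut` and `LeavePOut` splitters and a hypothetical dataset of 5 samples.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut, LeavePOut

# Hypothetical tiny dataset with 5 samples
X = np.arange(5).reshape(-1, 1)

loo_splits = [(tr.tolist(), va.tolist()) for tr, va in LeaveOneOut().split(X)]
lpo_splits = [(tr.tolist(), va.tolist()) for tr, va in LeavePOut(p=1).split(X)]

for train_idx, val_idx in loo_splits:
    print("train:", train_idx, "validation:", val_idx)

# Both splitters produce the same set of train/validation partitions
same_partitions = sorted(loo_splits) == sorted(lpo_splits)
print("LeaveOneOut matches LeavePOut(p=1):", same_partitions)
```

With n samples there are exactly n iterations, one per sample.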

6. Monte Carlo Cross-Validation

Monte Carlo cross-validation, also referred to as Shuffle Split cross-validation, is a particularly flexible cross-validation technique. In this method, the dataset is randomly divided into a training set and a validation set, and the random split is repeated several times.

We choose which percentage of the dataset will serve as the training set and which percentage will serve as the validation set. If the two percentages do not add up to 100, the rest of the dataset is used in neither the training set nor the validation set.

Suppose we have 100 samples, of which 70% will be used as a training set and 20% as a validation set. The remaining 10% (100-(70+20)) will not be used.
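The 70/20/10 example can be sketched with scikit-learn's `ShuffleSplit`. The number of repetitions (`n_splits=3`) and `random_state=0` are arbitrary choices for this illustration.

```python
import numpy as np
from sklearn.model_selection import ShuffleSplit

# 100 hypothetical samples, identified by their index
X = np.arange(100).reshape(-1, 1)

# 70% training, 20% validation; n_splits and random_state are assumptions
ss = ShuffleSplit(n_splits=3, train_size=0.7, test_size=0.2, random_state=0)

sizes = []
for train_idx, val_idx in ss.split(X):
    used = set(train_idx.tolist()) | set(val_idx.tolist())
    unused = set(range(100)) - used  # samples left out of this repetition
    sizes.append((len(train_idx), len(val_idx), len(unused)))
    print("train:", len(train_idx), "validation:", len(val_idx),
          "unused:", len(unused))
```

Because each repetition draws a fresh random split, the 10 unused samples generally differ from one repetition to the next.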

Advantages of Monte Carlo Cross-Validation:-

These are the advantages of Monte Carlo Cross-Validation given below:

  1. The size of the training and validation sets is up to us.
  2. We are not dependent on the number of folds for repetitions and can choose the number of repetitions.

Disadvantages of Monte Carlo Cross-Validation:-

These are the disadvantages of Monte Carlo Cross-Validation given below:

  1. Some samples might never be chosen for either the training set or the validation set.
  2. Unsuitable for unbalanced datasets: once the sizes of the training set and the validation set are fixed, all samples are chosen at random, so it is possible that the training set does not contain the classes that appear in the validation set. In this case, the model cannot generalise to unobserved data.

7. Time Series Cross-Validation

Data collected sequentially over a period of time is referred to as a time series. Because the data points are collected at close intervals, neighbouring observations are likely to be correlated. This is one characteristic that sets time-series data apart from cross-sectional data.

As it makes no sense to use the values from the future data to forecast the values of the past data, we are unable to select random samples and assign them to either the training or validation set when dealing with time-series data.

Because the order of the data is crucial for time-series problems, we divide the data into training and validation sets according to time, using the "Forward chaining" approach, also known as rolling cross-validation.

We begin with a small subset of the data as the training set. We make predictions for the subsequent data points using that set and check their accuracy. The data points that were just predicted are then added to the training set, and the process is repeated for the next data points.
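Forward chaining can be illustrated with scikit-learn's `TimeSeriesSplit`. The 12 time-ordered observations and `n_splits=3` below are hypothetical values chosen to keep the printout short.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# 12 hypothetical observations, already sorted by time
X = np.arange(12).reshape(-1, 1)

tscv = TimeSeriesSplit(n_splits=3)

folds = [(tr.tolist(), va.tolist()) for tr, va in tscv.split(X)]
for train_idx, val_idx in folds:
    # the training window grows, and validation always lies in the "future"
    print("train:", train_idx, "validation:", val_idx)
```

In every split the largest training index is smaller than the smallest validation index, so the model never sees future values while training.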

Advantages of Time series Cross-Validation:-

These are the advantages of Time series Cross-Validation given below:

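It is among the most suitable techniques for time-ordered data, because the model is always validated on observations that come after its training data.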

Disadvantages of Time series Cross-Validation:-

These are the disadvantages of Time series Cross-Validation given below:

Not appropriate for validating other kinds of data: the previous techniques use random samples as the training or validation set, whereas this technique relies on the sequence of the data.

Let’s look at an example that shows how cross-validation works.

Example

# importing the sklearn utilities required for cross-validation
from sklearn.model_selection import cross_validate, KFold
from sklearn.preprocessing import StandardScaler

# iris dataset imported for training and validating the model
from sklearn.datasets import load_iris
# logistic regression model imported
from sklearn.linear_model import LogisticRegression

# loading the dataset into the X and y variables
X, y = load_iris(return_X_y=True)

# scaling of the dataset
sc = StandardScaler()
X = sc.fit_transform(X)

# creating the logistic regression model
log_reg = LogisticRegression()

# 5-fold cross-validation; KFold does not shuffle by default and the iris
# samples are ordered by class, so the score can differ noticeably between folds
kf = KFold(n_splits=5)
# cross_validate uses the estimator's default scorer, which is accuracy here
results = cross_validate(log_reg, X, y, cv=kf)

print("Cross-validation results for the dataset \n", results)

Output

Cross-validation results for the dataset
{'fit_time': array([0.00797629, 0.00598359, 0.00599241, 0.00598335, 0.00497603]), 'score_time': array([0.        , 0.        , 0.        , 0.00100064, 0.        ]), 'test_score': array([1.        , 1.        , 0.83333333, 0.93333333, 0.73333333])}

Conclusion

Many machine learning tasks rely on the basic train-test split. However, if you have enough computational resources, consider using cross-validation for your problem. It not only lets you make better use of limited data, but an inconsistent score across the folds can also indicate that you have missed an important relationship in your data.

The Sklearn package offers a number of approaches for partitioning the data to suit your machine learning task. You can stratify the data based on the target variable, shuffle the data, or use a simple KFold.