The iris dataset is a toy dataset shipped with several Python libraries, such as scikit-learn. It is widely used in machine learning and data science because it is small, well defined, and easy to understand. It describes flowers from three species of iris: Setosa, Versicolor, and Virginica.

The dataset contains 150 samples drawn from the three species of iris. Each sample has four feature columns: the first is the sepal length, the second the sepal width, the third the petal length, and the fourth the petal width. One more column, the target, gives the class of each flower, which is what a model tries to predict from the four measurements.

The three species of iris look similar, but their measurements differ enough to be used to classify them. Because every sample comes with a label, this dataset is an example of supervised learning. The input variables are sepal length, sepal width, petal length, and petal width; each row is one observation, and each feature column is one input variable. The output variable is the class label, which takes one of three values: Iris-setosa, Iris-versicolor, or Iris-virginica.
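To make the idea of inputs and labels concrete, here is a minimal sketch of supervised learning on this dataset. The choice of a k-nearest-neighbours classifier, the 25% hold-out split, and the random seed are assumptions made for illustration; they are not prescribed by the dataset.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# X holds the four measurements per flower, y the integer class label (0, 1, 2).
X, y = load_iris(return_X_y=True)
print(X.shape, y.shape)  # (150, 4) (150,)

# Hold out a quarter of the flowers to check predictions on unseen samples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# k-nearest neighbours is an arbitrary choice here; other classifiers work too.
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)

# Fraction of held-out flowers whose species was predicted correctly.
accuracy = model.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

The exact accuracy depends on the split, so treat the printed number as indicative only.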

Iris Data Set in Python:

To explore the iris dataset in Python, first import it from the scikit-learn library. Alternatively, you can download it as a structured file from platforms such as Kaggle.

Code:

from sklearn.datasets import load_iris

iris = load_iris()

iris.keys()

Output:

dict_keys(['data', 'target', 'frame', 'target_names', 'DESCR', 'feature_names', 'filename', 'data_module'])

The returned object behaves like a dictionary, and its keys give access to specific parts of the dataset. Let's say you want the sepal and petal measurements of the flowers: use iris['data'] or, equivalently, iris.data.

Code:

iris.data

Output:

array([[5.1, 3.5, 1.4, 0.2],
       [4.9, 3. , 1.4, 0.2],
       [4.7, 3.2, 1.3, 0.2],
       [4.6, 3.1, 1.5, 0.2],
       [5. , 3.6, 1.4, 0.2],
       [5.4, 3.9, 1.7, 0.4],
       [4.6, 3.4, 1.4, 0.3],
       [5. , 3.4, 1.5, 0.2],
       [4.4, 2.9, 1.4, 0.2],
       [4.9, 3.1, 1.5, 0.1],
       [5.4, 3.7, 1.5, 0.2],
       [4.8, 3.4, 1.6, 0.2],
       [4.8, 3. , 1.4, 0.1],
       [4.3, 3. , 1.1, 0.1],
       [5.8, 4. , 1.2, 0.2],
       [5.7, 4.4, 1.5, 0.4],
       [5.4, 3.9, 1.3, 0.4],
       [5.1, 3.5, 1.4, 0.3],
       [5.7, 3.8, 1.7, 0.3],
       [5.1, 3.8, 1.5, 0.3],
       [5.4, 3.4, 1.7, 0.2],
       [5.1, 3.7, 1.5, 0.4],
       [4.6, 3.6, 1. , 0.2],
       [5.1, 3.3, 1.7, 0.5],
       [4.8, 3.4, 1.9, 0.2],
       [5. , 3. , 1.6, 0.2],
       [5. , 3.4, 1.6, 0.4],
       [5.2, 3.5, 1.5, 0.2],
       [5.2, 3.4, 1.4, 0.2],
       [4.7, 3.2, 1.6, 0.2],
       [4.8, 3.1, 1.6, 0.2],
       [5.4, 3.4, 1.5, 0.4],
       [5.2, 4.1, 1.5, 0.1],
       [5.5, 4.2, 1.4, 0.2],
       [4.9, 3.1, 1.5, 0.2],
       [5. , 3.2, 1.2, 0.2],
       [5.5, 3.5, 1.3, 0.2],
       [4.9, 3.6, 1.4, 0.1],
       [4.4, 3. , 1.3, 0.2],
       [5.1, 3.4, 1.5, 0.2],
       [5. , 3.5, 1.3, 0.3],
       [4.5, 2.3, 1.3, 0.3],
       [4.4, 3.2, 1.3, 0.2],
       [5. , 3.5, 1.6, 0.6],
       [5.1, 3.8, 1.9, 0.4],
       [4.8, 3. , 1.4, 0.3],
       [5.1, 3.8, 1.6, 0.2],
       [4.6, 3.2, 1.4, 0.2],
       [5.3, 3.7, 1.5, 0.2],
       [5. , 3.3, 1.4, 0.2],
       [7. , 3.2, 4.7, 1.4],
       [6.4, 3.2, 4.5, 1.5],
       [6.9, 3.1, 4.9, 1.5],
       [5.5, 2.3, 4. , 1.3],
       [6.5, 2.8, 4.6, 1.5],
       [5.7, 2.8, 4.5, 1.3],
       [6.3, 3.3, 4.7, 1.6],
       [4.9, 2.4, 3.3, 1. ],
       [6.6, 2.9, 4.6, 1.3],
       [5.2, 2.7, 3.9, 1.4],
       [5. , 2. , 3.5, 1. ],
       [5.9, 3. , 4.2, 1.5],
       [6. , 2.2, 4. , 1. ],
       [6.1, 2.9, 4.7, 1.4],
       [5.6, 2.9, 3.6, 1.3],
       [6.7, 3.1, 4.4, 1.4],
       [5.6, 3. , 4.5, 1.5],
       [5.8, 2.7, 4.1, 1. ],
       [6.2, 2.2, 4.5, 1.5],
       [5.6, 2.5, 3.9, 1.1],
       [5.9, 3.2, 4.8, 1.8],
       [6.1, 2.8, 4. , 1.3],
       [6.3, 2.5, 4.9, 1.5],
       [6.1, 2.8, 4.7, 1.2],
       [6.4, 2.9, 4.3, 1.3],
       [6.6, 3. , 4.4, 1.4],
       [6.8, 2.8, 4.8, 1.4],
       [6.7, 3. , 5. , 1.7],
       [6. , 2.9, 4.5, 1.5],
       [5.7, 2.6, 3.5, 1. ],
       [5.5, 2.4, 3.8, 1.1],
       [5.5, 2.4, 3.7, 1. ],
       [5.8, 2.7, 3.9, 1.2],
       [6. , 2.7, 5.1, 1.6],
       [5.4, 3. , 4.5, 1.5],
       [6. , 3.4, 4.5, 1.6],
       [6.7, 3.1, 4.7, 1.5],
       [6.3, 2.3, 4.4, 1.3],
       [5.6, 3. , 4.1, 1.3],
       [5.5, 2.5, 4. , 1.3],
       [5.5, 2.6, 4.4, 1.2],
       [6.1, 3. , 4.6, 1.4],
       [5.8, 2.6, 4. , 1.2],
       [5. , 2.3, 3.3, 1. ],
       [5.6, 2.7, 4.2, 1.3],
       [5.7, 3. , 4.2, 1.2],
       [5.7, 2.9, 4.2, 1.3],
       [6.2, 2.9, 4.3, 1.3],
       [5.1, 2.5, 3. , 1.1],
       [5.7, 2.8, 4.1, 1.3],
       [6.3, 3.3, 6. , 2.5],
       [5.8, 2.7, 5.1, 1.9],
       [7.1, 3. , 5.9, 2.1],
       [6.3, 2.9, 5.6, 1.8],
       [6.5, 3. , 5.8, 2.2],
       [7.6, 3. , 6.6, 2.1],
       [4.9, 2.5, 4.5, 1.7],
       [7.3, 2.9, 6.3, 1.8],
       [6.7, 2.5, 5.8, 1.8],
       [7.2, 3.6, 6.1, 2.5],
       [6.5, 3.2, 5.1, 2. ],
       [6.4, 2.7, 5.3, 1.9],
       [6.8, 3. , 5.5, 2.1],
       [5.7, 2.5, 5. , 2. ],
       [5.8, 2.8, 5.1, 2.4],
       [6.4, 3.2, 5.3, 2.3],
       [6.5, 3. , 5.5, 1.8],
       [7.7, 3.8, 6.7, 2.2],
       [7.7, 2.6, 6.9, 2.3],
       [6. , 2.2, 5. , 1.5],
       [6.9, 3.2, 5.7, 2.3],
       [5.6, 2.8, 4.9, 2. ],
       [7.7, 2.8, 6.7, 2. ],
       [6.3, 2.7, 4.9, 1.8],
       [6.7, 3.3, 5.7, 2.1],
       [7.2, 3.2, 6. , 1.8],
       [6.2, 2.8, 4.8, 1.8],
       [6.1, 3. , 4.9, 1.8],
       [6.4, 2.8, 5.6, 2.1],
       [7.2, 3. , 5.8, 1.6],
       [7.4, 2.8, 6.1, 1.9],
       [7.9, 3.8, 6.4, 2. ],
       [6.4, 2.8, 5.6, 2.2],
       [6.3, 2.8, 5.1, 1.5],
       [6.1, 2.6, 5.6, 1.4],
       [7.7, 3. , 6.1, 2.3],
       [6.3, 3.4, 5.6, 2.4],
       [6.4, 3.1, 5.5, 1.8],
       [6. , 3. , 4.8, 1.8],
       [6.9, 3.1, 5.4, 2.1],
       [6.7, 3.1, 5.6, 2.4],
       [6.9, 3.1, 5.1, 2.3],
       [5.8, 2.7, 5.1, 1.9],
       [6.8, 3.2, 5.9, 2.3],
       [6.7, 3.3, 5.7, 2.5],
       [6.7, 3. , 5.2, 2.3],
       [6.3, 2.5, 5. , 1.9],
       [6.5, 3. , 5.2, 2. ],
       [6.2, 3.4, 5.4, 2.3],
       [5.9, 3. , 5.1, 1.8]])

To see the target values in the iris data, use:

Code:

iris.target

Output:

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])
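The remaining keys can be inspected in the same way. The sketch below prints the feature and species names and counts how often each target value occurs; the variable name counts is introduced here only for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()

print(iris.feature_names)   # names of the four measurement columns
print(iris.target_names)    # species names, indexed by target value
print(iris.data.shape)      # (150, 4)

# Count how many samples carry each target value (0, 1, 2).
counts = np.bincount(iris.target)
for name, count in zip(iris.target_names, counts):
    print(name, count)
```

Because target_names is indexed by target value, iris.target_names[0] is the species name for every row whose target is 0.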

Converting the Dataset into a Pandas Dataframe:

To convert the iris data into a pandas DataFrame, pass iris.data as the rows and iris.feature_names as the column names, then add iris.target as the target column.

Code:

import pandas as pd

from sklearn.datasets import load_iris

iris = load_iris()

df = pd.DataFrame(iris.data, columns = iris.feature_names)

df['target'] = iris.target

df.head(10)

Output:

   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  target
0                5.1               3.5                1.4               0.2       0
1                4.9               3.0                1.4               0.2       0
2                4.7               3.2                1.3               0.2       0
3                4.6               3.1                1.5               0.2       0
4                5.0               3.6                1.4               0.2       0
5                5.4               3.9                1.7               0.4       0
6                4.6               3.4                1.4               0.3       0
7                5.0               3.4                1.5               0.2       0
8                4.4               2.9                1.4               0.2       0
9                4.9               3.1                1.5               0.1       0

The data frame now contains the sepal and petal lengths and widths together with the target column, which is the numerical representation of the iris species: target 0 is Setosa, target 1 is Versicolor, and target 2 is Virginica.
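As an alternative to building the data frame by hand, load_iris can return one directly. This is a minimal sketch, assuming scikit-learn 0.23 or later (where the as_frame parameter is available) and an installed pandas; the names iris_frame and df_alt are used here only to avoid clashing with the variables above.

```python
from sklearn.datasets import load_iris

# as_frame=True asks scikit-learn to return pandas objects instead of NumPy arrays.
iris_frame = load_iris(as_frame=True)

# .frame combines the four feature columns with the 'target' column.
df_alt = iris_frame.frame
print(df_alt.shape)
print(df_alt.columns.tolist())
```

The result has the same five columns as the data frame built manually above.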

Let's add the species names to the data frame as one more column.

Code:

species = []

for i in range(len(iris['target'])):

    if iris['target'][i] == 0:

        species.append("setosa")

    elif iris['target'][i] == 1:

        species.append('versicolor')

    else:

        species.append('virginica')

df['species'] = species

df.head(10)

Output:

   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  target species
0                5.1               3.5                1.4               0.2       0  setosa
1                4.9               3.0                1.4               0.2       0  setosa
2                4.7               3.2                1.3               0.2       0  setosa
3                4.6               3.1                1.5               0.2       0  setosa
4                5.0               3.6                1.4               0.2       0  setosa
5                5.4               3.9                1.7               0.4       0  setosa
6                4.6               3.4                1.4               0.3       0  setosa
7                5.0               3.4                1.5               0.2       0  setosa
8                4.4               2.9                1.4               0.2       0  setosa
9                4.9               3.1                1.5               0.1       0  setosa
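The explicit loop can also be written as a single array lookup, since iris.target_names already stores the species names in target order. A minimal sketch, rebuilding iris and df so the block runs on its own:

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['target'] = iris.target

# iris.target_names is a NumPy array, so indexing it with the integer
# targets looks up the species name for every row at once.
df['species'] = iris.target_names[iris.target]

print(df['species'].unique())
```

This produces the same species column as the loop, without hard-coding the three names.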

To count the samples of each species in the iris dataset, use:

Code:

df.groupby('species').size()

Output:

species

setosa        50

versicolor    50

virginica     50

dtype: int64

This shows that of the 150 samples, 50 are setosa, 50 are versicolor, and 50 are virginica.

To get summary statistics for the iris data, use:

Code:

df.describe()

Output:

       sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)      target
count         150.000000        150.000000         150.000000        150.000000  150.000000
mean            5.843333          3.057333           3.758000          1.199333    1.000000
std             0.828066          0.435866           1.765298          0.762238    0.819232
min             4.300000          2.000000           1.000000          0.100000    0.000000
25%             5.100000          2.800000           1.600000          0.300000    0.000000
50%             5.800000          3.000000           4.350000          1.300000    1.000000
75%             6.400000          3.300000           5.100000          1.800000    2.000000
max             7.900000          4.400000           6.900000          2.500000    2.000000

The describe method reports the following statistics for each numeric column: count is the number of non-missing values, mean is the average, std is the standard deviation, min is the minimum value, 25% is the first quartile, 50% is the median, 75% is the third quartile, and max is the maximum value.
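The quartile rows can be reproduced for a single column with quantile, and the same statistics can be broken down by species with groupby. A minimal sketch; the variable names petal_length, quartiles, and species_means are chosen here for illustration.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['species'] = iris.target_names[iris.target]

petal_length = df['petal length (cm)']

# The 25%, 50% and 75% rows of describe() are these three quantiles.
quartiles = petal_length.quantile([0.25, 0.5, 0.75])
print(quartiles)

# The 50% row is the median.
print(petal_length.median())

# Mean of each measurement, computed separately for each species.
species_means = df.groupby('species').mean()
print(species_means)
```

Comparing the per-species means with the overall mean shows how much the species differ, which the single summary table hides.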

Plotting the Dataset:

Plotting is a great way to visualize data and make it easier to understand. We can plot the iris data with the help of the matplotlib library.

Code:

import matplotlib.pyplot as plt

setosa = df[df.species == "setosa"]

versicolor = df[df.species=='versicolor']

virginica = df[df.species=='virginica']

fig, ax = plt.subplots()

fig.set_size_inches(13, 7) # adjusting the length and width of plot

# labels and scatter points

ax.scatter(setosa['petal length (cm)'], setosa['petal width (cm)'], label="Setosa", facecolor="blue")

ax.scatter(versicolor['petal length (cm)'], versicolor['petal width (cm)'], label="Versicolor", facecolor="green")

ax.scatter(virginica['petal length (cm)'], virginica['petal width (cm)'], label="Virginica", facecolor="red")

ax.set_xlabel("petal length (cm)")

ax.set_ylabel("petal width (cm)")

ax.grid()

ax.set_title("Iris petals")

ax.legend()

plt.show()

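Output: (scatter plot titled "Iris petals" showing petal width against petal length, with Setosa in blue, Versicolor in green, and Virginica in red)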
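The same kind of plot can be drawn for the sepal measurements with a loop over the species instead of three separate scatter calls. This is a sketch under a few assumptions: the colours are left to matplotlib's defaults, and the output file name iris_sepals.png is hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['species'] = iris.target_names[iris.target]

fig, ax = plt.subplots(figsize=(13, 7))

# groupby yields one (name, sub-frame) pair per species, so each
# species gets its own scatter call and its own legend entry.
for name, group in df.groupby('species'):
    ax.scatter(group['sepal length (cm)'], group['sepal width (cm)'], label=name)

ax.set_xlabel('sepal length (cm)')
ax.set_ylabel('sepal width (cm)')
ax.set_title('Iris sepals')
ax.grid()
ax.legend()

# Hypothetical output file name; use plt.show() instead in an interactive session.
fig.savefig('iris_sepals.png')
```

Looping over groupby keeps the code the same length no matter how many classes the data has.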
