Data Drop in Python
Introduction
In this article, you'll learn how to delete a list of rows from a Pandas dataframe. To remove columns instead, see the article How to Drop Columns in Pandas. Before dropping rows, it helps to understand what a Pandas dataframe is, so let's start with its definition.
What is Pandas Dataframe?
Data can be stored in rows and columns using a two-dimensional data structure called a Pandas dataframe. It is quite helpful for examining data.
Depending on the goals of your analysis and the requirements of your models, you might need to drop a particular list of rows from the data stored in a dataframe.
How to Delete a Row or Column from a Dataframe Using Pandas
Use the drop() method of the dataframe to delete a row or column. The documentation for the drop() method is available here.
Axis for Dataframe:
- Axis = 0 is used to indicate rows.
- When indicating columns, use axis=1.
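As a minimal sketch of the two axis values, the snippet below drops one row and one column from a small frame. The frame df_axis and its columns a and b are hypothetical and exist only for this illustration.

```python
import pandas as pd

# Hypothetical two-row, two-column frame used only for illustration.
df_axis = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# axis=0: the label 0 is looked up in the row index.
without_row = df_axis.drop(0, axis=0)

# axis=1: the label "b" is looked up in the column names.
without_col = df_axis.drop("b", axis=1)

print(without_row)
print(without_col)
```

Neither call changes df_axis itself, because inplace is left at its default of False.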
Labels for Dataframe
- By default, the index number, which begins with 0, is used to identify rows.
- Names are used to identify the columns.
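The labels described above can be inspected directly. This is a small sketch on a hypothetical two-row frame (df_labels), not the sample data used later in the article.

```python
import pandas as pd

# Hypothetical frame used only to show how rows and columns are labeled.
df_labels = pd.DataFrame({"pd_name": ["CPU", "Mouse"], "Units_No": [5, 20]})

# Rows get the default integer labels 0, 1, ...
row_labels = list(df_labels.index)

# Columns are identified by their names.
col_labels = list(df_labels.columns)

print(row_labels)
print(col_labels)
```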
Parameters for the drop() method:
- index: The list of row labels to remove. The examples below pass this list as the first positional argument (labels) together with axis=0, which is equivalent.
- axis=0: Indicates that the labels refer to rows of the dataframe.
- inplace=True: Performs the drop within the existing dataframe instead of returning a new dataframe object. The default is inplace=False.
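To make the effect of inplace concrete, here is a short sketch on a hypothetical three-row frame (df_params). It assumes nothing beyond the index and inplace parameters described above.

```python
import pandas as pd

# Hypothetical frame used only for illustration.
df_params = pd.DataFrame(
    {"pd_name": ["CPU", "Mouse", "Monitor"], "Units_No": [5, 20, 5]}
)

# Without inplace, drop() returns a new dataframe and leaves df_params untouched.
trimmed = df_params.drop(index=[0, 2])

# With inplace=True, drop() modifies df_params itself and returns None.
result = df_params.drop(index=[1], inplace=True)

print(trimmed)
print(result)
print(df_params)
```

Because the in-place call returns None, assigning its result back to df would replace the dataframe with None, which is a common mistake.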
Pandas DataFrame Example
Our sample dataframe contains the columns pd_name, Unit__Price, Units_No, Avl_Quantity, and Avl_Since_Date. It also contains missing values, which Pandas displays as NaN (or NaT for the pd.NaT placeholder used here).
import pandas as pd
Data = {"pd_name":["CPU","Mouse", "Monitor","CPU","Keyboard","Speakers", pd.NaT],
"Unit__Price":[5000.235, 10000.550, 500, 10000.550, 200, 250.50, None],
"Units_No":[5, 20, 5, 20, 10, 8,pd.NaT],
"Avl_Quantity":[7, 10,"Not Available", 8,"Not Available", pd.NaT,pd.NaT],
"Avl_Since_Date":['4/23/2021','11/5/2021','09/18/2021','08/21/2021','09/18/2021','01/05/2021', pd.NaT]
}
df = pd.DataFrame(Data)
print(df)
Output:
    pd_name  Unit__Price  Units_No   Avl_Quantity  Avl_Since_Date
0       CPU     5000.235         5              7       4/23/2021
1     Mouse    10000.550        20             10       11/5/2021
2   Monitor      500.000         5  Not Available      09/18/2021
3       CPU    10000.550        20              8      08/21/2021
4  Keyboard      200.000        10  Not Available      09/18/2021
5  Speakers      250.500         8            NaT      01/05/2021
6       NaT          NaN       NaT            NaT             NaT
And with that, we've finished building our sample dataframe.
After every drop operation, you'll call print(df) to display the dataframe as a plain-text table.
To print the dataframe in several visual representations, read about how to Perfectly Print a Dataframe here.
After that, you'll discover many use cases for dropping a list of rows.
How to Drop a List of Rows in a Pandas Dataframe by Index
By providing a list of index labels to the drop() method, you can delete a group of rows from a Pandas dataframe.
import pandas as pd
Data = {"pd_name":["CPU","Mouse","Monitor","CPU","Keyboard","Speakers", pd.NaT],
"Unit__Price":[5000.235, 10000.550, 500, 10000.550, 200, 250.50, None],
"Units_No":[5, 20, 5, 20, 10, 8,pd.NaT],
"Avl_Quantity":[7, 10,"Not Available", 8,"Not Available", pd.NaT,pd.NaT],
"Avl_Since_Date":['4/23/2021','11/5/2021','09/18/2021','08/21/2021','09/18/2021','01/05/2021', pd.NaT]
}
df = pd.DataFrame(Data)
df.drop([4,6], axis=0, inplace=True)
print(df)
In this code,
- [4,6] is the list of index labels of the rows you would like to remove.
- axis=0 indicates that rows, not columns, should be removed.
- The drop action is carried out in the same dataframe when inplace=True.
The dataframe will have the following information after removing the rows with indexes 4 and 6:
    pd_name  Unit__Price  Units_No   Avl_Quantity  Avl_Since_Date
0       CPU     5000.235         5              7       4/23/2021
1     Mouse    10000.550        20             10       11/5/2021
2   Monitor      500.000         5  Not Available      09/18/2021
3       CPU    10000.550        20              8      08/21/2021
5  Speakers      250.500         8            NaT      01/05/2021
This is how you can remove rows with particular index labels.
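Two details are worth a quick sketch when dropping by a list of labels: a label that does not exist raises a KeyError unless errors="ignore" is passed, and the remaining rows keep their original labels unless the index is reset. The frame df_small is hypothetical and used only for illustration.

```python
import pandas as pd

# Hypothetical frame used only for illustration.
df_small = pd.DataFrame({"pd_name": ["CPU", "Mouse", "Monitor", "Keyboard"]})

# Label 10 does not exist; errors="ignore" skips it instead of raising KeyError.
kept = df_small.drop([1, 10], errors="ignore")

# After the drop the index has a gap (0, 2, 3);
# reset_index(drop=True) renumbers the rows from 0.
renumbered = kept.reset_index(drop=True)

print(kept)
print(renumbered)
```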
You'll then discover how to drop a range of indexes.
How to Delete Rows in Pandas by Index Range
Additionally, a list of rows within a particular range can be dropped.
A group of values with a lower limit and an upper limit is referred to as a range.
If you want to construct a sample dataset that excludes particular data ranges, this can be helpful.
The index attribute of a dataframe (df.index) can be sliced to produce a range of row labels. The rows can then be deleted by passing this range to the drop() method, as illustrated below.
import pandas as pd
Data = {"pd_name":["CPU","Mouse", "Monitor","CPU","Keyboard","Speakers", pd.NaT],
"Unit__Price":[5000.235, 10000.550, 500, 10000.550, 200, 250.50, None],
"Units_No":[5, 20, 5, 20, 10, 8,pd.NaT],
"Avl_Quantity":[7, 10,"Not Available", 8,"Not Available", pd.NaT,pd.NaT],
"Avl_Since_Date":['4/23/2021','11/5/2021','09/18/2021','08/21/2021','09/18/2021','01/05/2021', pd.NaT]
}
df = pd.DataFrame(Data)
df.drop(df.index[1:3], inplace=True)
print(df)
What this code does is as follows:
- df.index[1:3] produces the labels of the rows at positions 1 and 2. The lower limit of the slice is inclusive and the upper limit is exclusive. Accordingly, rows 1 and 2 will be removed, but row 3 won't be.
- The drop action is carried out in the same dataframe when inplace=True.
You'll have the following information in the dataframe after removing rows 1 and 2:
    pd_name  Unit__Price  Units_No   Avl_Quantity  Avl_Since_Date
0       CPU     5000.235         5              7       4/23/2021
3       CPU    10000.550        20              8      08/21/2021
4  Keyboard      200.000        10  Not Available      09/18/2021
5  Speakers      250.500         8            NaT      01/05/2021
6       NaT          NaN       NaT            NaT             NaT
This is how you can use the dataframe's range to delete a list of rows.
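Slicing df.index works by position and returns whatever labels sit at those positions. The sketch below makes this visible with a hypothetical frame (df_named) whose row labels are strings rather than the default integers.

```python
import pandas as pd

# Hypothetical frame with string row labels, used only for illustration.
df_named = pd.DataFrame(
    {"Units_No": [5, 20, 5, 10]},
    index=["cpu", "mouse", "monitor", "keyboard"],
)

# Positions 1 and 2 of the index hold the labels "mouse" and "monitor".
labels_to_drop = df_named.index[1:3]

# drop() then removes the rows carrying those labels.
remaining = df_named.drop(labels_to_drop)

print(list(labels_to_drop))
print(remaining)
```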
How to Remove Every Row Following an Index in Pandas
Using iloc[], you can remove all rows that follow a particular position by keeping only the rows before it.
iloc[] selects rows by their integer position. By using a colon (:), you can specify the start and stop positions; the start is inclusive and the stop is exclusive. To choose the rows at positions 2 and 3, for instance, use 2:4. To select every row, use iloc[:].
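The slicing rules of iloc[] can be checked with a short sketch. The four-row frame df_pos is hypothetical; the point is only that the start position is included and the stop position is excluded.

```python
import pandas as pd

# Hypothetical frame used only for illustration.
df_pos = pd.DataFrame({"pd_name": ["CPU", "Mouse", "Monitor", "Keyboard"]})

first_two = df_pos.iloc[:2]    # positions 0 and 1
only_third = df_pos.iloc[2:3]  # position 2 only; the stop position 3 is excluded
from_third = df_pos.iloc[2:]   # position 2 through the last row
everything = df_pos.iloc[:]    # every row

print(first_two)
print(only_third)
print(from_third)
```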
When you want to divide the dataset for training and testing purposes, this could be handy.
Use the snippet below to keep the rows at positions 0 and 1. As a result, every row from position 2 onward is dropped.
import pandas as pd
Data = {"pd_name":["CPU","Mouse", "Monitor","CPU","Keyboard","Speakers", pd.NaT],
"Unit__Price":[5000.235, 10000.550, 500, 10000.550, 200, 250.50, None],
"Units_No":[5, 20, 5, 20, 10, 8,pd.NaT],
"Avl_Quantity":[7, 10,"Not Available", 8,"Not Available", pd.NaT,pd.NaT],
"Avl_Since_Date":['4/23/2021','11/5/2021','09/18/2021','08/21/2021','09/18/2021','01/05/2021', pd.NaT]
}
df = pd.DataFrame(Data)
df = df.iloc[:2]
print(df)
In this code, iloc[:2] selects all rows up to, but not including, position 2, and the result is assigned back to df.
The data in the dataframe will look like this once the rows from position 2 onward have been removed:
  pd_name  Unit__Price  Units_No  Avl_Quantity  Avl_Since_Date
0     CPU     5000.235         5             7       4/23/2021
1   Mouse    10000.550        20            10       11/5/2021
Rows after a particular index may be removed in this manner.
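For the training and testing use case mentioned earlier, the same slicing can produce both parts of a split. This is only a sketch: the frame df_all, the column x, and the 80/20 ratio are assumptions made for illustration, and the rows are split in their existing order without shuffling.

```python
import pandas as pd

# Hypothetical ten-row frame used only for illustration.
df_all = pd.DataFrame({"x": range(10)})

# Assumed 80/20 split point, computed from the number of rows.
split_at = int(len(df_all) * 0.8)

train = df_all.iloc[:split_at]  # rows before the split position
test = df_all.iloc[split_at:]   # rows from the split position onward

print(len(train), len(test))
```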
You'll then discover how to remove rows with conditions.
How to Drop Rows in Pandas with Multiple Conditions
The dataframe's rows can be deleted based on predetermined criteria.
Rows with column values larger than X and smaller than Y, for instance, can be dropped.
When you want to generate a dataset that excludes rows with particular values, this might be helpful.
To remove rows based on conditions, select the index of the rows that satisfy the conditions and pass that index to the drop() method.
import pandas as pd
Data = {"pd_name":["CPU","Mouse", "Monitor","CPU","Keyboard","Speakers", pd.NaT],
"Unit__Price":[5000.235, 10000.550, 500, 10000.550, 200, 250.50, None],
"Units_No":[5, 20, 5, 20, 10, 8,pd.NaT],
"Avl_Quantity":[7, 10,"Not Available", 8,"Not Available", pd.NaT,pd.NaT],
"Avl_Since_Date":['4/23/2021','11/5/2021','09/18/2021','08/21/2021','09/18/2021','01/05/2021', pd.NaT]
}
df = pd.DataFrame(Data)
df.drop(df[(df['Unit__Price'] > 400) & (df['Unit__Price'] < 600)].index, inplace=True)
print(df)
In this code:
- The conditions for removing rows are (df['Unit__Price'] > 400) and (df['Unit__Price'] < 600), combined with the & operator so that both must hold.
- df[...].index selects the index of the rows that satisfy both conditions.
- Instead of generating a new dataframe, inplace=True conducts the drop operation within the current dataframe.
The dataframe will contain the following information after the rows with a unit price greater than 400 and less than 600 have been removed:
    pd_name  Unit__Price  Units_No   Avl_Quantity  Avl_Since_Date
0       CPU     5000.235         5              7       4/23/2021
1     Mouse    10000.550        20             10       11/5/2021
3       CPU    10000.550        20              8      08/21/2021
4  Keyboard      200.000        10  Not Available      09/18/2021
5  Speakers      250.500         8            NaT      01/05/2021
6       NaT          NaN       NaT            NaT             NaT
This is how you can use specific conditions to remove rows from the dataframe.
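An equivalent way to get the same result, shown here as a sketch rather than as the approach used above, is to keep the rows that do not match the condition by negating the boolean mask with ~. The frame df_cond is a hypothetical cut-down version of the sample data. Note that a comparison against a missing price evaluates to False, so that row is kept by both approaches.

```python
import pandas as pd

# Hypothetical cut-down version of the sample data, including a missing price.
df_cond = pd.DataFrame(
    {
        "pd_name": ["CPU", "Monitor", "Keyboard", "Speakers"],
        "Unit__Price": [5000.235, 500.0, 200.0, None],
    }
)

# True only for rows whose price is strictly between 400 and 600.
in_range = (df_cond["Unit__Price"] > 400) & (df_cond["Unit__Price"] < 600)

# Approach from the article: drop the matching rows by their index.
dropped = df_cond.drop(df_cond[in_range].index)

# Alternative: keep every row where the mask is False.
filtered = df_cond[~in_range]

print(filtered)
```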
Conclusion
In conclusion, you now understand what the drop() method of a Pandas dataframe does and how the dataframe's rows and columns are labeled. You have also learned how to drop rows by a list of indices, by a range of indices, by position with iloc[], and by conditions.