Get n-Smallest Values from a Particular Column in Pandas DataFrame
Use the nsmallest() method of a Pandas DataFrame to retrieve the rows that hold the n smallest values in a given column. Here is an illustration of how it's done:
Code:
import pandas as pd
# Create a new sample DataFrame with different values
new_data = {
'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Ethan'],
'Age': [28, 32, 20, 24, 38],
'Salary': [45000, 55000, 42000, 38000, 68000]
}
new_df = pd.DataFrame(new_data)
# Set 'n' to the number of smallest values to retrieve
n = 3
# Get the 'n' smallest values from the 'Salary' column of the new DataFrame
new_n_smallest_values = new_df.nsmallest(n, 'Salary')
# Print the updated result, which shows the 'n' smallest salaries
print(new_n_smallest_values)
Output:
      Name  Age  Salary
3    Diana   24   38000
2  Charlie   20   42000
0    Alice   28   45000
We establish an example DataFrame with the columns "Name," "Age," and "Salary" in the code above. The n smallest values from the "Salary" column are then obtained using the DataFrame's nsmallest() method. In this instance, n is set to 3. The result is then printed, showing the rows with the lowest salaries.
Syntax:
DataFrame.nsmallest(n, columns, keep='first')
Parameters:
n: The number of rows to return.
columns: The column or columns to order by. This may be a single column name (string) or a list of column names.
keep (optional): Specifies how to handle ties when several rows share the same value. With the default 'first', the rows that occur first are kept. 'last' keeps the rows that occur last, and 'all' keeps every tied row, even if that returns more than n rows.
Return:
A new DataFrame containing the n rows with the smallest values in the chosen column(s), ordered from smallest to largest.
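To make the tie-handling options concrete, here is a minimal sketch on hypothetical data (the names and salaries are invented for illustration) in which two rows share the same salary. It only shows how the three keep settings differ on this small input.

```python
import pandas as pd

# Hypothetical data: Ann and Cara tie at 40000, the second-lowest salary.
df = pd.DataFrame({
    'Name': ['Ann', 'Ben', 'Cara', 'Dev'],
    'Salary': [40000, 35000, 40000, 50000],
})

# keep='first' (the default) keeps the tied row that appears first.
first = df.nsmallest(2, 'Salary', keep='first')
# keep='last' keeps the tied row that appears last.
last = df.nsmallest(2, 'Salary', keep='last')
# keep='all' keeps every tied row, so more than n rows can come back.
all_ties = df.nsmallest(2, 'Salary', keep='all')

print(first['Name'].tolist())     # ['Ben', 'Ann']
print(last['Name'].tolist())      # ['Ben', 'Cara']
print(all_ties['Name'].tolist())  # ['Ben', 'Ann', 'Cara']
```

With keep='all' the result has three rows even though n is 2, because dropping either tied row would be arbitrary.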
Example:
import pandas as pd
# Create a new DataFrame with values
new_data = {
'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Ethan'],
'Age': [22, 27, 19, 20, 29],
'Salary': [45000, 55000, 42000, 38000, 68000]
}
new_df = pd.DataFrame(new_data)
# Get the 3 smallest values from the 'Salary' column of the new DataFrame
new_n_smallest_values = new_df.nsmallest(3, 'Salary')
# Print the updated result
print(new_n_smallest_values)
Output:
      Name  Age  Salary
3    Diana   20   38000
2  Charlie   19   42000
0    Alice   22   45000
The nsmallest() method is used in this example to retrieve the three smallest values from the 'Salary' column. The resulting DataFrame contains the rows with the lowest salaries, in ascending order.
You may supply a list of column names as the columns argument if you want to order by several columns at once. For instance, the expression new_df.nsmallest(3, ['Age', 'Salary']) returns the three rows with the smallest 'Age' values, using 'Salary' only to break ties in 'Age'.
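The effect of a second column is easiest to see when the first column contains a tie. The sketch below uses a hypothetical DataFrame (names and figures invented for illustration) in which two people share the same age, and compares a single-column call with a two-column call.

```python
import pandas as pd

# Hypothetical data: Cy and Di are both 25, so 'Age' alone cannot rank them.
people = pd.DataFrame({
    'Name': ['Al', 'Bob', 'Cy', 'Di'],
    'Age': [30, 22, 25, 25],
    'Salary': [50000, 48000, 41000, 39000],
})

# One column: the tie at age 25 is resolved by position (keep='first').
by_age = people.nsmallest(2, 'Age')
# Two columns: the tie at age 25 is resolved by the lower salary.
by_age_then_salary = people.nsmallest(2, ['Age', 'Salary'])

print(by_age['Name'].tolist())              # ['Bob', 'Cy']
print(by_age_then_salary['Name'].tolist())  # ['Bob', 'Di']
```

Bob is selected in both cases because his age is the lowest outright; only the choice between the two 25-year-olds changes.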
Pandas offers the nsmallest() function as an easy way to get the smallest values from one or more columns in a DataFrame. Here are some further details:
- By default, nsmallest() returns the rows with the smallest values in ascending order. When several rows tie, the default keep='first' retains the rows that appear first; setting keep to 'last' retains the rows that appear last instead.
- You can supply a list of column names to the columns argument to order by several columns. The result matches sorting the DataFrame by those columns in ascending order and taking the first n rows.
- nsmallest() works on numeric columns and on datetime-like columns, which are compared by value. On an object-dtype column, such as a column of strings, it raises a TypeError; df.sort_values(column).head(n) can be used there instead.
- Missing (NaN) values in the given column(s) are never ranked among the smallest, so such rows are not selected while enough rows with valid values are available.
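The following sketch probes how nsmallest() treats missing values and string columns. It assumes a reasonably recent pandas release, and the DataFrame is hypothetical. The TypeError on the string column is what current pandas versions are expected to raise, so the call is wrapped in try/except rather than relied upon.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing score.
df = pd.DataFrame({
    'Name': ['Ann', 'Ben', 'Cara', 'Dev'],
    'Score': [7.5, np.nan, 3.0, 9.1],
})

# Rows with valid scores are chosen ahead of the row holding NaN.
lowest = df.nsmallest(2, 'Score')
print(lowest['Name'].tolist())  # ['Cara', 'Ann']

# A string column is not a supported dtype for nsmallest();
# recent pandas versions are expected to raise TypeError here.
try:
    df.nsmallest(2, 'Name')
except TypeError as exc:
    print('TypeError:', exc)

# Sorting and taking the head works for strings (lexicographic order).
by_name = df.sort_values('Name').head(2)
print(by_name['Name'].tolist())  # ['Ann', 'Ben']
```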
Example:
import pandas as pd
# Create a sample DataFrame
data = {
'Name': ['John', 'Emma', 'Ryan', 'Sophia', 'Michael'],
'Age': [25, 30, 18, 21, 35],
'Salary': [50000, 60000, 40000, 35000, 70000],
'Department': ['HR', 'Marketing', 'Finance', 'Finance', 'IT']
}
df = pd.DataFrame(data)
# Get the 2 smallest values from the 'Salary' column, considering ties
n_smallest_values = df.nsmallest(2, 'Salary', keep='all')
# Get the 3 smallest values based on both 'Age' and 'Salary' columns
n_smallest_values_multiple = df.nsmallest(3, ['Age', 'Salary'])
# Print the results
print("n_smallest_values:\n", n_smallest_values)
print("\nn_smallest_values_multiple:\n", n_smallest_values_multiple)
Output:
n_smallest_values:
       Name  Age  Salary Department
3  Sophia   21   35000    Finance
2    Ryan   18   40000    Finance

n_smallest_values_multiple:
       Name  Age  Salary Department
2    Ryan   18   40000    Finance
3  Sophia   21   35000    Finance
0    John   25   50000         HR
This example adds a 'Department' column to the DataFrame. First, we use nsmallest() with keep='all' to obtain the two smallest values from the "Salary" column. No two rows share a salary in this data, so exactly two rows are returned (Sophia and Ryan). Had several rows tied at the cut-off, keep='all' would have returned all of them, possibly more than n rows.
The next step is to use nsmallest() to retrieve the three rows with the lowest 'Age' values, with 'Salary' breaking any ties in 'Age'. The resulting DataFrame is sorted by 'Age' in ascending order first, then by 'Salary' in ascending order.
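As a cross-check on the multi-column call, the sketch below compares it with an explicit sort followed by head(). It reuses the Name, Age and Salary values from the example above; the variable names via_nsmallest and via_sort are introduced here purely for illustration. The comparison is only shown for data without ties.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Emma', 'Ryan', 'Sophia', 'Michael'],
    'Age': [25, 30, 18, 21, 35],
    'Salary': [50000, 60000, 40000, 35000, 70000],
})

# Three rows with the smallest ages, salary as tie-breaker.
via_nsmallest = df.nsmallest(3, ['Age', 'Salary'])
# The same selection written as an explicit sort plus head().
via_sort = df.sort_values(['Age', 'Salary'], ascending=True).head(3)

print(via_nsmallest['Name'].tolist())  # ['Ryan', 'Sophia', 'John']
print(via_nsmallest.equals(via_sort))  # True
```

Both forms keep the original index labels (2, 3, 0), which is why equals() reports a match. nsmallest() is usually the more convenient choice when only a few rows are needed from a large DataFrame.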