7 minute read

In this article, we will see how to drop rows of a Pandas dataframe based on conditions. Additionally, we will also discuss on how to drop by index, by conditions based on a list, and by NaN values.

We will be following these steps in this article to drop rows in a dataframe based on conditions

  • Create a test dataframe
  • Drop rows by filtering the dataframe using boolean indexing
  • Drop rows using pandas drop function

  • drop rows by condition in list
  • drop rows by Index and NaN values

Create dataframe

We will create a 15 row dataframe with student details. It has four columns - Name, Grade, Score, and Major. The dataframe column has got some NaN values too.

# importing required libraries
import pandas as pd
import numpy as np

# creating dataframe
df = pd.DataFrame(
    {
        'Name':['Alex', 'David', 'Emily', 'Olivia', 'Jack', 'Kathy', 'Eric',
                'Paul', 'Mary', 'Joe', 'Jane', 'Abel', 'Sean', 'Tina', 'Laura'],
        'Grade':[8, 10, 9, 12, 11, 10, 8, 12, 11, 11, 10, 9, 9, 12, 11],
        'Score':[80, 90, 60, 70, 50, 80, 30, 90, np.nan, 10, 40, 20, 90, 70, 60],
        'Major':['Math', 'English', 'Math', 'History', 'Science', np.nan, 'History', 'Math',
                 'History', 'Math', 'English', 'Science', 'Science', 'English', 'History']
    })

# displaying the dataframe
df

This is how our school dataframe looks like:

  Name Grade Score Major
0 Alex 8 80.0 Math
1 David 10 90.0 English
2 Emily 9 60.0 Math
3 Olivia 12 70.0 History
4 Jack 11 50.0 Science
5 Kathy 10 80.0 NaN
6 Eric 8 30.0 History
7 Paul 12 90.0 Math
8 Mary 11 NaN History
9 Joe 11 10.0 Math
10 Jane 10 40.0 English
11 Abel 9 20.0 Science
12 Sean 9 90.0 Science
13 Tina 12 70.0 English
14 Laura 11 60.0 History

Pandas drop rows by condition

We will see how to drop rows with conditions in this section. One way to achieve this is by passing integer value the other is by passing a string value. Conditions are passed with comparison and bitwise operators.

The simplest way to drop a row is by using boolean indexing and filter out the rows that doesn’t met certain conditions.

  • Drop all rows where score of the student is less than 80
df = df[df['Score'] > 80]

Here, we filtered the rows by integer value of 80 and above, other rows have been dropped.

  Name Grade Score Major
1 David 10 90.0 English
7 Paul 12 90.0 Math
12 Sean 9 90.0 Science
  • Use pandas drop function to drop the rows and df.loc will get the index of all the rows matching the condition
df.drop(df.loc[df['Score'] < 80].index, axis = 0, inplace = True)
  • Instead of Boolean Indexing you could also use query to get the index of all the rows matching conditions
df.drop(df.query(" Score < 80 ").index, axis = 0, inplace = True)

In this, we have filtered rows which are 80 and above. Any row less than 80 is dropped.

  Name Grade Score Major
0 Alex 8 80.0 Math
1 David 10 90.0 English
5 Kathy 10 80.0 NaN
7 Paul 12 90.0 Math
8 Mary 11 NaN History
12 Sean 9 90.0 Science

Pandas drop rows by multiple conditions

In this section, we will see how to drop rows by multiple column values. Having multiple condition filters out the dataframe to the data that is required to run a desired operation.

  • Filter the rows where grade of student is either 12 or Major is not Math using boolean indexing
df = df[(df.Grade == 12) | (df.Major != 'Math')]
  Name Grade Score Major
1 David 10 90.0 English
3 Olivia 12 70.0 History
4 Jack 11 50.0 Science
5 Kathy 10 80.0 NaN
6 Eric 8 30.0 History
7 Paul 12 90.0 Math
8 Mary 11 NaN History
10 Jane 10 40.0 English
11 Abel 9 20.0 Science
12 Sean 9 90.0 Science
13 Tina 12 70.0 English
14 Laura 11 60.0 History
  • Another example of dropping the rows using the drop function
df.drop(df.loc[(df['Grade'] == 12) | (df['Major'] == 'Math') | (df['Score'] > 40)].index, axis = 0, inplace = True)
  • You could also use query to get the index of the rows

In this we will use multiple conditions to drop specific rows.

df.drop(df.query(" Grade == 12 | Major == 'Math' | Score > 40 ").index, axis = 0, inplace = True)

Both functions give the same output. Rows with 12th Grade Math major with the Score greater than 40 have been dropped.

  Name Grade Score Major
6 Eric 8 30.0 History
8 Mary 11 NaN History
10 Jane 10 40.0 English
11 Abel 9 20.0 Science

Drop rows by condition in list

In this, we create a list of values that has to be dropped. There are two ways to achieve this. We will be using isin() and lambda to achieve this.

  • Drop rows by list of Index

Here, we are passing a list of index numbers to be dropped.

# creating list of index numbers to dropped
x = [1, 3, 5, 7, 9, 11, 13]

df = df.drop(x)
  Name Grade Score Major
0 Alex 8 80.0 Math
2 Emily 9 60.0 Math
4 Jack 11 50.0 Science
6 Eric 8 30.0 History
8 Mary 11 NaN History
10 Jane 10 40.0 English
12 Sean 9 90.0 Science
14 Laura 11 60.0 History
  • Drop rows if column values match certain list of values

To drop rows by conditions in a list we have to use isin(). By using this function we can drop rows by their column values. This is the easiest way to drop if you know specifically which values needs to be dropped. ~ (tilde) character is the NOT operator for this function.

# creating list to drop by column values
x = [8, 9, 10]

# drop rows by isin the list
df.drop(df[df.Grade.isin(x)].index, axis = 0, inplace = True)
  • Use lambda to match the column values in a list
# creating list to drop by column values
x = [8, 9, 10]

df = df.apply(lambda row: row[~df['Grade'].isin(x)])

Both give the same output. Rows with the list of Grades have been dropped.

  Name Grade Score Major
3 Olivia 12 70.0 History
4 Jack 11 50.0 Science
7 Paul 12 90.0 Math
8 Mary 11 NaN History
9 Joe 11 10.0 Math
13 Tina 12 70.0 English
14 Laura 11 60.0 History

Drop rows by index

drop() will drops the rows by the index number.

drop rows by index number

df = df.drop(index = 1)

df = df.drop(1)

Both output with index[1] row dropped. We displaying the head of the dataframe.

  Name Grade Score Major
0 Alex 8 80.0 Math
2 Emily 9 60.0 Math
3 Olivia 12 70.0 History
4 Jack 11 50.0 Science
5 Kathy 10 80.0 NaN

Another way to drop row by index. It outputs the same as above.

df = df.drop(df.index[1])

drop rows by index - range

In this, we drop rows by giving the function a range. It drops all the rows mentioned in the index range. The last value is n-1.

df = df.drop(df.index[range(0, 9)])

drop rows by index - slice

We can drop rows by slice operation too. This is how it is done. This drops the rows from the index to index n- 1.

df = df.drop(df.index[0:9])

Both give the same output.

  Name Grade Score Major
9 Joe 11 10.0 Math
10 Jane 10 40.0 English
11 Abel 9 20.0 Science
12 Sean 9 90.0 Science
13 Tina 12 70.0 English
14 Laura 11 60.0 History

Drop rows with NaN values using dropna()

dropna() is the most efficient function when it comes to drop NaN values in a dataframe. It drops all the NaN values from the dataframe/data based on conditions and axis. We will explore dropna() in this section.

DataFrame.dropna(*, axis=0, how=_NoDefault.no_default, thresh=_NoDefault.no_default, subset=None, inplace=False)

Parameters:
axis: {0 or ‘index’, 1 or ‘columns’}, default 0
how: {‘any’, ‘all’}, default ‘any’
thresh: int, optional
subset: column label or sequence of labels, optional
inplace: bool, default False

df = df.dropna()

Simply doing dropna() without any argument deletes all NaN values from the rows. The default axis is set at 0 that is, the rows.

  Name Grade Score Major
0 Alex 8 80.0 Math
1 David 10 90.0 English
2 Emily 9 60.0 Math
3 Olivia 12 70.0 History
4 Jack 11 50.0 Science
6 Eric 8 30.0 History
7 Paul 12 90.0 Math
9 Joe 11 10.0 Math
10 Jane 10 40.0 English
11 Abel 9 20.0 Science
12 Sean 9 90.0 Science
13 Tina 12 70.0 English
14 Laura 11 60.0 History

dropna(how)

how: {‘any’, ‘all’}, default ‘any’
‘any’ : If any NA values are present, drop that row or column.
‘all’ : If all values are NA, drop that row or column.

dropna() has how parameter and the default is set to any. The other value is all. When how = 'all' it drops rows and columns which has all the NaN values across the dataframe. This is helpful when you are working on the data with NaN column values.

df = df.dropna(how = 'any')

df = df.dropna(how = 'all')

dropna(subset)

subset lets you define the columns in which you are looking for missing values.

df = df.dropna(subset = ['Name', 'Grade'])

Since, Name and Grade columns does not have any NaN value, this will return us the whole dataframe - with the NaN values from the other columns.

dropna(thresh)

thresh keeps only the rows with at least 1 non-NA value(s).

df.dropna(thresh = 1)

Tags: ,

Categories: ,

Updated: