Pandas search a string in dataframe across all columns

3 minute read

In this article we are going to learn how to search a string in whole dataframe across multiple columns

we will be just following these steps in order to filter out the rows where the substring contains in any cell value of dataframe in any column:

Create a dataframe(df)
Use df.apply() to apply string search along an axis of the dataframe and returns the matching rows
Use df.applymap() to apply string search to a Dataframe elementwise and returns the matching rows
Index of all matching cells using numpy.argwhere()

Let’s get started

Create a dataframe

Let’s create a test dataframe first, it contains two text columns(zodiac, month) and one numeric column(values).

I am intentionally not building a dataframe with all text columns, so it could be more practical to see how we can search for a string across all columns of dataframe where one or more columns are not of type string

import pandas as pd
import numpy as np


df = pd.DataFrame({'Zodiac' : ['Libra','Capricorn','Scorpio','January','Scorpio', 'Capricorn','Capricorn', 'Taurus'],
               'Month' : ['January', 'February', 'February', 'April', 'January', 'March', 'April', 'April'],
               'values' : [85, 96, 55, 64, 23, 45,89, 91]
                })

zodiac	month	values
Libra	January	85
Capricorn	February	96
Scorpio	February	55
January	April	64
Scorpio	January	23
Capricorn	March	45
Capricorn	April	89
Taurus	April	19

Now, we will look at the dataframe functions that can be used to search for string across all the columns in a dataframe

We will search for a substring ‘jan’ in the dataframe and filter out the rows where this substring exists in any column

Using Apply to search for substring in all dataframe columns

Let’s use the apply function, which is used for applying a function along an axis of the DataFrame.

we are using a lambda function to iterate over the rows of dataframe and search for substring=’jan’ using str.contains() function by casting all columns to dtype as str.

substring = 'jan'
df[df.apply(lambda row: row.astype(str).str.contains(substring, case=False).any(), axis=1)]

So, it returns 3 rows where the substring=’jan’ exists in any of the 3 columns of dataframe

zodiac	month	values
Libra	January	85
January	April	64
Scorpio	January	23

Using Applymap to check for substring in all dataframe columns

Next, we will use applymap function, that applies a function to a dataframe elementwise. This func returns a single value as the passed in function output

It’s first parameter is a lambda function that search for substring in all column values where instance type is str and this returns a boolean value True if substring exists otherwise False

substring = 'jan'
mask = df.applymap(lambda x: substring in x.lower() if isinstance(x,str) else False).to_numpy()
mask          

Out:

array([[False,  True, False],
       [False, False, False],
       [False, False, False],
       [ True, False, False],
       [False,  True, False],
       [False, False, False],
       [False, False, False],
       [False, False, False]])

We will use this output to pass it thru a pandas dataframe.loc function that Access a group of rows and columns by a boolean array to filter out all the rows where substring exists

df.loc[mask]         

zodiac	month	values
Libra	January	85
January	April	64
Scorpio	January	23

Find index of all matching cells

So far we have seen how to filter out the rows where the substring exists in any of the columns.

Now we will see how to find the index of all the cells in dataframe(df) where the substring exists

We will use the boolean output of the applymap function above and convert that to a numpy array and pass it to the numpy.argwhere() function that find the indices of array elements that are non-zero

substring = 'jan'
mask = df.applymap(lambda x: substring in x.lower() if isinstance(x,str) else False).to_numpy()
indices = np.argwhere(mask) 

The output is a 2D array with matching indices from the dataframe(df) where the substring=’jan’ exists

Out:

array([[0, 1],
       [3, 0],
       [4, 1]])

Conclusion:

Here is a brief summary of this article to highlight the critical features that we learn to search for a string in a dataframe across all the columns:

Apply function can be used with a lambda function to search for all rows in the dataframe where substring cotains in the text
Applymap function could be used with a function that returns a single value to output a boolean mask array that signfies the matching cells in a dataframe
To find the indices of all the matching cells in dataframe we can use the boolean mask output and pass it in numpy.argwhere to get non-zero(True) indices

Share on

Twitter Facebook LinkedIn

Pandas search a string in dataframe across all columns

Create a dataframe

Using Apply to search for substring in all dataframe columns

Using Applymap to check for substring in all dataframe columns

Find index of all matching cells

Share on

You may also enjoy

pandas count duplicate rows

Pandas value error while merging two dataframes with different data types

How to get True Positive, False Positive, True Negative and False Negative from confusion matrix in scikit learn

Pandas how to use list of values to select rows from a dataframe