Python – How to drop rows from a pandas DataFrame that contain a particular string in a particular column?


I have a very large data frame in Python and I want to drop all rows that have a particular string inside a particular column.

For example, I want to drop all rows which have the string "XYZ" as a substring in column C of the data frame.

Can this be implemented in an efficient way using the .drop() method?

Best Answer

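pandas has vectorized string operations, so you can just filter out the rows that contain the string you don't want. Build a boolean mask with .str.contains() and negate it with ~ to keep only the rows that do not match: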

In [90]: import pandas as pd

In [91]: df = pd.DataFrame(dict(A=[5,3,5,6], C=["foo","bar","fooXYZbar", "bat"]))

In [92]: df
   A          C
0  5        foo
1  3        bar
2  5  fooXYZbar
3  6        bat

In [93]: df[~df.C.str.contains("XYZ")]
   A    C
0  5  foo
1  3  bar
3  6  bat
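
If the column can hold missing values, or the search string contains regex metacharacters, the same idea may need two extra arguments. The sketch below is only an illustration under those assumptions: the extra row with a missing value in C is made up for the example, na=False treats missing values as "no match", and regex=False makes "XYZ" a literal substring. It also shows one way to get the same result through .drop(), as the question asked, which assumes the index labels are unique.

```python
import pandas as pd

# Same frame as above, plus a hypothetical row with a missing value in C.
df = pd.DataFrame({
    "A": [5, 3, 5, 6, 7],
    "C": ["foo", "bar", "fooXYZbar", "bat", None],
})

# Boolean mask: True where C contains the literal substring "XYZ".
# regex=False -> treat the pattern as plain text, not a regular expression.
# na=False    -> missing values count as "does not contain".
mask = df["C"].str.contains("XYZ", regex=False, na=False)

# Option 1: boolean indexing, keeping the rows that do not match.
filtered = df[~mask]

# Option 2: the .drop() route, passing the index labels of the matching rows.
# This relies on unique index labels; duplicated labels would all be dropped.
dropped = df.drop(df[mask].index)

print(filtered)
```

Both options should give the same frame here. Boolean indexing is usually the simpler choice, since .drop() still needs the mask to find the labels to remove.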