how to subset rows based on condition that it is not in another dataframe in python

You can use pd.merge() to perform an anti-join operation which returns only the rows of the left dataframe that are not in the right dataframe based on a specific condition.

Here's an example of how to subset rows from df1 which are not present in df2 based on a condition:

main.py
import pandas as pd

# sample dataframes
df1 = pd.DataFrame({'A': [1, 2, 3, 4], 'B': ['a', 'b', 'c', 'd']})
df2 = pd.DataFrame({'A': [3, 4], 'B': ['c', 'd']})

# perform anti-join
condition = df1['A'].isin(df2['A'])
result = df1[~condition]

# display result
print(result)
273 chars
13 lines

This will output:

main.py
   A  B
0  1  a
1  2  b
24 chars
4 lines

Here, we first create two sample dataframes df1 and df2. We then define the condition as the elements in df1['A'] which are also present in df2['A']. We use the ~ operator to negate this condition, which gives us the rows in df1 that are not present in df2.

related categories

gistlibby LogSnag