how to subset pandas df columns based on missing values in python

To subset pandas DataFrame columns based on missing values in Python, call isnull() on the DataFrame and reduce the result with any() to get a Boolean mask with one entry per column, then pass that mask to the loc indexer to select the desired columns.

Here is an example:

main.py
import pandas as pd

# Create a DataFrame with missing values
df = pd.DataFrame({'A': [1, 2, None],
                   'B': [None, 4, 5],
                   'C': [6, 7, 8]})

# Build a per-column Boolean mask: True if the column has at least one missing value
mask = df.isnull().any()

# Subset the DataFrame based on the mask
subset_df = df.loc[:, mask]

print(subset_df)

Output:

     A    B
0  1.0  NaN
1  2.0  4.0
2  NaN  5.0

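In this example, df.isnull() returns a DataFrame of Booleans marking each missing cell, and any() reduces it column by column, so mask is a Boolean Series that is True for every column with at least one missing value. The loc indexer then selects the columns where the mask is True, which here are A and B. Both are displayed as floats because pandas stores None in a numeric column as NaN.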
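The same kind of mask can be inverted or built from a ratio instead. The following is a minimal sketch rather than a definitive recipe: it keeps only the complete columns by negating the mask with ~, and it selects columns whose share of missing values exceeds a cutoff. The names has_missing, missing_ratio and max_ratio, and the 0.25 cutoff, are assumptions chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, None],
                   'B': [None, 4, 5],
                   'C': [6, 7, 8]})

# True for each column that contains at least one missing value
has_missing = df.isnull().any()

# Negate the mask to keep only columns with no missing values
complete_df = df.loc[:, ~has_missing]

# Share of missing values per column (the mean of a Boolean column)
missing_ratio = df.isnull().mean()

# Hypothetical cutoff: select columns where more than 25% of values are missing
max_ratio = 0.25
sparse_df = df.loc[:, missing_ratio > max_ratio]

print(complete_df.columns.tolist())
print(sparse_df.columns.tolist())
```

With this sample data, complete_df contains only column C, while sparse_df contains A and B, since each of those is missing one value out of three.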
