drop columns that have more than one non-missing value, otherwise keep them, in a pandas dataframe in python

You can use the count method to count the non-missing values in each column of the dataframe. Then, you can use boolean indexing on that result to select the columns with more than one non-missing value. Finally, you can use the drop method to drop the selected columns from the original dataframe, which leaves only the columns with at most one non-missing value.
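As a quick illustration of the counting step, the sketch below uses a small hypothetical frame (the names `toy` and `non_missing` are made up for this illustration). It assumes the usual pandas behavior that both `None` and `NaN` are treated as missing, while an empty string is an ordinary value and is counted.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame, used only to show what count() reports
toy = pd.DataFrame({
    "x": [1.0, np.nan, 3.0],   # two non-missing values
    "y": [None, None, "a"],    # one non-missing value
    "z": ["", None, None],     # the empty string is counted as non-missing
})

# count() returns a Series indexed by column name
non_missing = toy.count()
print(non_missing)
```

If empty strings should also be treated as missing, they would need to be replaced with `NaN` before counting.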

Here's an example:

main.py
import pandas as pd

# create example dataframe
df = pd.DataFrame({'A': [1, 2, 3, None, None],
                   'B': [None, 1, None, 2, None],
                   'C': [None, None, None, None, None],
                   'D': [1, None, 2, None, None],
                   'E': [None, None, None, None, 1]})

# count non-missing values for each column
count = df.count()

# perform boolean indexing to select columns with more than one non-missing value
selected_cols = count[count > 1]

# drop selected columns from the original dataframe
new_df = df.drop(columns=selected_cols.index)

print(new_df)

Output:

      C    E
0  None  NaN
1  None  NaN
2  None  NaN
3  None  NaN
4  None  1.0

In this example, columns A, B, and D have more than one non-missing value (3, 2, and 2 respectively), so they are dropped. Column C has no non-missing values and column E has exactly one, so both are kept in the new dataframe.
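The same result can also be reached without calling drop, by keeping the wanted columns directly with a boolean mask. The sketch below is one possible way to package that; the helper name `keep_sparse_columns` and its `max_non_missing` parameter are hypothetical and not part of pandas.

```python
import pandas as pd


def keep_sparse_columns(frame: pd.DataFrame, max_non_missing: int = 1) -> pd.DataFrame:
    """Return only the columns with at most `max_non_missing` non-missing values.

    Hypothetical helper written for illustration.
    """
    # Boolean Series indexed by column name: True means "keep this column"
    mask = frame.count() <= max_non_missing
    return frame.loc[:, mask]


# Same example data as in main.py
df = pd.DataFrame({'A': [1, 2, 3, None, None],
                   'B': [None, 1, None, 2, None],
                   'C': [None, None, None, None, None],
                   'D': [1, None, 2, None, None],
                   'E': [None, None, None, None, 1]})

result = keep_sparse_columns(df)
print(list(result.columns))  # ['C', 'E']
```

Selecting with a mask keeps the original column order, and the threshold can be raised by passing a different `max_non_missing`.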
