keep first unique values in a dataframe and replace the others with nan in python

You can combine the duplicated and mask methods in pandas to achieve this.

Assuming you have a DataFrame called df, you can keep only the first occurrence of each value in every column, and set later repeats to NaN, with the following code:

main.py
import pandas as pd
df = df.mask(df.apply(pd.Series.duplicated, keep='first'))  # later repeats in each column -> NaN

This replaces every repeated value (all occurrences after the first) with NaN. Because apply runs Series.duplicated on one column at a time, each column is checked independently. The mask method replaces values where the condition is True with its other argument, which defaults to NaN.
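A minimal sketch on a small, made-up DataFrame (the column names a and b and their values are purely illustrative) shows the column-by-column behavior: a value that repeats in one column has no effect on the other column.

```python
import pandas as pd

# Hypothetical sample data: column "a" repeats 1 and 2, column "b" repeats "x"
df = pd.DataFrame({
    "a": [1, 2, 1, 3, 2],
    "b": ["x", "y", "x", "z", "w"],
})

# True for every occurrence after the first, computed separately per column
dupes = df.apply(pd.Series.duplicated, keep="first")

# mask() replaces the cells where dupes is True with NaN (its default `other`)
result = df.mask(dupes)
print(result)
print(result.dtypes)
```

Note that column a is upcast from int64 to float64, since NaN is a floating-point value; the string column b stays object dtype.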

Alternatively, if you want to replace every value that appears more than once, including its first occurrence, with NaN, pass keep=False to duplicated and use the where method instead:

main.py
import pandas as pd
df = df.where(~df.apply(pd.Series.duplicated, keep=False))  # keep only values that occur once per column

This keeps only the values that occur exactly once in their column and replaces all others with NaN. The ~ operator negates the boolean DataFrame element-wise (True becomes False and vice versa), and where keeps values where the condition is True and replaces them with NaN where it is False.
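The sketch below, again on hypothetical sample data, shows what keep=False does and checks that calling mask with the un-negated condition produces the same frame as where with the negated one.

```python
import pandas as pd

# Hypothetical sample data for illustration only
df = pd.DataFrame({
    "a": [1, 2, 1, 3, 2],
    "b": ["x", "y", "x", "z", "w"],
})

# keep=False flags *every* occurrence of a repeated value, including the first
all_dupes = df.apply(pd.Series.duplicated, keep=False)

# where() keeps cells where the condition is True; ~ flips the flags,
# so only values that never repeat within their column survive
only_unique = df.where(~all_dupes)

# mask() with the original (un-negated) flags is an equivalent formulation
same = df.mask(all_dupes)

print(only_unique)
print(only_unique.equals(same))
```

Which form to use is a matter of taste: mask reads as "blank out the duplicates", while where reads as "keep the uniques".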
