keep first unique values in dataframe and replace others with nan in python

You can combine the duplicated and mask methods in pandas to achieve this.

Assuming you have a dataframe called df, you can keep only the first unique values in each column using the following code:


main.py
import pandas as pd
df = df.mask(df.apply(pd.Series.duplicated, keep='first'))

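This replaces every duplicated value except its first occurrence with NaN, checking each column independently. The mask method replaces values where the condition is True; since no replacement value is passed, it falls back to the default, NaN. Note that integer columns are upcast to float once NaN is introduced.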
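To see the effect on concrete data, here is a minimal sketch. The sample frame and the names dup_mask and first_only are illustrative assumptions, not part of the original snippet.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({
    "a": [1, 1, 2, 3, 2],
    "b": ["x", "y", "x", "z", "z"],
})

# Boolean frame: True where a value already appeared earlier in its column.
dup_mask = df.apply(pd.Series.duplicated, keep="first")

# Replace the flagged cells with NaN; first occurrences are kept.
first_only = df.mask(dup_mask)

print(first_only)
```

In column a, the second 1 and the second 2 become NaN; in column b, the second x and the second z become NaN.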

Alternatively, if you want to replace all duplicated values (including the first occurrence) with NaN, you can use the where function instead:


main.py
import pandas as pd
df = df.where(~df.apply(pd.Series.duplicated, keep=False))

This replaces every value that occurs more than once in its column with NaN, so only values that appear exactly once survive. The ~ operator inverts the boolean DataFrame element-wise (True becomes False and vice versa). The where method keeps values where the condition is True and replaces the rest with the default, NaN.
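A small sketch of this variant follows. Again, the sample frame and the names repeated and singletons_only are assumptions made up for the example.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({
    "a": [1, 1, 2, 3, 2],
    "b": ["x", "y", "x", "z", "z"],
})

# keep=False flags every member of a duplicate group, including the first.
repeated = df.apply(pd.Series.duplicated, keep=False)

# Keep only cells that are NOT part of a duplicate group.
singletons_only = df.where(~repeated)

print(singletons_only)
```

Only the 3 in column a and the y in column b remain, because they are the only values that occur once in their columns.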
