keep the first occurrence of each value in every column of a dataframe and replace repeated values with nan in python

You can achieve this with the pandas library. Here's an example:


main.py
import pandas as pd
import numpy as np

# create a sample dataframe
df = pd.DataFrame({'A': [1, 1, 2, 3, 3], 
                   'B': [4, 4, 5, 6, 6], 
                   'C': [7, 8, 8, 9, 10]})

# using apply function to replace repeated values with nan
df = df.apply(lambda x: x.mask(x.duplicated(), np.nan))

# output the updated dataframe
print(df)

The apply function runs the lambda once per column of the dataframe. Inside the lambda, duplicated() returns a boolean Series that is True for every occurrence of a value after its first one in that column. mask() then replaces the values at those True positions with nan and leaves the rest unchanged.
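To make the two steps easier to follow, here is a minimal sketch of the same logic on a single column. The names col, is_repeat, and masked are illustrative only and are not part of the snippet above.

```python
import numpy as np
import pandas as pd

# One column taken from the sample dataframe
col = pd.Series([1, 1, 2, 3, 3], name="A")

# duplicated() flags every occurrence after the first one
# (keep="first" is the default)
is_repeat = col.duplicated()
print(is_repeat.tolist())  # [False, True, False, False, True]

# mask() replaces the values where the condition is True
masked = col.mask(is_repeat, np.nan)
print(masked.tolist())  # [1.0, nan, 2.0, 3.0, nan]
```

Passing keep="last" to duplicated() would keep the last occurrence of each value instead of the first.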

This will output the following dataframe:


     A    B     C
0  1.0  4.0   7.0
1  NaN  NaN   8.0
2  2.0  5.0   NaN
3  3.0  6.0   9.0
4  NaN  NaN  10.0

As you can see, the first occurrence of each value in a column is kept and every later repeat is replaced with nan.
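The values print as floats (1.0, 4.0, ...) because nan is a floating-point value, so pandas converts the integer columns to float64. If the integers should stay integers, one possible variation is to convert the columns to the nullable Int64 dtype first. This is a sketch under the assumption that a reasonably recent pandas version is installed; the name df_int is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 3, 3],
                   'B': [4, 4, 5, 6, 6],
                   'C': [7, 8, 8, 9, 10]})

# Convert to the nullable integer dtype, then mask repeats.
# With no replacement value given, mask() fills with the
# missing-value marker of the column's dtype (pd.NA here).
df_int = df.astype("Int64").apply(lambda x: x.mask(x.duplicated()))

print(df_int)
print(df_int.dtypes)
```

Missing entries then show up as <NA> instead of NaN, and the remaining values keep their integer form.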
