update a dataframe with values from another dataframe in pandas in python

To update a dataframe with values from another dataframe in pandas, you can left-merge the two dataframes with pd.merge on a shared key column and then write the merged values back to the original dataframe using .loc.

Here is an example:

main.py
import pandas as pd

# Create example dataframes
df1 = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'value': [1, 2, 3, 4]})
df2 = pd.DataFrame({'key': ['B', 'D'], 'value': [5, 6]})

# Merge dataframes on the key column
merged = pd.merge(df1, df2, on='key', how='left', suffixes=('', '_new'))

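# Update values in df1 with values from df2:
# take the new value where the merge found one, otherwise keep the original
merged['value'] = merged['value_new'].where(merged['value_new'].notna(), merged['value'])
# Write back by row label; this assumes df1 has the default 0..n-1 index
# and that the keys in df2 are unique, so merged has one row per row of df1
df1.loc[merged.index, 'value'] = merged['value']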

Explanation:

  • pd.merge is used to combine the two dataframes based on a common column, in this case the 'key' column.
  • The how='left' argument ensures that all rows from df1 are included in the merged dataframe.
  • The suffixes argument controls how overlapping column names are renamed: with ('', '_new'), the 'value' column from df1 keeps its name and the one from df2 becomes 'value_new'.
  • The resulting merged dataframe is saved to merged.
  • The where method keeps each entry of 'value_new' that is not NaN and falls back to the original 'value' wherever the merge found no match in df2.
  • The result is written back to the 'value' column of merged.
  • Finally, .loc is used to update the 'value' column in the original df1 dataframe with the values from merged. This lines up correctly only when df1 has the default integer index and each key appears at most once in df2, because merged gets a fresh 0..n-1 index.
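
If df1 does not have the default index, one possible alternative is to look the new values up by key with Series.map instead of merging. The sketch below is not part of the original example: update_values is a hypothetical helper name, and it assumes the keys in the source dataframe are unique and that a NaN in the source means "no update".

```python
import pandas as pd


def update_values(target, source, key="key", col="value"):
    """Return a copy of target with col overwritten where source has a matching key.

    Hypothetical helper for illustration; assumes keys in source are unique.
    """
    # Series indexed by key, holding the replacement values
    lookup = source.set_index(key)[col]
    # Look up each key of target; rows without a match become NaN.
    # The result keeps target's index, so no positional alignment is needed.
    new_values = target[key].map(lookup)
    result = target.copy()
    # Fall back to the existing value wherever no replacement was found
    result[col] = new_values.fillna(target[col])
    return result


# Same data as the example above, but df1 has a non-default index
df1 = pd.DataFrame(
    {"key": ["A", "B", "C", "D"], "value": [1, 2, 3, 4]},
    index=[10, 20, 30, 40],
)
df2 = pd.DataFrame({"key": ["B", "D"], "value": [5, 6]})

updated = update_values(df1, df2)
print(updated)
```

Because rows without a match pass through NaN, the 'value' column of the result is floating point; cast it back with astype if an integer column is needed.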
