merge two dataframes by a column in python

To merge two dataframes based on a column in Python, you can use the merge() function from the pandas library.

The syntax for merging dataframes is as follows:

main.py
merged_df = pd.merge(left_df, right_df, on='column_name')

Here, left_df and right_df are the dataframes you want to merge and column_name is the name of the column that you want to merge on.
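
The on parameter assumes the key column has the same name in both dataframes. When the names differ, merge() also accepts left_on and right_on. The following is a minimal sketch of that case; the users and ages dataframes and the id column are hypothetical and exist only for illustration.

```python
import pandas as pd

# Hypothetical data: the key is 'user_id' on the left and 'id' on the right
users = pd.DataFrame({'user_id': [1, 2, 3], 'name': ['John', 'Jane', 'Mike']})
ages = pd.DataFrame({'id': [2, 3, 5], 'age': [25, 30, 35]})

# Match rows where users.user_id equals ages.id
merged = pd.merge(users, ages, left_on='user_id', right_on='id')

# Both key columns are kept in the result, so drop the redundant one
merged = merged.drop(columns='id')

print(merged)
```

Dropping the right-hand key afterwards is a matter of taste; renaming the column before merging and using on would work just as well.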

For example, let's assume you have two dataframes df1 and df2 with a common column called user_id.

main.py
import pandas as pd

df1 = pd.DataFrame({'user_id': [1, 2, 3, 4], 'name': ['John', 'Jane', 'Mike', 'Sara']})
df2 = pd.DataFrame({'user_id': [2, 3, 5, 6], 'age': [25, 30, 35, 40]})

To merge these dataframes on the user_id column, you can use the following code:

main.py
merged_df = pd.merge(df1, df2, on='user_id')

This will give you a merged dataframe with columns user_id, name and age. The output would look like this:

main.py
   user_id  name  age
0        2  Jane   25
1        3  Mike   30

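Note that the resulting dataframe only includes the rows whose user_id value appears in both df1 and df2, because merge() performs an inner join by default. To keep unmatched rows, pass the how parameter ('left', 'right' or 'outer'). Other parameters, such as left_on and right_on, handle key columns that have different names in each dataframe.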
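
As a rough sketch of how the join type changes the result, the example below reuses df1 and df2 from above and varies the how parameter. The variable names inner_joined, left_joined and outer_joined are only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'user_id': [1, 2, 3, 4], 'name': ['John', 'Jane', 'Mike', 'Sara']})
df2 = pd.DataFrame({'user_id': [2, 3, 5, 6], 'age': [25, 30, 35, 40]})

# how='inner' is the default: keep only user_id values found in both dataframes
inner_joined = pd.merge(df1, df2, on='user_id', how='inner')

# how='left': keep every row of df1; age is NaN where df2 has no match
left_joined = pd.merge(df1, df2, on='user_id', how='left')

# how='outer': keep every user_id from either dataframe
# indicator=True adds a '_merge' column recording where each row came from
outer_joined = pd.merge(df1, df2, on='user_id', how='outer', indicator=True)

print(left_joined)
print(outer_joined)
```

In the left and outer results, the age column becomes floating point because the unmatched rows hold NaN. The _merge column takes the values left_only, right_only or both, which can help when checking why rows went missing in an inner join.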
