create a new column of mean squared error in pandas dataframe in python

Assuming you already have a pandas dataframe and want to add a column containing the mean squared error (MSE) between two of its columns, you can compute the MSE with numpy and assign it to a new column. Because the MSE is a single number for the whole dataset, every row of the new column holds the same value:

main.py
import pandas as pd
import numpy as np

# create a sample dataframe
df = pd.DataFrame({
    'predictions': [0.1, 0.4, 0.3, 0.8, 0.5],
    'targets': [0.3, 0.5, 0.9, 0.2, 0.1],
})

# compute the mean squared error (a single scalar) and assign it to a new column 'mse'
df['mse'] = np.square(df['predictions'] - df['targets']).mean()

# print the dataframe
print(df)

This will output the following dataframe:

   predictions  targets    mse
0          0.1      0.3  0.186
1          0.4      0.5  0.186
2          0.3      0.9  0.186
3          0.8      0.2  0.186
4          0.5      0.1  0.186

In the code above, we import the pandas and numpy libraries and create a sample dataframe with two columns, 'predictions' and 'targets'. We then compute the mean squared error with numpy, which yields a single scalar, and assign it to a new column 'mse' in the existing dataframe. pandas broadcasts that scalar to every row.
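As a sanity check, the same scalar can be reproduced with pandas alone by spelling out the two steps (square the differences, then average them). This is a minimal sketch on the sample data above; the variable names `squared_errors` and `mse` are illustrative and not part of the original snippet.

```python
import pandas as pd

# same sample data as in the example above
df = pd.DataFrame({
    'predictions': [0.1, 0.4, 0.3, 0.8, 0.5],
    'targets': [0.3, 0.5, 0.9, 0.2, 0.1],
})

# squared difference for each row: roughly 0.04, 0.01, 0.36, 0.36, 0.16
squared_errors = (df['predictions'] - df['targets']) ** 2

# the MSE is the average of those squared differences: 0.93 / 5 = 0.186
mse = squared_errors.mean()

# assigning a scalar broadcasts it to every row of the new column
df['mse'] = mse

print(round(mse, 6))
```

For these five rows the squared differences sum to about 0.93, so the mean comes out at about 0.186 (up to floating-point rounding).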

Note: The code above computes one mean squared error over all values in the 'predictions' and 'targets' columns and stores that same value in every row of the 'mse' column. To store the squared error of each row individually instead (a single row has nothing to average over, so this is the plain squared difference), you can iterate over the dataframe as follows. The loop assumes the default integer index 0 to len(df) - 1:

main.py
for i in range(len(df)):
    df.loc[i,'mse'] = np.square(df.loc[i,'predictions']-df.loc[i,'targets'])
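The row-by-row loop can usually be replaced by a single vectorized expression, which tends to be faster and does not depend on the index labels. The following is a sketch under the same sample data; the column name `squared_error` and the variable `overall_mse` are assumptions chosen for illustration.

```python
import pandas as pd

# same sample data as in the example above
df = pd.DataFrame({
    'predictions': [0.1, 0.4, 0.3, 0.8, 0.5],
    'targets': [0.3, 0.5, 0.9, 0.2, 0.1],
})

# squared error for every row at once, with no explicit Python loop
df['squared_error'] = (df['predictions'] - df['targets']) ** 2

# averaging the per-row values recovers the overall MSE
overall_mse = df['squared_error'].mean()

print(df)
```

Keeping the per-row values in their own column makes it possible to derive the overall MSE from them later, rather than overwriting one with the other.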


gistlib by LogSnag