create a new column of mean squared error in pandas dataframe in python

Assuming you already have a pandas dataframe and want to add a column holding the mean squared error (MSE) between two of its columns, you can compute the MSE with numpy and assign the result to a new column:


main.py
import pandas as pd
import numpy as np

# creating a sample dataframe
df = pd.DataFrame({'predictions': [0.1, 0.4, 0.3, 0.8, 0.5], 'targets': [0.3, 0.5, 0.9, 0.2, 0.1]})

# computing the mean squared error (a single scalar) and assigning it to a
# new column 'mse'; pandas broadcasts the scalar to every row
df['mse'] = np.square(df['predictions'] - df['targets']).mean()

# printing the dataframe
print(df)

This will output the following dataframe:


   predictions  targets    mse
0          0.1      0.3  0.186
1          0.4      0.5  0.186
2          0.3      0.9  0.186
3          0.8      0.2  0.186
4          0.5      0.1  0.186

In the code above, we imported the pandas and numpy libraries. Then, we created a sample dataframe with two columns, 'predictions' and 'targets'. Next, we computed the mean squared error with numpy, which yields a single number, and assigned that value to a new column 'mse' in the existing dataframe.
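As a quick check on the arithmetic, here is a minimal sketch that computes the same scalar using pandas operators only and then broadcasts it into the column. The variable name `mse_value` is introduced here for illustration and is not part of the original snippet.

```python
import pandas as pd

# same sample data as in the snippet above
df = pd.DataFrame({
    'predictions': [0.1, 0.4, 0.3, 0.8, 0.5],
    'targets': [0.3, 0.5, 0.9, 0.2, 0.1],
})

# squared differences are 0.04, 0.01, 0.36, 0.36, 0.16; their mean is 0.93 / 5
mse_value = ((df['predictions'] - df['targets']) ** 2).mean()

# assigning a scalar to a column repeats it in every row
df['mse'] = mse_value

print(round(mse_value, 3))  # 0.186
```

Keeping the scalar in its own variable makes it easy to reuse without reading it back out of the dataframe.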

Note: The code above computes a single mean squared error over all values in the 'predictions' and 'targets' columns and stores that same value in every row of the 'mse' column. To store a separate value for each row instead (for a single pair of values this is simply the squared error), you can iterate over the dataframe as follows. The loop indexes with df.loc and integer labels, so it relies on the dataframe having the default 0 to n-1 index:


main.py
for i in range(len(df)):
    df.loc[i,'mse'] = np.square(df.loc[i,'predictions']-df.loc[i,'targets'])
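The row-by-row loop can usually be replaced by a vectorized expression, which is shorter and does not depend on the index labels. The following is a minimal sketch under the same sample data; the column name `squared_error` is a hypothetical choice made here to distinguish the per-row values from the overall MSE.

```python
import pandas as pd

# same sample data as in the snippet above
df = pd.DataFrame({
    'predictions': [0.1, 0.4, 0.3, 0.8, 0.5],
    'targets': [0.3, 0.5, 0.9, 0.2, 0.1],
})

# element-wise squared error, computed for all rows at once
df['squared_error'] = (df['predictions'] - df['targets']) ** 2

# averaging the per-row squared errors gives the overall MSE
overall_mse = df['squared_error'].mean()

print(df)
```

Because the subtraction aligns on the index, this also works for dataframes whose index is not the default integer range.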
