root mean squared error from scratch in python

Root mean squared error (RMSE) is a popular metric for evaluating regression models in data science. It measures the typical difference between the predicted and actual values of a target variable. To calculate RMSE, the following steps can be taken:

  1. Load the predicted and actual data.
  2. Calculate the difference between each predicted value and the corresponding actual value.
  3. Square each difference obtained in step 2.
  4. Compute the mean squared error by summing the squares from step 3 and dividing by the number of observations.
  5. Finally, take the square root of the mean squared error from step 4 to get the RMSE.
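
Written as a single formula, these steps compute

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2}$$

where $\hat{y}_i$ is the i-th predicted value, $y_i$ the corresponding actual value, and $n$ the number of observations.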

Here is the Python code to implement the RMSE calculation from scratch:

main.py
import numpy as np

def rmse(y_true, y_pred):
    """
    Calculate the root mean squared error between predicted and true values.
    """
    # Accept plain Python lists as well as numpy arrays
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    diff = y_pred - y_true                    # step 2: differences
    squared_diff = diff ** 2                  # step 3: squared differences
    mean_squared_diff = squared_diff.mean()   # step 4: mean squared error
    rmse_val = np.sqrt(mean_squared_diff)     # step 5: square root
    return rmse_val
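
The numpy version above vectorizes steps 2-5. If you want something "from scratch" in the strictest sense, here is a minimal sketch using only the Python standard library (the function name rmse_pure is a hypothetical choice for illustration):

main.py
import math

def rmse_pure(y_true, y_pred):
    """RMSE using only the standard library (illustrative sketch)."""
    # Steps 2-3: squared difference for each pair of values
    squared_diffs = [(p - t) ** 2 for t, p in zip(y_true, y_pred)]
    # Step 4: mean of the squared differences
    mean_squared = sum(squared_diffs) / len(squared_diffs)
    # Step 5: square root of the mean squared error
    return math.sqrt(mean_squared)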

In both functions, y_true holds the actual values of the response variable and y_pred the predicted values. You can verify the implementation by comparing its result with the built-in mean_squared_error function from the sklearn.metrics module.

main.py
import numpy as np
from sklearn.metrics import mean_squared_error

# Example data
y_true = np.array([1, 2, 3, 4, 5])
y_pred = np.array([1.2, 1.8, 2.5, 3.8, 5.1])

# Compare the from-scratch implementation with sklearn's result
print(rmse(y_true, y_pred))
print(np.sqrt(mean_squared_error(y_true, y_pred)))

Output:

0.2756809750418044
0.2756809750418044

Both the from-scratch implementation and the mean_squared_error function from sklearn produce the same result for the example above.
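
As a side note, newer scikit-learn releases can return the RMSE directly, so the manual np.sqrt is optional there. A minimal sketch, assuming scikit-learn 0.22 or later (which accepts squared=False; very recent versions also ship a dedicated root_mean_squared_error function):

main.py
from sklearn.metrics import mean_squared_error

# squared=False tells mean_squared_error to return the root of the MSE
# (available since scikit-learn 0.22; newer releases prefer
# root_mean_squared_error instead)
print(mean_squared_error(y_true, y_pred, squared=False))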
