fit an ols regression on pandas scatter in python

To fit an OLS (Ordinary Least Squares) regression to data in a pandas DataFrame and draw the fitted line over a scatter plot in Python, you can use the statsmodels library together with matplotlib. Here's an example of how to do it:

First, import the necessary libraries:

main.py
import pandas as pd
import statsmodels.api as sm
import matplotlib.pyplot as plt

Next, create a pandas DataFrame with your data points:

main.py
data = {'x': [1, 2, 3, 4, 5],
        'y': [2, 4, 6, 8, 10]}
df = pd.DataFrame(data)

Then, create a scatter plot using matplotlib:

main.py
plt.scatter(df['x'], df['y'])
plt.xlabel('x')
plt.ylabel('y')

Now, fit the OLS regression model using statsmodels:

main.py
X = sm.add_constant(df['x'])  # add an intercept column named 'const'
model = sm.OLS(df['y'], X)    # dependent variable first, then predictors
results = model.fit()

To add the regression line to the scatter plot, you can use the following code:

main.py
plt.plot(df['x'], results.fittedvalues, color='red')

Finally, show the scatter plot with the regression line:

main.py
plt.show()

This displays the scatter plot with the OLS regression line drawn over it. If pandas, statsmodels, or matplotlib is not installed yet, install it first, for example with pip install pandas statsmodels matplotlib.

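The example above assumes a simple linear regression with one independent variable (x). With several independent variables, the fit itself works the same way once all predictor columns are passed to sm.add_constant, but the result can no longer be drawn as a single line on a two-dimensional scatter plot. Other model types, such as non-linear models, need a different setup.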
