How to transform a pandas DataFrame to NumPy for statsmodels OLS in Python

To convert a pandas DataFrame to a NumPy array for use with the statsmodels OLS class (sm.OLS), use the DataFrame's .values attribute or the equivalent .to_numpy() method, which the pandas documentation recommends for new code. Here's an example:

main.py
import pandas as pd
import numpy as np
import statsmodels.api as sm

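# Create a pandas DataFrame with toy data.
# x2 is deliberately not a linear function of x1; otherwise it would be
# collinear with x1 and the intercept, and the coefficients would not be unique.
data = {
    'x1': [1, 2, 3, 4, 5],
    'x2': [6, 8, 7, 10, 9],
    'y': [11.1, 11.9, 13.2, 13.8, 15.1]
}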
df = pd.DataFrame(data)

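# Convert DataFrame columns to numpy arrays
X = df[['x1', 'x2']].values  # 2-D array of regressors, shape (5, 2)
y = df['y'].values           # 1-D array of the response, shape (5,)

# Fit the model; add_constant prepends a column of ones for the intercept
model = sm.OLS(y, sm.add_constant(X))
results = model.fit()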

# Print the results
print(results.summary())

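In this example, we create a pandas DataFrame with three columns: x1, x2, and y. We then convert the x1 and x2 columns to a two-dimensional NumPy array X using the .values attribute, and the y column to a one-dimensional array y. sm.add_constant prepends a column of ones to X so that the model includes an intercept; sm.OLS does not add one on its own. Finally, we fit the model by passing y (the response) and the augmented X (the regressors) to sm.OLS, call fit(), and print the summary.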
