how to make a classification model out of a regression model in python

To make a classification model out of a regression model, you need to first convert the continuous output of the regression model into a categorical output. There are various ways to do so, some of which are mentioned below:

  1. Thresholding: Choose a threshold below which the output is considered negative and above which is positive. This way you can convert the continuous output into binary classes.

  2. Binning: Divide the range of predictions into multiple bins and assign a class label to each of the bins. For example, if the prediction is between 0 and 0.33, assign it class 1, if it is between 0.33 and 0.66, assign it class 2, and so on.

After the conversion, you can train a classification model (such as Logistic Regression or SVM) on the categorical output, using the same input features that were used to train the regression model.

Here's the sample code for Thresholding:

main.py
from sklearn.datasets import make_classification
from sklearn.linear_model import LinearRegression, LogisticRegression

# generate a synthetic binary classification dataset (labels are 0 and 1)
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5, n_redundant=0, random_state=42)

# train a Linear Regression model on the 0/1 labels
model = LinearRegression()
model.fit(X, y)

# convert the continuous output into binary classes:
# predictions at or above the threshold become 1, the rest become 0
threshold = 0.5
y_pred = model.predict(X)
y_class = (y_pred >= threshold).astype(int)

# train a Logistic Regression model on the binary classes
clf = LogisticRegression()
clf.fit(X, y_class)
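
The binning approach can be sketched in a similar way. This is only an illustration under stated assumptions: the data comes from `make_regression` (a continuous target), the number of bins is three, and the bin edges are taken at the tertiles of the predictions rather than at fixed values such as 0.33 and 0.66. The names `edges` and `y_binned` are made up for this example, and `max_iter=1000` is just a generous setting for the solver.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, LogisticRegression

# generate a synthetic regression dataset with a continuous target (illustrative only)
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, noise=10.0, random_state=42)

# train a Linear Regression model and get its continuous predictions
reg = LinearRegression()
reg.fit(X, y)
y_pred = reg.predict(X)

# assumption: use the tertiles of the predictions as bin edges,
# so the three classes end up roughly the same size
edges = np.quantile(y_pred, [1 / 3, 2 / 3])

# np.digitize returns 0, 1 or 2 for the three bins; add 1 to get classes 1, 2, 3
y_binned = np.digitize(y_pred, edges) + 1

# train a Logistic Regression model on the binned classes
clf = LogisticRegression(max_iter=1000)
clf.fit(X, y_binned)
```

Fixed edges (for example `[0.33, 0.66]` for predictions scaled to the range 0 to 1) can be passed to `np.digitize` in place of the quantiles if the class boundaries are known in advance.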
