train_test_split with stratify for a binary classifier in python

To make a stratified train/test split for a binary classifier in Python, use the train_test_split function from the scikit-learn library and pass the target variable as the stratify parameter.

Here is an example code snippet:

main.py
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_breast_cancer

# load the dataset
data = load_breast_cancer()

# split the data into training and testing sets with 80% for training and 20% for testing
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.2, stratify=data.target)

print("Training data size:", X_train.shape)
print("Testing data size:", X_test.shape)

This example uses the breast cancer dataset that ships with scikit-learn. The train_test_split function divides the data 80:20 between the training and testing sets. The stratify parameter is set to data.target, which holds the binary class labels. With stratify=data.target, the split preserves class proportions: the ratio of classes in the training and testing sets matches the ratio in the original dataset as closely as the sample counts allow.

The output shows the shapes of the training and testing sets. This dataset has 569 samples and 30 features, so the shapes are (455, 30) and (114, 30).
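
To check that stratification keeps the class balance, you can compare the class fractions in each split against the full dataset. The snippet below is a minimal sketch of that check. The class_fractions helper and random_state=42 are not part of the original example: the helper is a hypothetical name used for illustration, and the seed only makes the split reproducible.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# load features and binary labels (0 = malignant, 1 = benign)
X, y = load_breast_cancer(return_X_y=True)

# stratified 80:20 split; random_state is an assumption added for reproducibility
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)


def class_fractions(labels):
    """Illustrative helper: fraction of samples per class, indexed by class label."""
    return np.bincount(labels) / len(labels)


full_frac = class_fractions(y)
train_frac = class_fractions(y_train)
test_frac = class_fractions(y_test)

print("Full dataset:", full_frac.round(3))
print("Training set:", train_frac.round(3))
print("Testing set: ", test_frac.round(3))
```

The three printed rows should be nearly identical. Without stratify, the fractions can drift between the splits, which matters most when one class is rare.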
