To create a train-test split using scikit-learn in Python, you can use the train_test_split() function from the sklearn.model_selection module. Here is an example of how to use it:
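A minimal sketch of such a split, assuming a small synthetic feature matrix X and target y built with NumPy. The data and the specific values 0.2 and 42 are illustrative choices, not taken from the original listing:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Illustrative data (an assumption): 10 samples with 2 features each,
# and a binary target with one label per sample.
X = np.arange(20).reshape(10, 2)
y = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0, 1])

# Hold out 20% of the samples for testing; fix the seed for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)  # (8, 2) (2, 2)
```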
In the code above, X represents the feature variables and y represents the target variable. The test_size argument specifies the size of the test set, typically as a fraction between 0 and 1 (e.g., 0.2 for a 20% test set); an integer is interpreted as an absolute number of test samples. The random_state argument seeds the shuffling applied before the split, so you obtain the same split every time you run the code.
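To illustrate the two arguments, the sketch below checks that a fixed random_state yields the same split twice and that an integer test_size is read as a sample count. The data and the seed value 0 are assumptions chosen for illustration:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Illustrative data (an assumption): 10 samples, 2 features.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Two calls with the same random_state produce the same test set.
_, X_test_a, _, y_test_a = train_test_split(X, y, test_size=0.2, random_state=0)
_, X_test_b, _, y_test_b = train_test_split(X, y, test_size=0.2, random_state=0)
same_split = np.array_equal(X_test_a, X_test_b) and np.array_equal(y_test_a, y_test_b)

# An integer test_size is an absolute number of test samples.
_, X_test_c, _, _ = train_test_split(X, y, test_size=3, random_state=0)

print(same_split, len(X_test_c))
```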
After splitting the data, you will have four separate sets: X_train (training features), X_test (testing features), y_train (training target), and y_test (testing target). These sets can be used for training and evaluating machine learning models.
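As a rough sketch of that last step, the four sets can be passed to an estimator's fit and score methods. The synthetic dataset from make_classification and the choice of LogisticRegression are assumptions for illustration; any scikit-learn estimator follows the same pattern:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic classification data (an assumption): 200 samples, 5 features.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit on the training sets only, then measure accuracy on the held-out sets.
model = LogisticRegression()
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)

print(f"Test accuracy: {accuracy:.2f}")
```

Keeping the test sets out of fit() is what makes the reported accuracy an estimate of performance on unseen data.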