generate synthetic dataset in python

There are multiple ways to generate synthetic datasets in Python, depending on the context and type of data needed. Below are a few approaches:

Using scikit-learn's make_classification or make_regression functions to generate classification or regression datasets, respectively:


main.py
from sklearn.datasets import make_classification, make_regression

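# Generate a classification dataset with 1000 samples, 4 features, and 3 classes.
# n_classes * n_clusters_per_class (default 2) must not exceed 2**n_informative,
# and informative + redundant features must fit within n_features.
X_clf, y_clf = make_classification(n_samples=1000, n_features=4, n_informative=3,
                                   n_redundant=1, n_classes=3, random_state=42)

# Generate a regression dataset with 1000 samples and 3 features
X_reg, y_reg = make_regression(n_samples=1000, n_features=3, random_state=42)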
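Both functions accept further parameters that control class balance, label noise and target noise. The sketch below builds a reproducible, imbalanced three-class dataset and a noisy regression target. The specific values for `weights`, `flip_y` and `noise` are illustrative assumptions, not recommendations.

```python
import numpy as np
from sklearn.datasets import make_classification, make_regression

# Imbalanced 3-class problem: about 70% / 20% / 10% of samples per class.
X_clf, y_clf = make_classification(
    n_samples=1000,
    n_features=4,
    n_informative=3,
    n_redundant=1,
    n_classes=3,
    weights=[0.7, 0.2, 0.1],  # class proportions before label noise is applied
    flip_y=0.01,              # assign a random class to ~1% of samples (label noise)
    random_state=0,           # fixed seed so the dataset is reproducible
)

# Regression target with Gaussian noise (standard deviation 10) added to the output.
X_reg, y_reg, coef = make_regression(
    n_samples=1000,
    n_features=3,
    noise=10.0,
    coef=True,       # also return the coefficients of the underlying linear model
    random_state=0,
)

print(X_clf.shape, np.bincount(y_clf))  # feature matrix shape and per-class counts
print(X_reg.shape, coef)
```

Passing `coef=True` is useful for testing: you know the true coefficients a regression model should recover.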

Using NumPy's random module to generate random arrays with the desired shape and distribution:


main.py
import numpy as np

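rng = np.random.default_rng(42)  # seeded Generator; preferred over legacy np.random.* calls

# Generate a random array of shape (1000, 5) from a uniform distribution on [0, 1)
X_uniform = rng.random((1000, 5))

# Generate a random array of shape (1000, 2) from a standard normal distribution
X_normal = rng.normal(size=(1000, 2))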
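Random arrays become a labeled dataset once you define how the target depends on the features. This sketch assumes a linear relationship with Gaussian noise. The weights `true_w` and bias `true_b` are hypothetical values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # seeded generator for reproducibility

n_samples, n_features = 1000, 3
true_w = np.array([2.0, -1.0, 0.5])   # hypothetical ground-truth weights
true_b = 4.0                          # hypothetical ground-truth bias

# Features drawn from a standard normal distribution.
X = rng.normal(size=(n_samples, n_features))

# Target = linear combination of features + bias + Gaussian noise (std 0.1).
noise = rng.normal(scale=0.1, size=n_samples)
y = X @ true_w + true_b + noise

# Sanity check: ordinary least squares should approximately recover the parameters.
X_design = np.column_stack([X, np.ones(n_samples)])  # extra column of ones for the bias
est, *_ = np.linalg.lstsq(X_design, y, rcond=None)
print(est)  # each entry should be close to [2.0, -1.0, 0.5, 4.0]
```

Because the generating parameters are known, a dataset like this is handy for checking whether a model or training loop is working correctly.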

Using a third-party library such as Faker to generate realistic fake records in a specific format, collected into a pandas DataFrame:


main.py
from faker import Faker
import pandas as pd

Faker.seed(0)  # seed Faker so the fake data is reproducible
fake = Faker()

# Generate a DataFrame with 1000 rows of fake names, addresses, and job titles
df = pd.DataFrame(
    [[fake.name(), fake.address(), fake.job()] for _ in range(1000)],
    columns=['Name', 'Address', 'Job'],
)
