download and prepare a dataset for a cnn in python

To download a dataset for a convolutional neural network (CNN) in Python, you can use libraries such as TensorFlow Datasets, torchvision (for PyTorch), or Keras Datasets. In this example, we will use TensorFlow Datasets to download the CIFAR-10 dataset.
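As a quick aside, CIFAR-10 is also bundled with Keras, so a one-line alternative using tf.keras.datasets is sketched here; the main example that follows sticks with TensorFlow Datasets.

from tensorflow.keras.datasets import cifar10

# Download CIFAR-10 as NumPy arrays instead of a tf.data pipeline
(x_train, y_train), (x_test, y_test) = cifar10.load_data()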

main.py
import tensorflow as tf
import tensorflow_datasets as tfds

# Download the CIFAR-10 dataset
(train_ds, test_ds), ds_info = tfds.load(name='cifar10', split=['train', 'test'], 
                                         with_info=True, shuffle_files=True)

# Preprocess the data
def preprocess_data(sample):
    image = sample['image']
    label = sample['label']
    
    image = tf.image.convert_image_dtype(image, tf.float32)
    image = tf.image.resize(image, [32, 32])  # CIFAR-10 images are already 32x32; resize kept for generality
    image = (image - 0.5) / 0.5  # normalize to [-1, 1]
    
    return image, label

train_ds = train_ds.map(preprocess_data)
test_ds = test_ds.map(preprocess_data)

# Cache the dataset to memory
train_ds = train_ds.cache()
test_ds = test_ds.cache()

# Shuffle the training set (the test set is left in order for evaluation)
train_ds = train_ds.shuffle(ds_info.splits['train'].num_examples)

# Batch the dataset
batch_size = 32
train_ds = train_ds.batch(batch_size)
test_ds = test_ds.batch(batch_size)

In this code, we first download the CIFAR-10 dataset using TensorFlow Datasets. Then, we preprocess the data by converting the images to float32, resizing them, and normalizing their pixel values to the range [-1, 1].

We cache the dataset in memory to speed up training, shuffle the training set, and batch both splits into mini-batches. The prepared dataset can now be used to train a CNN model, as sketched below.
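As an illustration, here is a minimal sketch of training a small Keras CNN on the prepared datasets; the architecture, optimizer, and epoch count are assumptions for demonstration and not part of the original snippet. Prefetching is added so preprocessing overlaps with training.

import tensorflow as tf

# Overlap preprocessing with training by prefetching batches
train_ds = train_ds.prefetch(tf.data.AUTOTUNE)
test_ds = test_ds.prefetch(tf.data.AUTOTUNE)

# A small example CNN for 32x32 RGB images and 10 classes (hypothetical architecture)
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(32, 32, 3)),
    tf.keras.layers.Conv2D(32, 3, activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(10),
])

model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

model.fit(train_ds, validation_data=test_ds, epochs=5)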
