subset columns in pandas in python

To subset columns in pandas, pass a list of column names to the DataFrame's indexing operator ([]). The result is a new DataFrame containing only those columns, in the order listed. Here's an example code snippet:

main.py
import pandas as pd

# Creating a sample dataframe
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'City': ['New York', 'London', 'Sydney']}
df = pd.DataFrame(data)

# Selecting specific columns
selected_columns = df[['Name', 'City']]

print(selected_columns)

The output of this code will be a new dataframe with only two columns - 'Name' and 'City'.
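The brackets matter here: a list of names returns a DataFrame, while a single name returns a Series. The following is a minimal sketch of that difference, reusing the sample data from above; the variable names name_frame and name_series are only illustrative.

```python
import pandas as pd

# Same sample data as in the example above
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'London', 'Sydney'],
})

# A list of names (double brackets) returns a DataFrame, even for one column
name_frame = df[['Name']]

# A single name (single brackets) returns a Series
name_series = df['Name']

print(type(name_frame).__name__)
print(type(name_series).__name__)
```

If later code expects a DataFrame (for example, to select further columns from it), the list form is usually the safer choice.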

You can also select columns using numerical indexing instead of column names. Here's an example code snippet:

main.py
# Selecting columns using numerical indexing
selected_columns = df.iloc[:, [0, 2]]

print(selected_columns)

This produces the same output as before, but selects the columns by position rather than by name. The colon : selects all rows, and [0, 2] gives the zero-based positions of the columns to keep (here 'Name' and 'City').
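To check that both approaches agree, the sketch below compares the two results and shows one way to look up a column's position from its name. It assumes the same sample data and unique column names; by_position, by_name, and positions are illustrative names.

```python
import pandas as pd

# Same sample data as in the first example
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'London', 'Sydney'],
})

# Select the first and third columns by position, then by name
by_position = df.iloc[:, [0, 2]]
by_name = df[['Name', 'City']]

# DataFrame.equals checks that labels and values match
print(by_position.equals(by_name))

# Look up zero-based positions from column names (names assumed unique)
positions = [df.columns.get_loc(name) for name in ['Name', 'City']]
print(positions)
```

Positional selection depends on column order, so selecting by name tends to be more robust when the layout of the DataFrame may change.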

Finally, to remove specific columns from the dataframe, you can use the drop method, passing the column labels together with axis=1. Here's an example code snippet for that:

main.py
# Dropping specific columns from the dataframe
trimmed_df = df.drop(['Age'], axis=1)

print(trimmed_df)

This produces a new dataframe with the 'Age' column removed; the original df is left unchanged. The axis=1 parameter tells drop that the labels refer to columns rather than rows.
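drop also accepts a columns keyword, which can be used in place of axis=1. The sketch below, using the same sample data, shows that both spellings give the same result and that df itself keeps all of its columns; the variable names are illustrative.

```python
import pandas as pd

# Same sample data as in the first example
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'London', 'Sydney'],
})

# Two equivalent ways to drop the 'Age' column
trimmed_by_axis = df.drop(['Age'], axis=1)
trimmed_by_keyword = df.drop(columns=['Age'])

print(trimmed_by_axis.equals(trimmed_by_keyword))

# drop returns a new DataFrame; the original still has every column
print(list(df.columns))
print(list(trimmed_by_keyword.columns))
```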
