how to aggregate a column by month in python

To aggregate a column by month in Python using pandas, follow these steps:

  1. Import pandas:
      import pandas as pd
  2. Read the data into a pandas DataFrame:
      data = pd.read_csv('data_file.csv')
  3. Convert the column containing dates to a datetime type:
      data['date'] = pd.to_datetime(data['date'])
  4. Create a new column holding the calendar month (1 to 12) extracted from the date column. Rows from the same month of different years get the same value:
      data['month'] = data['date'].dt.month
  5. Use groupby to aggregate the desired column by month with the operation you need (e.g., sum, mean, count):
      monthly_agg = data.groupby('month')['column_to_aggregate'].sum()
  6. Optionally, reindex the result so that all 12 months appear. groupby already returns the months in ascending order; reindexing adds the months that are absent from the data, with NaN as their value:
      monthly_agg = monthly_agg.reindex(range(1, 13))
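
The aggregation step above lists several operations (sum, mean, count). As a minimal sketch, they can be computed together with `agg`. The small in-memory DataFrame and its values are hypothetical and only stand in for the CSV file; the variable name `monthly_stats` is likewise an assumption.

```python
import pandas as pd

# Hypothetical sample data standing in for the CSV file.
data = pd.DataFrame({
    "date": ["2023-01-05", "2023-01-20", "2023-02-10", "2023-04-01"],
    "column_to_aggregate": [10, 20, 5, 7],
})
data["date"] = pd.to_datetime(data["date"])
data["month"] = data["date"].dt.month

# Compute several aggregates per month in one call.
# The result is a DataFrame with one row per month and one column per aggregate.
monthly_stats = data.groupby("month")["column_to_aggregate"].agg(["sum", "mean", "count"])
print(monthly_stats)
```

Passing a list of names to `agg` returns one column per operation, which avoids repeating the groupby for each statistic.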

Here's the complete code:

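import pandas as pd

# Load the data and parse the date column as datetimes
data = pd.read_csv('data_file.csv')
data['date'] = pd.to_datetime(data['date'])

# Extract the calendar month (1-12) and sum the target column per month
data['month'] = data['date'].dt.month
monthly_agg = data.groupby('month')['column_to_aggregate'].sum()

# Optional: include all 12 months; months with no data become NaN
monthly_agg = monthly_agg.reindex(range(1, 13))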
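
The script above needs a CSV file on disk. The sketch below runs the same steps on a hypothetical in-memory CSV (the rows and the name `monthly_full` are made up for illustration) to show what the reindex step does: the groupby result is already sorted by month, and reindexing only adds the months that are missing. Using `fill_value=0` is a choice made here so that empty months read as zero rather than NaN; whether that is appropriate depends on the aggregation.

```python
import io

import pandas as pd

# Hypothetical CSV contents; a real script would pass a file path instead.
csv_text = """date,column_to_aggregate
2023-03-15,4
2023-01-05,10
2023-01-20,20
"""

data = pd.read_csv(io.StringIO(csv_text))
data["date"] = pd.to_datetime(data["date"])
data["month"] = data["date"].dt.month

# groupby sorts its keys by default, so the index comes back as [1, 3]
# even though the March row appears first in the input.
monthly_agg = data.groupby("month")["column_to_aggregate"].sum()

# Reindexing adds the ten months with no data; fill_value=0 makes them
# zero instead of NaN (an assumption that suits sums, not necessarily means).
monthly_full = monthly_agg.reindex(range(1, 13), fill_value=0)
print(monthly_full)
```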

Note: Replace 'data_file.csv' with the path to your data file, 'date' with the name of your date column, and 'column_to_aggregate' with the name of the column you want to aggregate.
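
One caveat: grouping on `dt.month` alone puts the same calendar month of different years into one group. If the data spans more than one year and each year's months should stay separate, one option is to group on a monthly period instead. The sketch below is an illustration under that assumption, with hypothetical data and variable names, not a replacement for the steps above.

```python
import pandas as pd

# Hypothetical data spanning two years.
data = pd.DataFrame({
    "date": pd.to_datetime(["2022-01-10", "2023-01-15", "2023-01-25", "2023-02-01"]),
    "column_to_aggregate": [1, 2, 3, 4],
})

# Calendar month only: January 2022 and January 2023 land in the same group.
by_month = data.groupby(data["date"].dt.month)["column_to_aggregate"].sum()

# Year and month together: each month of each year is its own group.
by_year_month = data.groupby(data["date"].dt.to_period("M"))["column_to_aggregate"].sum()

print(by_month)
print(by_year_month)
```

Here `by_month` has two groups (months 1 and 2), while `by_year_month` has three (2022-01, 2023-01, 2023-02).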
