create a function to sum by interval dates in a pandas dataframe in python

You can write a function that sums selected columns of a pandas DataFrame over a given date interval using the following code:

main.py
import pandas as pd

def sum_by_interval(dataframe, start_date, end_date, cols_to_sum):
    mask = (dataframe['date'] >= start_date) & (dataframe['date'] <= end_date)
    result = dataframe.loc[mask, cols_to_sum].sum()
    return result

Here, dataframe is the pandas DataFrame that holds the data (it is expected to have a 'date' column), start_date and end_date bound the interval, with both ends included, and cols_to_sum is a list of the columns to sum.

We first build a boolean mask that is True for every row whose date falls between the start and end dates. We then pass this mask to loc to select only those rows, together with the columns of interest. Finally, we call sum() on that subset, which returns a Series holding one total per column.
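The comparison in the mask assumes the 'date' column holds real datetime values. If it holds strings, the comparison is lexicographic, which only gives the right answer for consistently formatted ISO dates. A minimal sketch of a more defensive variant is shown here; the name sum_by_interval_safe and the date_col parameter are hypothetical additions for illustration, not part of the original function.

```python
import pandas as pd

def sum_by_interval_safe(dataframe, start_date, end_date, cols_to_sum, date_col="date"):
    # Hypothetical variant: parse the column and both bounds as datetimes
    # so the comparison is chronological rather than string-based.
    dates = pd.to_datetime(dataframe[date_col])
    start = pd.to_datetime(start_date)
    end = pd.to_datetime(end_date)
    # between() includes both endpoints by default, matching >= and <=.
    mask = dates.between(start, end)
    return dataframe.loc[mask, cols_to_sum].sum()

# The date column is stored as text here, not as datetime64.
df_text = pd.DataFrame({
    "date": ["2022-01-01", "2022-01-05", "2022-01-09"],
    "value1": [1, 2, 3],
})
total = sum_by_interval_safe(df_text, "2022-01-02", "2022-01-09", ["value1"])
print(total)
```

Parsing inside the function leaves the caller's DataFrame unchanged, at the cost of repeating the conversion on every call.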

You can use this function as shown below:

main.py
# create a sample dataframe
df = pd.DataFrame({
    'date': pd.date_range(start='2022-01-01', end='2022-01-10'),
    'value1': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'value2': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
})

# call the function to sum the values between 2022-01-03 and 2022-01-07 for columns 'value1' and 'value2'
result = sum_by_interval(df, '2022-01-03', '2022-01-07', ['value1', 'value2'])
print(result)

The output would be:

value1     25
value2    250
dtype: int64
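If the goal is one total per consecutive interval rather than a single date range, pandas can do the bucketing itself. The following is a rough sketch under that assumption, using DataFrame.resample with a 5-day bin width chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.date_range(start="2022-01-01", end="2022-01-10"),
    "value1": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "value2": [10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
})

# Group rows into consecutive 5-day bins keyed on the 'date' column,
# then sum the chosen columns within each bin.
per_interval = df.resample("5D", on="date")[["value1", "value2"]].sum()
print(per_interval)
```

Each row of the result is labelled with the start date of its bin. Other bin widths, such as "W" for weekly, follow the same pattern.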