pandas diff() in python

The diff() method in the pandas library computes the first discrete difference of a Series or DataFrame. By default, each value has the value in the previous row subtracted from it.
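
One way to see what diff() computes is to compare it with subtracting a copy of the data shifted down by one row. The sketch below uses a small made-up DataFrame and checks that df.diff() matches df - df.shift(1). It also shows that calling diff() on a whole DataFrame differences every column at once.

```python
import pandas as pd

# Small made-up DataFrame for illustration
df = pd.DataFrame({'A': [5, 2, 6, 1], 'B': [10, 20, 30, 40]})

# diff() on the whole DataFrame differences every column
d = df.diff()

# Equivalent formulation: subtract the frame shifted down by one row
manual = df - df.shift(1)

print(d)
print(d.equals(manual))  # True
```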

Here's the general syntax of the diff() function:

main.py
df.diff(periods=1, axis=0)

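periods sets how far back to look. With periods=1 (the default), each row is compared with the previous row. periods=2 compares with the row two positions earlier, and a negative value compares with the rows that follow. It is a shift, not the order of the difference: for a second-order difference you call diff() twice. axis sets the direction. 0 or 'index' (the default) takes differences down the rows, while 1 or 'columns' takes differences across the columns.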
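
Here is a short sketch of how periods and axis change the result, using a small made-up DataFrame. Series.diff() only takes periods. The axis argument belongs to DataFrame.diff().

```python
import pandas as pd

# Small made-up DataFrame for illustration
df = pd.DataFrame({'A': [5, 2, 6, 1], 'B': [10, 20, 30, 40]})

# periods=2: each value minus the value two rows earlier
back2 = df['A'].diff(periods=2)    # NaN, NaN, 1.0, -1.0

# periods=-1: each value minus the value in the NEXT row
ahead1 = df['A'].diff(periods=-1)  # 3.0, -4.0, 5.0, NaN

# axis=1: difference across columns, here B minus A within each row
across = df.diff(axis=1)           # column A is all NaN; B is 5.0, 18.0, 24.0, 39.0

print(back2, ahead1, across, sep='\n\n')
```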

Here's an example:

main.py
import pandas as pd

# create a sample dataframe
data = {'A': [5, 2, 6, 1], 'B': [10, 20, 30, 40]}
df = pd.DataFrame(data)

# calculate the difference between consecutive rows of A
df['diff_A'] = df['A'].diff(periods=1)

print(df)

This will output:

   A   B  diff_A
0  5  10     NaN
1  2  20    -3.0
2  6  30     4.0
3  1  40    -5.0

As you can see, each value in diff_A is the current value of A minus the previous one (for example, 2 - 5 = -3 in row 1). The first row contains NaN because there is no previous value to subtract. diff_A is also stored as float64 even though A holds integers.
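
If the leading NaN gets in the way, two common options are to drop that row or to fill it. The sketch below shows both on the same sample data. Filling with 0 is an assumption about what a "missing" first difference should mean for your data. pandas does not do this by default.

```python
import pandas as pd

df = pd.DataFrame({'A': [5, 2, 6, 1], 'B': [10, 20, 30, 40]})
df['diff_A'] = df['A'].diff()

print(df['diff_A'].dtype)  # float64

# Option 1: drop the first row, which has no previous value to compare with
trimmed = df.dropna(subset=['diff_A'])

# Option 2: treat the missing first difference as 0
# (an assumption about your data, not a pandas default)
filled = df['A'].diff().fillna(0)

print(trimmed)
print(filled.tolist())  # [0.0, -3.0, 4.0, -5.0]
```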

Note that diff() also works on datetime data. Applied to a Series of timestamps, it returns the elapsed time between consecutive entries as Timedelta values, with NaT (not a time) in the first position.
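
A minimal sketch with a handful of hypothetical event timestamps. It computes the gap between consecutive events and then converts the gaps to seconds with the .dt.total_seconds() accessor.

```python
import pandas as pd

# Hypothetical event timestamps
ts = pd.Series(pd.to_datetime([
    '2024-01-01 00:00:00',
    '2024-01-01 00:05:00',
    '2024-01-01 00:15:00',
]))

# Gap between consecutive timestamps: NaT, 5 minutes, 10 minutes
gaps = ts.diff()

# Same gaps as float seconds: NaN, 300.0, 600.0
seconds = gaps.dt.total_seconds()

print(gaps)
print(seconds)
```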

gistlib by LogSnag