pandas diff() in python

The diff() method in the pandas library calculates the difference between each element of a DataFrame or Series and the element in the previous row (by default).

Here's the general syntax of the diff() function:


df.diff(periods=1, axis=0)

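periods specifies how many rows to shift when calculating the difference (default 1). When periods = 1, each row is compared with the previous row; a negative value compares each row with a later row instead. axis specifies the direction along which the difference is calculated: 0 (the default) works down the rows, and 1 works across the columns.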

Here's an example:


import pandas as pd

# create a sample dataframe
data = {'A': [5, 2, 6, 1], 'B': [10, 20, 30, 40]}
df = pd.DataFrame(data)

# calculate the difference between consecutive rows of A
df['diff_A'] = df['A'].diff(periods=1)

print(df)

This will output:


   A   B  diff_A
0  5  10     NaN
1  2  20    -3.0
2  6  30     4.0
3  1  40    -5.0

As you can see, the diff_A column holds the difference between each value in the A column and the value in the row above it. The first row contains NaN because there is no previous value to subtract, which is also why the column is stored as floats.
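The periods and axis parameters can be explored with the same sample data. The following is a minimal sketch; the variable names two_back, next_row and across are chosen here for illustration only.

```python
import pandas as pd

# same sample data as in the example above
df = pd.DataFrame({'A': [5, 2, 6, 1], 'B': [10, 20, 30, 40]})

# periods=2: each value minus the value two rows above it
two_back = df['A'].diff(periods=2)

# periods=-1: each value minus the value in the following row
next_row = df['A'].diff(periods=-1)

# axis=1: difference between neighbouring columns within each row (B - A)
across = df.diff(axis=1)

print(two_back)
print(next_row)
print(across)
```

With periods=2 the first two rows are NaN, with periods=-1 the last row is NaN, and with axis=1 the first column is NaN, since in each case there is nothing to subtract.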

Note that diff() also works on time series data: applied to a datetime column, it returns the time elapsed between consecutive timestamps as Timedelta values.
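As a rough illustration of the time series case, the sketch below uses made-up timestamps and readings (the column names time, value, gap, gap_minutes and change are assumptions for this example, not part of pandas).

```python
import pandas as pd

# hypothetical readings taken at irregular intervals
ts = pd.DataFrame({
    'time': pd.to_datetime(['2024-01-01 09:00',
                            '2024-01-01 09:05',
                            '2024-01-01 09:20']),
    'value': [100, 104, 101],
})

# time elapsed between consecutive timestamps (first row is NaT)
ts['gap'] = ts['time'].diff()

# the same gap expressed as a number of minutes
ts['gap_minutes'] = ts['gap'].dt.total_seconds() / 60

# change in the reading between consecutive rows
ts['change'] = ts['value'].diff()

print(ts)
```

Here the gap column holds Timedelta values, so the first row is NaT rather than NaN.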
