How to create a column that contains '1' for 3 rows for each '1' in a reference column in Python

Suppose you have a DataFrame df with the following columns:

  • reference_column: column containing '1's
  • other_columns: other columns of your DataFrame
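For a concrete starting point, a DataFrame of this shape might look like the sketch below. The values and the column name other_column are made up for illustration and are not part of the original question.

```python
import pandas as pd

# Made-up sample data: the values and the name other_column are illustrative only.
df = pd.DataFrame(
    {
        "reference_column": [0, 1, 0, 0, 0, 1, 0, 0],
        "other_column": list("abcdefgh"),
    }
)

print(df.shape)
# prints (8, 2)
```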

To create a new column with '1' for 3 rows for each '1' in the reference_column, you can use the following code:

main.py
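import pandas as pd

# assumes df already exists and reference_column holds the integer 1 in marked rows
# (compare with '1' instead if the column stores strings)
is_one = df['reference_column'].eq(1).astype(int)

# a window of 3 rows covers the current row and the 2 rows before it,
# so the max is 1 on each marked row and on the 2 rows that follow it
new_column = is_one.rolling(window=3, min_periods=1).max().astype(int)

# assign the new column to the DataFrame
df['new_column'] = new_column

Explanation of the code:

  • compare reference_column with 1 to get a 0/1 marker for each row
  • slide a window of 3 rows over the marker, covering the current row and the 2 rows before it
  • take the maximum of each window, which is 1 whenever any row in the window is marked
  • convert the result back to integers and assign it to the df DataFrame as new_column

This reads "3 rows for each '1'" as the row that contains the '1' plus the 2 rows after it. If two '1's are fewer than 3 rows apart, their runs overlap and merge into one longer run.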

This code should generate a new column new_column that contains '1' for 3 rows for each '1' in reference_column.
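As a quick check, the same rolling-window idea can be wrapped in a small helper and run on sample data. This is a minimal sketch: the function name mark_rows, its n_rows parameter, and the sample values are hypothetical, and it assumes the reference column holds the integer 1.

```python
import pandas as pd


def mark_rows(reference: pd.Series, n_rows: int = 3) -> pd.Series:
    """Return 1 on each row where `reference` is 1 and on the next n_rows - 1 rows."""
    is_one = reference.eq(1).astype(int)
    # Each window covers the current row and the n_rows - 1 rows before it.
    return is_one.rolling(window=n_rows, min_periods=1).max().astype(int)


# Made-up sample data; the 1s at positions 4 and 5 produce overlapping runs.
df = pd.DataFrame({"reference_column": [1, 0, 0, 0, 1, 1, 0, 0, 0]})
df["new_column"] = mark_rows(df["reference_column"])

print(df["new_column"].tolist())
# prints [1, 1, 1, 0, 1, 1, 1, 1, 0]
```

The two adjacent 1s at positions 4 and 5 yield a single run of four 1s because their 3-row runs overlap. Passing a different n_rows changes the run length.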
