Remove columns in a data frame based on a pattern in the column names in R

To remove columns from a data frame based on a pattern in the column names, we can use the grep() and subset() functions in R.

main.r
# create sample data frame
df <- data.frame(id = 1:5, name = letters[1:5], age_2020 = c(20, 25, 30, 35, 40), 
                 gender_M = c(TRUE, FALSE, FALSE, TRUE, TRUE), phone_555 = c("123", "456", "789", "010", "011"))

# find the positions of the columns whose names match the pattern
cols_to_remove <- grep("_2020|_555", names(df))

# subset the data frame without the matching columns
# (the minus sign needs numeric positions; negating character names raises an error)
df_new <- subset(df, select = -cols_to_remove)

# check the new data frame
df_new

In this example, we want to remove the columns whose names contain "_2020" or "_555". First, grep() finds the columns whose names match the pattern. Then subset() creates a new data frame that excludes those columns via the - sign in its select argument. Finally, typing df_new prints the new data frame, which keeps only the id, name, and gender_M columns.
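
One caveat of dropping columns by negated positions, as in -grep(...), is that a pattern matching nothing yields an empty index, and an empty index selects no columns at all. The following is a minimal sketch of an alternative in base R, assuming a logical mask from grepl() is acceptable in place of subset(). The sample data mirrors the example above, and the names matches and df_same are illustrative only.

```r
# sample data frame with the same columns as the example above
df <- data.frame(
  id = 1:5,
  name = letters[1:5],
  age_2020 = c(20, 25, 30, 35, 40),
  gender_M = c(TRUE, FALSE, FALSE, TRUE, TRUE),
  phone_555 = c("123", "456", "789", "010", "011")
)

# grepl() returns one TRUE/FALSE value per column name
matches <- grepl("_2020|_555", names(df))

# keep the columns that did not match; drop = FALSE always returns a data frame
df_new <- df[, !matches, drop = FALSE]
names(df_new)

# a pattern that matches no column leaves the data frame unchanged
df_same <- df[, !grepl("_1999", names(df)), drop = FALSE]
names(df_same)
```

Because the mask always has one entry per column, the no-match case needs no special handling.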
