remove all rows from one dataframe that match any variables in any row or column in another dataframe in r

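To remove the rows of one dataframe that match rows in another dataframe in R, you can use the anti_join function from the dplyr package. A row counts as a match when its values in the chosen key columns equal those of some row in the second dataframe. Here is a sample solution: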

# Load the necessary library (anti_join comes from dplyr; tidyr is not needed)
library(dplyr)

# Create sample dataframes
df1 <- data.frame(
  id = c(1, 2, 3, 4, 5),
  name = c("John", "Mary", "David", "Emily", "James")
)

df2 <- data.frame(
  id = c(2, 4),
  name = c("Mary", "Emily")
)

# Remove the rows of df1 whose id and name both match a row in df2
df1_filtered <- df1 %>%
  anti_join(df2, by = c("id", "name"))

# Print the filtered dataframe
print(df1_filtered)

In this code:

  • We first load dplyr, which provides anti_join.
  • We create two sample dataframes, df1 and df2, with columns id and name.
  • We use anti_join to keep only the rows of df1 that have no matching row in df2.
  • The by argument in anti_join specifies the columns to match on. Here we match on both id and name, so a row of df1 is removed only when both values equal those of a single row in df2.
  • Finally, we print the filtered dataframe, df1_filtered, which contains only the rows from df1 that do not match any row in df2.
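
The choice of by columns changes which rows count as a match. The sketch below illustrates this with the same df1 and a slightly different second dataframe; df2_alt and df3 (with its person_id column) are hypothetical names introduced only for this example.

```r
library(dplyr)

df1 <- data.frame(
  id = c(1, 2, 3, 4, 5),
  name = c("John", "Mary", "David", "Emily", "James")
)

# Hypothetical second dataframe: id 4 is paired with a different name
df2_alt <- data.frame(
  id = c(2, 4),
  name = c("Mary", "Emma")
)

# Match on id only: ids 2 and 4 are removed regardless of name
by_id <- anti_join(df1, df2_alt, by = "id")

# Match on id and name together: only the 2/Mary row is removed,
# because 4/Emily does not equal 4/Emma
by_both <- anti_join(df1, df2_alt, by = c("id", "name"))

# Key columns with different names (person_id is an assumed column name)
df3 <- data.frame(person_id = c(2, 4))
by_renamed <- anti_join(df1, df3, by = c("id" = "person_id"))
```

Matching on fewer columns removes more rows, so it is worth deciding up front which columns identify a row.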

This will output:

  id  name
1  1  John
2  3 David
3  5 James
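
If the goal is the literal reading of the question, dropping a row of df1 when any of its cells equals any value found anywhere in the other dataframe, anti_join is not enough, because it compares key columns pairwise. The following is a minimal base R sketch of that broader interpretation; the names df2_any, df2_values, row_hits, and df1_any are hypothetical, and values are compared as character strings, so the number 2 is treated as equal to the string "2".

```r
df1 <- data.frame(
  id = c(1, 2, 3, 4, 5),
  name = c("John", "Mary", "David", "Emily", "James")
)

# Hypothetical second dataframe whose values are not aligned row by row with df1
df2_any <- data.frame(
  id = c(2, 4),
  name = c("Mary", "James")
)

# Pool every value in df2_any into one character vector
df2_values <- unique(unlist(lapply(df2_any, as.character), use.names = FALSE))

# For each column of df1, flag cells found in the pool,
# then combine the flags across columns with a logical OR
row_hits <- Reduce(`|`, lapply(df1, function(col) as.character(col) %in% df2_values))

# Keep only the rows with no matching cell
df1_any <- df1[!row_hits, , drop = FALSE]
```

With this data, rows 2 and 4 are dropped because of their id values and row 5 because of the name "James", leaving the rows for John and David.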