check if column values have 90 percent overlap in values in r

To check whether the values in a set of columns overlap by at least 90 percent, we can use the dplyr package in R. Here is an example code block showing how to accomplish this:

main.r
library(dplyr)

# create sample data frame with three columns
df <- data.frame(col1 = c(1, 2, 3, 4, 5),
                 col2 = c(1, 2, 3, 4, 5),
                 col3 = c(1, 2, 3, 4, 6))

# calculate, for each pair of columns, the percentage of rows holding the same value
overlap_percentages <- df %>%
  summarise(col1_vs_col2 = mean(col1 == col2) * 100,
            col1_vs_col3 = mean(col1 == col3) * 100,
            col2_vs_col3 = mean(col2 == col3) * 100)

# check if any pair of columns has less than 90% overlap
if(any(overlap_percentages < 90)) {
  print("Columns do not have 90% overlap in values")
} else {
  print("Columns have 90% overlap in values")
}

In this example, we create a sample data frame with three columns and use dplyr to calculate the percentage overlap in values between each pair of columns. An if statement then checks whether any pair falls below 90% overlap and prints the corresponding message.
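Comparing columns with `==` only counts a match when the same value sits in the same row. If "overlap" should instead mean that the values of one column appear anywhere in another, a set-based check may be a better fit. The following is a minimal base R sketch under that assumption. The helper `value_overlap` and the result name `set_overlap` are hypothetical names chosen for illustration, and `col2` is reversed here (unlike the sample above) to show that row order no longer matters.

```r
# Hypothetical helper (not part of dplyr): percentage of the unique values
# in x that also occur somewhere in y. Note that this is not symmetric:
# value_overlap(x, y) can differ from value_overlap(y, x).
value_overlap <- function(x, y) {
  ux <- unique(x)
  mean(ux %in% y) * 100
}

# sample data; col2 holds the same values as col1 in reverse order
df <- data.frame(col1 = c(1, 2, 3, 4, 5),
                 col2 = c(5, 4, 3, 2, 1),
                 col3 = c(1, 2, 3, 4, 6))

# build every unordered pair of column names, then score each pair
pairs <- combn(names(df), 2, simplify = FALSE)
set_overlap <- sapply(pairs, function(p) value_overlap(df[[p[1]]], df[[p[2]]]))
names(set_overlap) <- sapply(pairs, paste, collapse = "_vs_")

print(set_overlap)

# TRUE only if every pair reaches the 90% threshold
print(all(set_overlap >= 90))
```

With this definition, `col1` and `col2` overlap fully even though only one row matches position by position, while both pairs involving `col3` fall below the threshold because one of its five values is absent from the other columns.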
