dplyr left join without middle names in r

To run a left join in dplyr while ignoring middle names, we can use the str_remove() function from the stringr package to strip middle initials from the name columns before joining. Here is an example:

main.r
library(dplyr)
library(stringr)

# create example data frames
df1 <- data.frame(id = c(1, 2, 3),
                  name = c("John Smith", "Jane Doe", "Bob Johnson"))
df2 <- data.frame(id = c(1, 2, 4),
                  name = c("John A. Smith", "Jane M. Doe", "Sarah Johnson"))

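# remove middle initials from both data frames; the lookahead (?= ) leaves
# the space before the last name, so "John A. Smith" becomes "John Smith"
df1_clean <- df1 %>% mutate(name = str_remove(name, " [A-Z]\\.?(?= )"))
df2_clean <- df2 %>% mutate(name = str_remove(name, " [A-Z]\\.?(?= )"))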

# conduct left join without middle names
df_join <- df1_clean %>% left_join(df2_clean, by = "id")

# view result
df_join

In this code, we first load the dplyr and stringr packages. We then create two example data frames, df1 and df2, each with an id column and a name column. The names in df2 include middle initials; the names in df1 do not.

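Next, we use mutate() and str_remove() to create new data frames, df1_clean and df2_clean, in which the name column has been stripped of any middle initial. The regular expression matches a space followed by a single capital letter and an optional period, so it handles initials such as "A." or "M" but not full middle names such as "Ann".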
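If the data contains full middle names rather than initials, one possible approach is to keep only the first and last whitespace-separated tokens of each name. The sketch below assumes names follow a "first [middle ...] last" layout; drop_middle is a hypothetical helper introduced for illustration, not part of stringr, and it would also drop particles in multi-word surnames (for example "van").

```r
library(stringr)

# Hypothetical helper: keep only the first and last tokens of a name.
# Names with fewer than three tokens do not match and are returned unchanged.
drop_middle <- function(x) {
  str_replace(x, "^(\\S+)\\s+.*\\s+(\\S+)$", "\\1 \\2")
}

full_names <- c("John A. Smith", "Mary Ann Lee", "Bob Johnson")
cleaned <- drop_middle(full_names)
cleaned
```

A helper like this could be used inside mutate() in place of the initial-only pattern when middle names are spelled out.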

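Finally, we conduct a left join between df1_clean and df2_clean using left_join() with the id column as the join variable. Because both data frames have a name column that is not part of the join key, dplyr keeps both and suffixes them, so df_join contains id, name.x (from df1_clean), and name.y (from df2_clean), all without middle initials. The row with id 3 has no match in df2_clean, so its name.y is NA, and the row with id 4 is dropped because a left join keeps only the rows of the left table.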

We can view the result by calling df_join in the console.
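The example above joins on id, so cleaning the names only affects how they are displayed. If the goal is instead to match rows by name while ignoring middle initials, the cleaned name column can serve as the join key. The following is a minimal sketch under that assumption; strip_initial and the suffix labels are illustrative choices, not part of the original example.

```r
library(dplyr)
library(stringr)

df1 <- data.frame(id = c(1, 2, 3),
                  name = c("John Smith", "Jane Doe", "Bob Johnson"))
df2 <- data.frame(id = c(1, 2, 4),
                  name = c("John A. Smith", "Jane M. Doe", "Sarah Johnson"))

# Hypothetical helper: drop a single-letter middle initial, keeping the
# space before the last name.
strip_initial <- function(x) str_remove(x, " [A-Z]\\.?(?= )")

# Join on the cleaned name; the id columns are not keys here, so they are
# kept side by side with the given suffixes.
by_name <- df1 %>%
  mutate(name = strip_initial(name)) %>%
  left_join(df2 %>% mutate(name = strip_initial(name)),
            by = "name", suffix = c("_df1", "_df2"))

by_name
```

Here "Bob Johnson" has no counterpart in df2, so its id_df2 value is NA. Joining on names is only as reliable as the cleaning step, so a stable key such as id is preferable when one is available.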
