Replace the last non-NA string with a string from another column across columns that start with 'to' in R

To achieve this in R, you can use the tidyverse packages, specifically dplyr and tidyr. Here's how you can do it:

main.r
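library(dplyr)
library(tidyr)

# assuming df is your data frame
df %>% 
  mutate(
    # number of non-NA values among the "to" columns in each row
    n_to = rowSums(!is.na(across(starts_with("to")))),
    # replace a non-NA value only when it is the single non-NA value in its row
    across(starts_with("to"), ~ ifelse(!is.na(.x) & n_to == 1, other_column, .x))
  ) %>% 
  select(-n_to)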

However, the code above only replaces a value when a row has exactly one non-NA value in the "to" columns; it does not specifically target the last non-NA value. For that, you need a custom function that can be applied to each row.
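As a minimal sketch of that per-row logic, here is how the last non-NA position can be found and replaced in a single vector using base R. The vector `to_values` and the replacement string "X" are hypothetical and stand in for one row's "to" columns and its other_column value.

```r
# Hypothetical values taken from one row's "to" columns
to_values <- c(to1 = "a", to2 = "b", to3 = NA)

# Positions of the non-NA entries, then the last of them (NA if there are none)
non_na_idx <- which(!is.na(to_values))
last_idx <- if (length(non_na_idx) > 0) max(non_na_idx) else NA_integer_

# Overwrite that entry with a hypothetical replacement string
if (!is.na(last_idx)) to_values[last_idx] <- "X"
```

Here "last" means last in element order, so for a data frame the result depends on the order of the "to" columns.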

Here is a more detailed example:

main.r
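library(dplyr)
library(tidyr)

# Return the position of the last non-NA element, or NA if every element is NA
last_non_na_index <- function(values) {
  idx <- which(!is.na(values))
  if (length(idx) > 0) max(idx) else NA_integer_
}

# assuming df is your data frame; collect the "to" columns in their existing order
to_cols <- names(df)[startsWith(names(df), "to")]

df %>% 
  rowwise() %>% 
  # for each row, find which "to" column holds the last non-NA value
  mutate(last_idx = last_non_na_index(c_across(all_of(to_cols)))) %>% 
  ungroup() %>% 
  mutate(
    # overwrite a column's value only where that column is the row's last non-NA one
    across(
      all_of(to_cols),
      ~ ifelse(!is.na(last_idx) & last_idx == match(cur_column(), to_cols), other_column, .x)
    )
  ) %>% 
  select(-last_idx)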

This replaces the last non-NA value (in column order) among the columns starting with "to" with the value from other_column. Rows where all of those columns are NA are left unchanged.

Remember to replace "other_column" with the name of the column that contains the string you want to use for replacement.

Please note that rowwise() may not be efficient for large datasets because it evaluates each row separately. For large datasets, a vectorized approach or the data.table package may be more efficient.
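One possible vectorized approach, sketched here in base R, finds the last non-NA column for every row at once with max.col() and then writes the replacements through matrix indexing. The sample data frame and its column names (to1, to2, to3) are hypothetical, and the sketch assumes all "to" columns are character columns.

```r
# Hypothetical example data; the column names are assumptions for illustration
df <- data.frame(
  to1 = c("a", "d", NA),
  to2 = c("b", NA, NA),
  to3 = c(NA_character_, NA, NA),
  other_column = c("X", "Y", "Z"),
  stringsAsFactors = FALSE
)

to_cols <- grep("^to", names(df), value = TRUE)
m <- as.matrix(df[to_cols])      # character matrix of the "to" columns
not_na <- !is.na(m)

# Column position of the last non-NA value in each row;
# rows with no non-NA value are filtered out through has_any
has_any <- rowSums(not_na) > 0
last_idx <- max.col(not_na + 0, ties.method = "last")

# Replace all target cells at once using (row, column) index pairs
rows <- which(has_any)
m[cbind(rows, last_idx[rows])] <- df$other_column[rows]

df[to_cols] <- as.data.frame(m, stringsAsFactors = FALSE)
```

In this sample, the first row's to2 and the second row's to1 are overwritten, while the third row (all NA) is left as it is.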

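Here's how you could achieve the same using the data.table package. This version still loops over rows, but set() updates the table by reference and avoids copying it: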

main.r
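library(data.table)

setDT(df)

# names of the columns that start with "to", in their existing order
to_cols <- grep("^to", names(df), value = TRUE)

for (i in seq_len(nrow(df))) {
  # values of the "to" columns in row i, as a plain vector
  to_values <- unlist(df[i, ..to_cols])
  non_na_idx <- which(!is.na(to_values))
  
  if (length(non_na_idx) > 0) {
    # overwrite the last non-NA "to" column in row i by reference
    set(df, i = i, j = to_cols[max(non_na_idx)], value = df$other_column[i])
  }
}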
