Replace the last non-NA string with a string from another column across columns that start with 'to' in R

To achieve this in R, you can use the tidyverse packages, specifically dplyr and tidyr. Here's how you can do it:


main.r
library(dplyr)
library(tidyr)

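# assuming df is your data frame and other_column holds the replacement strings
# (pick() requires dplyr >= 1.1.0)
df %>% 
  mutate(
    # number of non-NA values among the "to" columns in each row
    n_non_na = rowSums(!is.na(pick(starts_with("to")))),
    # replace a non-NA value only when it is the sole non-NA value in its row
    across(starts_with("to"), ~ ifelse(!is.na(.x) & n_non_na == 1, other_column, .x))
  ) %>% 
  select(-n_non_na)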

However, the code above only changes rows where exactly one "to" column is non-NA; it does not target the last non-NA value in general. For that, you need a custom function that can be applied to each row.
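The core step is locating the last non-NA position in a vector. A minimal sketch on a single character vector, where `last_non_na_index` and the sample values are hypothetical names chosen for illustration:

```r
# Return the position of the last non-NA element, or NA if there is none
last_non_na_index <- function(values) {
  idx <- which(!is.na(values))
  if (length(idx) == 0) NA_integer_ else max(idx)
}

x <- c("a", "b", NA, "c", NA)
i <- last_non_na_index(x)   # position 4, the "c"
x[i] <- "new"               # x is now "a", "b", NA, "new", NA
```

The row-wise solution applies this same idea to the "to" columns of each row.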

Here is a more detailed example:


main.r
library(dplyr)
library(tidyr)

# Overwrite the last non-NA cell of a one-row data frame with `replacement`
replace_last_non_na <- function(row, replacement) {
  non_na_index <- which(!is.na(unlist(row)))
  
  if (length(non_na_index) > 0) {
    last_non_na_index <- max(non_na_index)
    row[[last_non_na_index]] <- replacement
  }
  
  row
}

# pick() requires dplyr >= 1.1.0; the one-row data frame returned by the
# function is unpacked back into the "to" columns
df %>% 
  rowwise() %>% 
  mutate(replace_last_non_na(pick(starts_with("to")), other_column)) %>% 
  ungroup()

This will correctly replace the last non-NA value in the columns starting with "to" with the value from other_column.

Remember to replace "other_column" with the name of the column that contains the string you want to use for replacement.
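To check the behaviour, it can help to run the row-wise approach on a small data frame. The sketch below is self-contained and assumes dplyr 1.1.0 or later (for `pick()`); the sample data, the column names `to_1` to `to_3`, and the helper are made up for illustration:

```r
library(dplyr)

# Hypothetical sample data: three "to" columns plus the replacement column
df <- tibble(
  to_1 = c("a", "b", NA),
  to_2 = c("c", NA, NA),
  to_3 = c(NA_character_, NA, NA),
  other_column = c("X", "Y", "Z")
)

# Overwrite the last non-NA cell of a one-row data frame
replace_last_non_na <- function(row, replacement) {
  idx <- which(!is.na(unlist(row)))
  if (length(idx) > 0) row[[max(idx)]] <- replacement
  row
}

result <- df %>%
  rowwise() %>%
  mutate(replace_last_non_na(pick(starts_with("to")), other_column)) %>%
  ungroup()
```

In this sample, the first row should have `to_2` replaced by "X", the second row should have `to_1` replaced by "Y", and the third row, which has no non-NA "to" value, should be left unchanged.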

Please note that rowwise() evaluates the expression once per row, which can be slow on large datasets. In that case, a vectorized approach or the data.table package may be more efficient.
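One possible vectorized approach in base R is sketched below, under the assumption that all "to" columns are character columns. The sample data frame and variable names are hypothetical. It uses `max.col()` with `ties.method = "last"` to find the last non-NA column in each row, then matrix indexing to overwrite those cells in one step:

```r
# Hypothetical sample data
df <- data.frame(
  to_1 = c("a", "b", NA),
  to_2 = c("c", NA, NA),
  to_3 = c(NA_character_, NA, NA),
  other_column = c("X", "Y", "Z"),
  stringsAsFactors = FALSE
)

to_cols <- grep("^to", names(df), value = TRUE)
values  <- as.matrix(df[to_cols])   # character matrix of the "to" columns
not_na  <- !is.na(values)

# Column position of the last non-NA value per row; rows with no non-NA
# value also get a position here, so they are filtered out via `rows`
last_idx <- max.col(not_na * 1, ties.method = "last")
rows     <- which(rowSums(not_na) > 0)

# Overwrite the (row, column) cells in a single matrix-indexing assignment
values[cbind(rows, last_idx[rows])] <- df$other_column[rows]
df[to_cols] <- as.data.frame(values, stringsAsFactors = FALSE)
```

Because the work happens on a whole matrix at once, this avoids per-row function calls, at the cost of coercing the "to" columns to a common type.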

Here's how you could achieve the same using the data.table package:


main.r
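library(data.table)

setDT(df)

# names of the columns that start with "to"
to_cols <- grep("^to", names(df), value = TRUE)

for (i in seq_len(nrow(df))) {
  # values of the "to" columns in row i
  row_values <- unlist(df[i, .SD, .SDcols = to_cols])
  non_na_index <- which(!is.na(row_values))
  
  if (length(non_na_index) > 0) {
    # overwrite the last non-NA "to" column by reference
    last_non_na_col <- to_cols[max(non_na_index)]
    set(df, i = i, j = last_non_na_col, value = df[["other_column"]][i])
  }
}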
