use linear interpolation to get missing datapoints in r

In R, you can use the approx() function to perform linear interpolation. Suppose we have a simple data frame df with missing values:

main.r
# x must be known in every row; only y has a gap to interpolate
df <- data.frame(x = c(1, 2, 3, 4, 5), y = c(3, 5, NA, 9, 11))

To fill in the missing values in the y column with approx(), interpolate from the rows where y is known:

main.r
# x values for the rows where y is not missing
x <- df$x[!is.na(df$y)]

# y values for the rows where y is not missing
y <- df$y[!is.na(df$y)]

# x values for the rows where y is missing and needs to be estimated
x_missing <- df$x[is.na(df$y)]

# interpolate linearly at those x values; approx() returns a list with x and y
y_missing <- approx(x, y, xout = x_missing)$y

# fill in the missing y values
df$y[is.na(df$y)] <- y_missing

# show the final data frame
df

This will output the following data frame:

main.r
  x  y
1 1  3
2 2  5
3 3  7
4 4  9
5 5 11

The missing data point at x = 3 is therefore filled with the interpolated value y = 7, which lies halfway between the neighbouring points (2, 5) and (4, 9).
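
The same steps can be wrapped in a small function when a column has several gaps. The following is a minimal sketch, not a definitive implementation: fill_linear and df2 are hypothetical names introduced here for illustration, and the only library call used is base R's approx(). It also shows the rule argument of approx(): with the default rule = 1, points outside the range of the observed x values stay NA, while rule = 2 repeats the nearest observed y value.

```r
# Hypothetical helper (not part of base R): fill every NA in y by
# linear interpolation against x, using stats::approx().
fill_linear <- function(x, y, rule = 1) {
  observed <- !is.na(y)

  # nothing to do if there are no gaps
  if (all(observed)) {
    return(y)
  }

  # approx() needs at least two known points to interpolate between
  if (sum(observed) < 2) {
    stop("need at least two non-missing y values")
  }

  # rule = 1 leaves NA for x outside the observed range;
  # rule = 2 repeats the nearest observed y value instead
  y[!observed] <- approx(x[observed], y[observed],
                         xout = x[!observed], rule = rule)$y
  y
}

# example data with gaps in the middle and at both ends
df2 <- data.frame(x = 1:6, y = c(NA, 5, NA, NA, 11, NA))

df2$y_rule1 <- fill_linear(df2$x, df2$y)            # ends stay NA
df2$y_rule2 <- fill_linear(df2$x, df2$y, rule = 2)  # ends take nearest value
df2
```

Here the interior gaps at x = 3 and x = 4 lie on the line through (2, 5) and (5, 11), so they become 7 and 9 under either rule. Only the first and last rows differ: they remain NA with rule = 1 and become 5 and 11 with rule = 2.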
