Get two time series, split them into overlapping windows of length 10, then find the 5 nearest neighbors of each window of one series among the windows of the other series, as fast as possible in R

Here's a possible approach using sliding windows and a k-nearest-neighbor (k-NN) search. It assumes that the two time series are stored as numeric vectors, ts1 and ts2. The example uses vectors of equal length, but the method only requires that each series is longer than the window size of 10.

main.r
# Define the time series as vectors of length n
n <- 10000
set.seed(123)
ts1 <- rnorm(n)
ts2 <- rnorm(n)

# Define the window size
w_size <- 10

# Define the number of neighbors to find
k <- 5

# Split the time series into windows
ts1_w <- embed(ts1, w_size)  # a matrix with n-w_size+1 rows and w_size columns
ts2_w <- embed(ts2, w_size)  # a matrix with n-w_size+1 rows and w_size columns

# Find the k nearest ts2 windows for each ts1 window (Euclidean distance).
# get.knnx() takes the raw window matrices and searches with a k-d tree by default,
# so the full pairwise distance matrix (about 10^8 entries here) is never built.
library(FNN)
knn_result <- get.knnx(data = ts2_w, query = ts1_w, k = k)

# knn_result$nn.index has n-w_size+1 rows (the windows in ts1) and k columns
# Row i holds the row indices in ts2_w of the k nearest windows, nearest first
# knn_result$nn.dist holds the matching Euclidean distances
# Row j of ts2_w covers ts2[j:(j + w_size - 1)] (embed() stores it in reverse order)

This approach uses the embed() function to split each time series into overlapping sliding windows of size w_size, one window per row. The get.knnx() function from the FNN package then takes the ts2 windows as data and the ts1 windows as query, and finds the k nearest ts2 windows for each ts1 window. get.knnx() computes Euclidean distances itself, so a precomputed pairwise distance matrix is not needed; for n = 10000 such a matrix would have (n-w_size+1)^2, about 10^8, entries (roughly 800 MB as doubles), and get.knnx() expects the points themselves rather than distances. The resulting knn_result$nn.index matrix has the indices of the k-NN for each window in ts1, and knn_result$nn.dist has the corresponding distances.
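As a sanity check on a small input, the same neighbors can be found by brute force in base R. The following is only a sketch under stated assumptions: the names a, b, d2, brute_index, and brute_dist are hypothetical and not part of the original answer, and the full distance matrix it builds is only practical for short series.

```r
# Brute-force k-NN over sliding windows, base R only (for checking small cases).
set.seed(1)
w_size <- 10
k <- 5
a <- rnorm(200)  # hypothetical stand-in for ts1
b <- rnorm(200)  # hypothetical stand-in for ts2

a_w <- embed(a, w_size)  # one window per row (values in reverse time order)
b_w <- embed(b, w_size)

# Squared Euclidean distances via |x - y|^2 = |x|^2 + |y|^2 - 2 * x.y
d2 <- outer(rowSums(a_w^2), rowSums(b_w^2), "+") - 2 * tcrossprod(a_w, b_w)
d2 <- pmax(d2, 0)  # clamp tiny negative values caused by rounding

# For each window of a (each row of d2), take the k smallest distances
brute_index <- t(apply(d2, 1, function(row) order(row)[seq_len(k)]))
brute_dist  <- t(apply(d2, 1, function(row) sqrt(sort(row)[seq_len(k)])))

dim(brute_index)  # 191 rows (windows of a) by 5 columns
```

If FNN is installed, brute_index can be compared against FNN::get.knnx(b_w, a_w, k)$nn.index; barring exact ties in distance, the two should agree.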
