search fastq for 2 oliglionuclitides in r

To search for 2 oligonucleotides in a fastq file using R, you can use the ShortRead package. Here's an example code to search for 2 oligonucleotides (oligo1 and oligo2) in a fastq file (fastq_file):

main.r
library(ShortRead)

# define the 2 oligos
oligo1 <- "ATCGGTACTGGTACCTGTTAGTC"
oligo2 <- "GATTAGCTACGTAGCCCAGGACG"

# create a DNAStringSet from the oligos
oligo_set <- DNAStringSet(c(oligo1, oligo2))

# open a connection to the fastq file
fastq_con <- file(fastq_file, "r")

# loop over each read in the fastq file
while (length(one_read <- readFastq(fastq_con, n=1)) > 0) {

  # extract the DNAString object from the read
  dna_obj <- unlist(sread(one_read))

  # search for the 2 oligos in the DNAString object
  oligo_matches <- matchPattern(oligo_set, dna_obj)

  # if both oligos are present, print the read ID and match positions
  if (length(oligo_matches[[1]]) > 0 & length(oligo_matches[[2]]) > 0) {
    cat("> ", one_read@id, "\n")
    cat("Oligo 1 matches at positions: ", oligo_matches[[1]], "\n")
    cat("Oligo 2 matches at positions: ", oligo_matches[[2]], "\n")
  }
}

# close the connection to the fastq file
close(fastq_con)
943 chars
32 lines

This code defines the 2 oligos, creates a DNAStringSet object from the oligos, opens a connection to the fastq file, loops over each read in the fastq file, extracts the DNA sequence from the read, searches for the 2 oligos in the DNA sequence using the matchPattern function, and prints out the read ID and oligo match positions if both oligos are present in the DNA sequence.

gistlibby LogSnag