10.9 years ago
Timtico
I would like to extract, for every match of a pattern in an XStringSet, the subsequence that starts at that match. My current code ignores multiple matches within the same sequence, and I suspect there is a better way to do this with Biostrings functions.
I load a FASTA file into an XStringSet object (a DNAStringSet) and then search for a specific string with the vmatchPattern function:
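```r
library(Biostrings)
# Read the FASTA file into a DNAStringSet (an XStringSet subclass)
genes <- readDNAStringSet(filepath = "filename", format = "fasta", use.names = TRUE)
# Find all exact occurrences of the pattern in every sequence (returns an MIndex)
view <- vmatchPattern(pattern = "CCGGA", genes)
# Flatten to a single IRanges; each range is named after its sequence
matches <- unlist(view, recursive = TRUE, use.names = TRUE)
m <- as.matrix(matches)
```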
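To see what vmatchPattern returns before it is flattened, here is a minimal sketch on toy data. The sequences and the names seq1/seq2/seq3 are hypothetical stand-ins for the FASTA file. The MIndex keeps one group of matches per sequence, so a sequence with two hits reports two start positions.

```r
library(Biostrings)

# Hypothetical toy sequences standing in for the FASTA file
genes <- DNAStringSet(c(seq1 = "CCGGAAACCGGATTTT",
                        seq2 = "ACGTACGTACGT",
                        seq3 = "TTCCGGACC"))
view <- vmatchPattern("CCGGA", genes)

starts <- startIndex(view)      # list: match start positions for each sequence
n_per_seq <- lengths(starts)    # number of matches found in each sequence
print(n_per_seq)
print(starts)
```

Here seq1 has two matches (at positions 1 and 8), seq2 has none and seq3 has one (at position 3).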
I then retrieve a 20-nt subsequence starting at each match (the 5-nt match plus the 15 nt downstream of it):
```r
# Note: m[rownames(m), 1] repeats each sequence's first start, so later matches are lost
subseq(genes[rownames(m)], start = m[rownames(m), 1], width = 20)
```
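The loss of multiple matches can be reproduced in plain R. This is a small sketch in which the matrix is hypothetical, built by hand to mimic what as.matrix(unlist(view)) could give for two matches in seq1 and one in seq3. Indexing a matrix by a character vector matches each name to the first row carrying it, so duplicated row names all resolve to the same row.

```r
# Hypothetical matrix mimicking as.matrix(unlist(view)):
# two matches in seq1 (starts 1 and 8) and one in seq3 (start 3)
m <- cbind(start = c(1L, 8L, 3L), width = c(5L, 5L, 5L))
rownames(m) <- c("seq1", "seq1", "seq3")

by_name <- m[rownames(m), 1]   # each "seq1" resolves to the FIRST seq1 row
by_position <- m[, 1]          # positional indexing keeps every row
print(by_name)                 # values 1 1 3: the start 8 is lost
print(by_position)             # values 1 8 3
```

Indexing by position (m[, 1]) instead of by row name keeps every match.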
What is a better way to do this, using Biostrings functions, that includes all matches?
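One possible approach, sketched under assumptions rather than offered as the definitive answer: flatten the match starts with startIndex(), repeat each sequence once per match, and call subseq() a single time on the whole set. The toy sequences are hypothetical. Clipping windows at the end of the sequence is also an illustrative choice, because subseq() fails when a 20-nt window would run past the end.

```r
library(Biostrings)

# Hypothetical toy input; with real data use readDNAStringSet(filepath = "filename")
genes <- DNAStringSet(c(seq1 = "CCGGAACGTACGTACGTACGTACGTCCGGATTTT",
                        seq2 = "ACGTACGTACGT",
                        seq3 = "TTCCGGACC"))
view <- vmatchPattern("CCGGA", genes)

# Start of every match, flattened across sequences (sequence order is kept)
starts <- unname(unlist(startIndex(view)))
# Index of the sequence each match comes from, repeated once per match
seq_idx <- rep(seq_along(genes), lengths(startIndex(view)))

# Window of up to 20 nt from each match, clipped at the end of the sequence
# so that subseq() does not fail for matches close to the end
ends <- pmin(starts + 19L, width(genes)[seq_idx])
flanks <- subseq(genes[seq_idx], start = starts, end = ends)
# Label each window as "sequence:start" so multiple hits stay distinguishable
names(flanks) <- paste0(names(genes)[seq_idx], ":", starts)
flanks
```

Both matches in seq1 are returned here (at 1 and 26), and the second window, like the one from seq3, is shorter than 20 nt because it reaches the end of its sequence. To keep only full-length windows, you could instead drop matches where starts + 19 exceeds the sequence width.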