5.4 years ago
elisheva
Hi,
I am trying to search for a pattern in a sequence such that a specific nucleotide does not appear immediately before or after the match.
For example, given the following sequence:
x <- DNAString("TGCTTGCGCA")
I want to extract all the occurrences of GC where there is no T before or after.
Therefore only one occurrence fits: the three GC occurrences appear in the contexts TGCT, TGCG and CGCA, and only CGCA meets the condition.
In other words, the matching pattern is: {T}GC{T}
But I can't find any way to implement it using the Biostrings package.
I really hope you can help me figure it out.
Thanks for your help.
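One possible way to express this condition without leaving Biostrings is to use the IUPAC ambiguity code V (A, C or G, i.e. anything except T) on both sides of GC and to match with fixed = FALSE. This is only a sketch, not necessarily the approach recommended in the thread: the second sequence in the set is made up for illustration, and because the pattern VGCV needs a real nucleotide on each side, a GC at the very start or end of a sequence is not reported.

```r
library(Biostrings)

x <- DNAString("TGCTTGCGCA")

# "V" is the IUPAC code for A, C or G, so "VGCV" means
# "GC with a non-T nucleotide on each side".
# fixed = FALSE tells matchPattern() to treat IUPAC codes as ambiguities
# instead of literal letters.
hits <- matchPattern("VGCV", x, fixed = FALSE)

# Each hit covers the flanking nucleotides too, so the GC itself
# starts one position after the start of the hit.
gc_start <- start(hits) + 1L

# The same idea applied to a whole set, without converting to character.
# The second sequence is a hypothetical example.
set <- DNAStringSet(c("TGCTTGCGCA", "AGCAGCT"))
set_hits <- vmatchPattern("VGCV", set, fixed = FALSE)
gc_starts <- lapply(startIndex(set_hits), function(s) s + 1L)
```

For the example sequence, the only hit is CGCA, so gc_start points at the last GC.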
What is the problem with just converting the DNAString to a character and doing your regex with that?

Because I use a StringSet and I want the analysis to be as fast as possible. If I convert every single interval to character, I guess it will be much slower.
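For comparison, the regex route suggested in the comment could look roughly like the sketch below, which uses only base R on plain character strings. The second sequence is made up for illustration. Whether this is actually slower than a Biostrings-only approach would need to be measured on the real data. Note that the lookarounds also accept a GC at the very start or end of a sequence, where there is no neighbouring nucleotide at all.

```r
# Plain character sequences; for a DNAStringSet, as.character(set) would
# convert all sequences in one call rather than one interval at a time.
seqs <- c("TGCTTGCGCA", "AGCAGCT")

# (?<!T) : the character before GC must not be T
# (?!T)  : the character after GC must not be T
# Lookarounds require perl = TRUE.
pattern <- "(?<!T)GC(?!T)"

hits <- gregexpr(pattern, seqs, perl = TRUE)

# gregexpr() reports -1 when a sequence has no match; drop those.
gc_starts <- lapply(hits, function(h) {
  s <- as.integer(h)
  s[s > 0L]
})
```

Here gc_starts holds, for each sequence, the start positions of the GC occurrences that are not flanked by T.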
https://support.bioconductor.org/p/121676/