Hi,
I am using the Biostrings package for its really fast reverseComplement() function. Here's a basic tutorial with the functions and objects I'm using: link
I have a DNAStringSet object, and reverseComplement() operates very quickly in a vectorized way across the entire set. However, I only want to take the reverse complement of some of the sequences in the set. I have a logical vector of the same length as the string set that indicates which sequences I want to flip.
For speed, is there any way to flip just the sequences I want without using a for loop over the vector and the set?
An example of the data:
  A DNAStringSet instance of length 10
     width seq
 [1]    30 ATGCACGCCAGCTGGAGTCAGAGTGAGGGG
 [2]    30 ATGCACGCCAGCTGGAGTCAGAGTGAGGGG
 [3]    30 GAGCTGAGTCGTGAGGATGCGAGGCAGACC
 [4]    30 GAGCTGAGTCGTGAGGATGCGAGGCAGACC
 [5]    30 AAAGGGAGCCCAGTGGGGATGAATGAGGGG
 [6]    30 AAAGGGAGCCCAGTGGGGATGAATGAGGGG
 [7]    31 AAAGCGCGGAGAGCCCCAGAGCATTTGAGGG
 [8]    31 AAAGCGCGGAGAGCCCCAGAGCATTTGAGGG
 [9]    30 ACCTAAGGGTCCTCGGAGCCCGGACTCAGG
[10]    30 ACCTAAGGGTCCTCGGAGCCCGGACTCAGG
With a corresponding vector:
[1] TRUE TRUE FALSE TRUE FALSE TRUE TRUE FALSE TRUE FALSE
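One approach that might avoid the loop is to subset with the logical vector and assign back into the same positions. This is only a minimal sketch, assuming the set and the vector are held in variables named seqs and flip (hypothetical names, not from my real code) and that logical subset-assignment on a DNAStringSet behaves like it does for ordinary R vectors:

```r
library(Biostrings)

# Example data copied from the post; `seqs` and `flip` are hypothetical names.
seqs <- DNAStringSet(c(
  "ATGCACGCCAGCTGGAGTCAGAGTGAGGGG",
  "ATGCACGCCAGCTGGAGTCAGAGTGAGGGG",
  "GAGCTGAGTCGTGAGGATGCGAGGCAGACC",
  "GAGCTGAGTCGTGAGGATGCGAGGCAGACC",
  "AAAGGGAGCCCAGTGGGGATGAATGAGGGG",
  "AAAGGGAGCCCAGTGGGGATGAATGAGGGG",
  "AAAGCGCGGAGAGCCCCAGAGCATTTGAGGG",
  "AAAGCGCGGAGAGCCCCAGAGCATTTGAGGG",
  "ACCTAAGGGTCCTCGGAGCCCGGACTCAGG",
  "ACCTAAGGGTCCTCGGAGCCCGGACTCAGG"
))
flip <- c(TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, FALSE)

# The logical vector must line up one-to-one with the sequences.
stopifnot(length(flip) == length(seqs))

# Reverse-complement only the flagged sequences in one vectorized call,
# then write them back into the same positions of a copy of the set.
result <- seqs
result[flip] <- reverseComplement(seqs[flip])

result
```

If this works as hoped, the sequences where flip is FALSE stay untouched and reverseComplement() is called only once, on the selected subset, so no explicit loop is needed.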
Thank you!