I am wondering if anyone can share some insight into the -R parameter in Bowtie2. The manual states: "...maximum number of times Bowtie 2 will "re-seed" reads with repetitive seeds. Bowtie 2 simply chooses a new set of reads (same length, same number of mismatches allowed) at different offsets and searches for more seed alignments. A read is considered to have repetitive seeds if the total number of seed hits divided by the number of seeds that aligned at least once is greater than 300."
Focusing on the last sentence of that passage: what is the difference between a "seed hit" and a "seed that aligned"? I was under the impression that a seed is only generated if it matches the reference genome perfectly in the first place, so this distinction confuses me.
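Read literally, the manual's definition implies that a "seed hit" is one reference position where a seed aligns, while a "seed that aligned" is a seed with at least one such hit. One seed can therefore contribute many hits, and the ratio is simply the average number of hits per aligned seed. The sketch below assumes per-seed hit counts are already known. The function name and inputs are hypothetical illustrations of that reading, not Bowtie 2 internals.

```python
# Hypothetical sketch of the manual's "repetitive seeds" test; not Bowtie 2 source code.
REPETITIVE_THRESHOLD = 300  # threshold quoted in the Bowtie 2 manual


def has_repetitive_seeds(hits_per_seed: list[int], threshold: int = REPETITIVE_THRESHOLD) -> bool:
    """hits_per_seed[i] is the number of reference positions (seed hits) for seed i."""
    total_hits = sum(hits_per_seed)
    # A seed "aligned at least once" if it has one or more hits.
    aligned_seeds = sum(1 for h in hits_per_seed if h > 0)
    if aligned_seeds == 0:
        return False  # assumption: with no aligned seeds there is nothing to call repetitive
    # Average number of hits per aligned seed, compared against the threshold.
    return total_hits / aligned_seeds > threshold


# Four seeds: one falls in a repeat, two hit once, one does not align at all.
print(has_repetitive_seeds([1200, 1, 1, 0]))  # 1202 / 3 ≈ 400.7 -> True
print(has_repetitive_seeds([250, 1, 1, 0]))   # 252 / 3 = 84 -> False
```

Under this reading, a single seed landing in a repeat can push the whole read over the threshold even if every other seed is unique.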
My understanding is that the -R parameter is meant to reduce the occurrence of highly repetitive seeds, which can flood the seed-extension phase with false positives and waste computing resources. However, it increases from 2 to 3 when the --very-sensitive preset is chosen. So, to better understand the underpinnings, I am hoping someone can shed some light.
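One way to see why a larger -R trades time for sensitivity is a toy re-seeding loop capped at R extra rounds: each round picks seeds at new offsets, and the loop stops early once the seeds are no longer repetitive. The offset scheme (`seed_offsets` shifting by one position per round), the `seed_hits` lookup, and the early-stop rule are all hypothetical simplifications of the manual's description, not Bowtie 2's actual implementation.

```python
from typing import Callable


def seed_offsets(read_len: int, seed_len: int, interval: int, shift: int) -> list[int]:
    """Hypothetical: evenly spaced seed start positions, shifted for each re-seed round."""
    return list(range(shift, read_len - seed_len + 1, interval))


def search_with_reseeding(read: str, seed_len: int, interval: int, max_reseeds: int,
                          seed_hits: Callable[[str], int], threshold: int = 300) -> int:
    """Return how many seeding rounds ran: 1 initial round plus up to max_reseeds (-R)."""
    rounds = 0
    for attempt in range(max_reseeds + 1):
        rounds += 1
        shift = attempt % interval  # hypothetical: a different set of offsets each round
        hits = [seed_hits(read[o:o + seed_len])
                for o in seed_offsets(len(read), seed_len, interval, shift)]
        aligned = [h for h in hits if h > 0]
        # Stop if nothing aligned or the average hits per aligned seed is within the threshold.
        if not aligned or sum(aligned) / len(aligned) <= threshold:
            break
    return rounds


read = "ACGT" * 25  # toy 100 bp read
always_repetitive = lambda seed: 1000  # every seed lands in a repeat
always_unique = lambda seed: 1         # every seed hits exactly once

print(search_with_reseeding(read, 20, 10, 2, always_repetitive))  # 3 rounds with R = 2
print(search_with_reseeding(read, 20, 10, 3, always_repetitive))  # 4 rounds with R = 3
print(search_with_reseeding(read, 20, 10, 3, always_unique))      # 1 round: no re-seeding needed
```

In this toy model, R only matters for reads whose seeds stay repetitive; raising it from 2 to 3 gives such reads one more chance to find seeds outside the repeat, at the cost of extra search time.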
Thanks!
In case anyone is interested: my understanding now is that if the seeds align to the reference genome many, many times, but the extension phase yields fewer than 1 successful alignment per 300 seed hits, the read is considered “repetitive” and is re-seeded.
Would be great if someone could verify this. Thanks!