3.2 years ago
dyfn1947
Hello, I would like to exclude repeat sequences when I make a coverage plot. The red circle marks the position of the repeat sequence. How can I remove that repeat sequence? Thanks
You could hard- or soft-mask the repeat sequence in the genome and re-run the mapping.
Or use something like samtools view to remove reads overlapping these regions. That is probably better, because reads that really come from these repeats are properly "decoyed" when aligned against the full genome. If you mask the regions instead, those reads might falsely align somewhere else.
Make a BED file with the coordinates you want to exclude, take its complement against the entire genome (bedtools complement), and then use the -L option of samtools view to keep only reads that overlap the complement file (which is the genome minus the regions you do not want).
I was skeptical of this, but here is lh3 saying so as well: Which Aligners Recognize Soft-Masked Repeats In Reference Sequences? :)
He says
which I think is the best you can do, for the aforementioned reasons.
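The BED-complement approach described above could be sketched roughly as follows. This is only a minimal sketch, not a tested pipeline: the function names (make_genome_file, filter_repeat_reads) and all file names are hypothetical, and it assumes samtools and bedtools are installed and that the BAM was aligned against the same reference FASTA.

```shell
set -eu

# Build a bedtools "genome file" (chrom<TAB>length) from a FASTA index (.fai).
# Hypothetical helper; sorted by chromosome name to match the sorted BED below.
make_genome_file() {
    local fai="$1" out="$2"
    cut -f1,2 "$fai" | LC_ALL=C sort -k1,1 > "$out"
}

# Keep only reads overlapping the complement of the repeat regions.
# Arguments (all hypothetical names): reference FASTA, BED of repeats,
# input BAM, output BAM.
filter_repeat_reads() {
    local ref="$1" repeats="$2" in_bam="$3" out_bam="$4"
    local tmp
    tmp="$(mktemp -d)"

    # Index the reference so chromosome lengths are available in ref.fai.
    samtools faidx "$ref"
    make_genome_file "${ref}.fai" "$tmp/genome.txt"

    # bedtools complement expects input sorted consistently with the genome file.
    LC_ALL=C sort -k1,1 -k2,2n "$repeats" > "$tmp/repeats.sorted.bed"

    # Genome minus the repeat regions.
    bedtools complement -i "$tmp/repeats.sorted.bed" -g "$tmp/genome.txt" > "$tmp/keep.bed"

    # -L keeps alignments that overlap any interval in the BED file.
    samtools view -b -L "$tmp/keep.bed" -o "$out_bam" "$in_bam"
    samtools index "$out_bam"

    rm -r "$tmp"
}
```

It would be called as, for example, filter_repeat_reads ref.fa repeats.bed aligned.bam aligned.norepeats.bam. Note that -L keeps any read overlapping the complement, so reads that straddle a repeat boundary are still retained; if every read touching a repeat should go, bedtools intersect with -v against the repeat BED may be a closer fit.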
Thanks so much !
Are you sure you mean repeat sequences? Do you not mean duplicate sequences?
Yes, I'm sure it is a repeat sequence.