Whole genome coverage plot
3.2 years ago
dyfn1947 • 0

Hello, I would like to remove repeat sequences when I make a coverage plot. The red circle marks the position of the repeat sequence. How can I remove that repeat sequence? Thanks.

[Image: whole-genome coverage plot with the repeat region circled in red]

Tags: whole_genome, repeat_sequence, coverage

You could hard- or soft-mask the repeat sequences in the reference genome and then redo the mapping.
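To make the two kinds of masking concrete, here is a minimal sketch assuming the chromosome fits in memory as a string and the repeats are 0-based, half-open (start, end) intervals as in a BED file. The function name `mask_sequence` is hypothetical. Hard masking replaces repeat bases with N; soft masking only lowercases them, and whether an aligner honors lowercase bases depends on the aligner (later replies in this thread advise against aligning to a masked genome at all).

```python
"""Minimal sketch of hard- vs soft-masking a reference sequence.

Assumptions (hypothetical, for illustration only): the chromosome is an
in-memory string and repeats are 0-based, half-open (start, end)
intervals, as in BED files.
"""

from typing import Iterable, Tuple

Interval = Tuple[int, int]


def mask_sequence(seq: str, repeats: Iterable[Interval], hard: bool = True) -> str:
    """Return a copy of seq with every repeat interval masked.

    hard=True replaces repeat bases with 'N' (hard masking);
    hard=False lowercases them (soft masking).
    """
    chars = list(seq)
    for start, end in repeats:
        # Clamp to the sequence bounds so out-of-range BED rows do not raise.
        start, end = max(0, start), min(len(chars), end)
        for i in range(start, end):
            chars[i] = "N" if hard else chars[i].lower()
    return "".join(chars)


if __name__ == "__main__":
    ref = "ACGTACGTACGT"
    repeats = [(4, 8)]
    print(mask_sequence(ref, repeats, hard=True))   # ACGTNNNNACGT
    print(mask_sequence(ref, repeats, hard=False))  # ACGTacgtACGT
```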

Or use something like samtools view to remove reads overlapping these regions. That is probably better: if a read really comes from these repeats, it is properly "decoyed" by the full-genome alignment, whereas if you mask the regions, such reads might falsely align somewhere else. Make a BED file with the coordinates you want excluded, compute its complement against the entire genome (bedtools complement), and then use the -L option of samtools view to keep only reads that overlap the complement file (that is, the genome minus the regions you do not want).
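On the command line, the two steps would look roughly like `bedtools complement -i repeats.bed -g genome.txt > keep.bed` followed by `samtools view -b -L keep.bed in.bam > filtered.bam`, where all file names are hypothetical and genome.txt is a two-column file of chromosome names and lengths. To show what the complement step computes, here is a minimal Python sketch under the assumption that intervals are 0-based, half-open BED-style tuples; `complement` is a hypothetical helper, not part of bedtools.

```python
"""Minimal sketch of the complement step (what bedtools complement computes).

Assumptions (hypothetical): intervals are 0-based, half-open BED-style
(chrom, start, end) tuples, and chromosome lengths come from a dict like
the two-column genome file that bedtools complement takes via -g.
"""

from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]


def complement(excluded: List[Interval], chrom_sizes: Dict[str, int]) -> List[Interval]:
    """Return the regions of each chromosome NOT covered by `excluded`."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = {c: [] for c in chrom_sizes}
    for chrom, start, end in excluded:
        if chrom in by_chrom:
            size = chrom_sizes[chrom]
            # Clamp so rows running past the chromosome end stay in bounds.
            by_chrom[chrom].append((min(start, size), min(end, size)))

    keep: List[Interval] = []
    for chrom, size in chrom_sizes.items():  # genome-file order
        pos = 0  # first base not yet assigned to "keep" or "excluded"
        # Sorting lets overlapping or adjacent excluded intervals merge on the fly.
        for start, end in sorted(by_chrom[chrom]):
            if start > pos:
                keep.append((chrom, pos, start))
            pos = max(pos, end)
        if pos < size:
            keep.append((chrom, pos, size))
    return keep


if __name__ == "__main__":
    sizes = {"chr1": 100, "chr2": 50}
    repeats = [("chr1", 10, 20), ("chr1", 15, 30), ("chr1", 90, 100)]
    print(complement(repeats, sizes))
```

One design consequence worth knowing: samtools view -L keeps any alignment that overlaps the BED file at all, so a read that only partly overlaps a repeat but also touches the complement is still kept.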

I was skeptical of this, but here is lh3 saying so too: "Which Aligners Recognize Soft-Masked Repeats In Reference Sequences?" :)

He says:

No, do not align to masked genome for any purpose. Filter out the reads mapped to the masked region after whole-genome alignment.

which I think is the best you can do, for the reasons given above.
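The "filter after whole-genome alignment" step can also be sketched directly: drop every alignment that overlaps any repeat. This is a minimal sketch assuming each alignment is a (read_name, chrom, start, end) tuple in 0-based, half-open reference coordinates; with a real BAM you could obtain these from a BAM library (for example pysam's `reference_start` and `reference_end`). The helper names are hypothetical. Note the slightly different rule from the samtools -L approach: here a read touching a repeat by even one base is removed.

```python
"""Minimal sketch: drop alignments that overlap any repeat interval.

Assumptions (hypothetical): alignments are (read_name, chrom, start, end)
tuples and repeats are (chrom, start, end) tuples, all 0-based, half-open.
"""

from bisect import bisect_right
from typing import Dict, List, Tuple

Alignment = Tuple[str, str, int, int]
Region = Tuple[str, int, int]
Index = Dict[str, Tuple[List[int], List[int]]]


def build_index(repeats: List[Region]) -> Index:
    """Merge repeats per chromosome into sorted, non-overlapping starts/ends."""
    per_chrom: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, start, end in repeats:
        per_chrom.setdefault(chrom, []).append((start, end))
    index: Index = {}
    for chrom, ivs in per_chrom.items():
        starts: List[int] = []
        ends: List[int] = []
        for start, end in sorted(ivs):
            if ends and start <= ends[-1]:
                ends[-1] = max(ends[-1], end)  # merge overlapping/adjacent repeats
            else:
                starts.append(start)
                ends.append(end)
        index[chrom] = (starts, ends)
    return index


def overlaps_repeat(aln: Alignment, index: Index) -> bool:
    """True if the alignment shares at least one base with a repeat."""
    _, chrom, start, end = aln
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    # Last merged repeat starting before the alignment ends; because merged
    # intervals are disjoint and sorted, only this one can overlap.
    i = bisect_right(starts, end - 1) - 1
    return i >= 0 and ends[i] > start


def drop_repeat_reads(alns: List[Alignment], repeats: List[Region]) -> List[Alignment]:
    index = build_index(repeats)
    return [a for a in alns if not overlaps_repeat(a, index)]
```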

Thanks so much!

Are you sure you mean repeat sequences? Don't you mean duplicate sequences?

Yes, I'm sure it is a repeat sequence.
