Question

Do I have to mark duplicates before mapping the reads?

0

6.6 years ago

yueli7 ▴ 250

Hello,

I think the RNA-seq pipeline is: trim adapters, QC, then map the reads.

Do I have to mark duplicates before mapping the reads?

Thanks in advance.

RNA-Seq • 3.6k views

ADD COMMENT • link updated 6.6 years ago by munizmom ▴ 60 • written 6.6 years ago by yueli7 ▴ 250

score 2 · Accepted Answer · 2018-04-28

2

6.6 years ago

GouthamAtla 12k

I guess when you say "duplicates before mapping", you mean reads with identical sequences.

You don't need to. And in RNA-Seq, it's not usual to remove duplicate reads.

ADD COMMENT • link 6.6 years ago by GouthamAtla 12k

0

Thanks!

Here duplicates means MarkDuplicates.

ADD REPLY • link 6.6 years ago by yueli7 ▴ 250

1

MarkDuplicates is run after alignment, since it works on the mapped reads.

ADD REPLY • link 6.6 years ago by GouthamAtla 12k

score 2 · Accepted Answer · 2018-04-28

As geek_y said, it is usually not necessary. This question has been addressed several times; for example, http://seqanswers.com/forums/showthread.php?t=6854 is a nice post with some views on it.

What I do is first check the FastQC report for the level of duplicated reads, and manually check whether there are overrepresented sequences, using BLAST to identify them. If the duplication levels are low, I don't remove any. If I think removing them might be necessary, I compare the output of samtools view (selecting the parameters to get the BAM files with and without duplicates) and then decide. I have not had to remove them on any occasion yet ... although my experience is limited :)
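
To make the "with and without duplicates" comparison concrete, here is a minimal Python sketch, not a definitive implementation. It relies on the fact that duplicate-marking tools such as MarkDuplicates set bit 0x400 (decimal 1024) in the SAM FLAG field, which is the same bit that `samtools view -F 1024` filters on. The function name `count_duplicates` and the toy records are made up for illustration. For a real file you would feed it SAM text, for example the output of `samtools view -h` on a BAM that has already been aligned and duplicate-marked.

```python
# SAM FLAG bit 0x400 (decimal 1024) marks a record as a PCR or optical duplicate.
DUPLICATE_FLAG = 0x400


def count_duplicates(sam_lines):
    """Return (total_records, flagged_duplicates) for an iterable of SAM lines.

    Hypothetical helper for illustration; it counts alignment records,
    not unique read names.
    """
    total = 0
    duplicates = 0
    for line in sam_lines:
        # Skip blank lines and header lines (headers start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # FLAG is the second SAM column
        total += 1
        if flag & DUPLICATE_FLAG:
            duplicates += 1
    return total, duplicates


# Toy records invented for this example (not real data).
example_sam = [
    "@HD\tVN:1.6\tSO:coordinate",
    "read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "read2\t1024\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "read3\t16\tchr1\t250\t60\t4M\t*\t0\t0\tTTGA\tIIII",
]

total, duplicates = count_duplicates(example_sam)
print(f"{duplicates} of {total} records are flagged as duplicates")
```

If the flagged fraction is small, that supports leaving the duplicates in, as described above. Note that the flag is only meaningful after a duplicate-marking step has been run on the aligned reads.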