Hello,
I think the usual RNA-seq pipeline is adapter trimming, QC, then mapping the reads.
Do I have to mark duplicates before mapping the reads?
Thanks in advance.
I guess when you say "duplicates before mapping", you mean reads with identical sequences.
You don't need to. And in RNA-Seq, it's not usual to remove duplicate reads.
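To make "reads with identical sequences" concrete, here is a minimal Python sketch that estimates how many reads in a FASTQ file repeat a sequence already seen. It assumes plain 4-line FASTQ records (no wrapped sequence lines), the function name `duplication_fraction` is made up for illustration, and it is not the way FastQC computes its duplication estimate.

```python
from collections import Counter
import io

def duplication_fraction(fastq_handle):
    """Return the fraction of reads whose exact sequence was already seen.

    Assumes 4-line FASTQ records: header, sequence, '+', qualities.
    """
    counts = Counter()
    for i, line in enumerate(fastq_handle):
        if i % 4 == 1:  # second line of each record holds the sequence
            counts[line.strip()] += 1
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Every read beyond the first copy of a sequence counts as a duplicate.
    return (total - len(counts)) / total

# Tiny in-memory example: r1 and r2 share the same sequence.
example = io.StringIO(
    "@r1\nACGT\n+\nIIII\n"
    "@r2\nACGT\n+\nIIII\n"
    "@r3\nTTGA\n+\nIIII\n"
    "@r4\nGGCC\n+\nIIII\n"
)
print(duplication_fraction(example))  # 0.25
```

For a real file you would pass an open file handle instead of the `io.StringIO` example. Keep in mind that in RNA-Seq a high value here can simply reflect highly expressed genes, which is one reason duplicates are usually left in.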
As geek_y said, it is usually not necessary. This question has been addressed several times; for example, in http://seqanswers.com/forums/showthread.php?t=6854 you can find a nice post with some views about it.
What I do is first check the FastQC report for the level of duplicated reads, and check manually whether there are overrepresented sequences, using BLAST to identify them. If the levels of duplication are low, I don't remove any. In case I think removing them might be necessary, I compare the output of samtools view (selecting the parameters to get the BAM files with and without duplicates) and decide, but I have not had to remove them on any occasion yet ... although my experience is limited :)
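As a rough illustration of that with/without comparison, the sketch below counts alignments carrying the SAM duplicate flag (0x400, i.e. 1024) in SAM-format text lines. It assumes duplicates have already been marked by some tool after alignment; the function name `count_duplicates` and the example records are made up. On the command line, `samtools view -F 1024` excludes flagged reads and `samtools view -f 1024` keeps only them.

```python
DUPLICATE_FLAG = 0x400  # SAM FLAG bit: "PCR or optical duplicate"

def count_duplicates(sam_lines):
    """Return (total_alignments, flagged_duplicates) for SAM text lines."""
    total = 0
    dups = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        flag = int(line.split("\t")[1])  # FLAG is the second SAM column
        total += 1
        if flag & DUPLICATE_FLAG:
            dups += 1
    return total, dups

# Hypothetical records: r2 has flag 1024, so it is marked as a duplicate.
sam_example = [
    "@HD\tVN:1.6\tSO:coordinate",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t1024\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t16\tchr1\t200\t60\t4M\t*\t0\t0\tTTGA\tIIII",
]
print(count_duplicates(sam_example))  # (3, 1)
```

If the duplicate count is a small share of the total, that supports leaving the reads in, as described above.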
Thanks!
By duplicates here I mean MarkDuplicates.
MarkDuplicates is done after alignment, since it works on the aligned reads.