Question

Can MarkDuplicates of Picard be used for RNA reads?

2.5 years ago

Nemo • 0

Hello,

I have BAM files from RNA-seq data, and I am following the GATK pipeline for variant calling in RNA-seq. In the second step, where Picard's MarkDuplicates should be run, I am unsure whether the tool is meant only for DNA or also for RNA. The MarkDuplicates (Picard) documentation contains this sentence:

This tool locates and tags duplicate reads in a BAM or SAM file, where duplicate reads are defined as originating from a single fragment of DNA.

After reading this, I am not sure: should I use it in an RNA-seq pipeline?

variants MarkDuplicates rna picard • 977 views

updated 2.5 years ago by LChart 4.8k • written 2.5 years ago by Nemo • 0

score 2 · Accepted Answer · 2022-08-18

Yes, you can use it in RNA-seq. How useful it is depends on the library preparation method, however. When fragmentation happens before amplification, copies of the same fragment align to the same position, so MarkDuplicates (or an equivalent positional deduplication program) can, and likely should, be used. When fragmentation happens after amplification, the same parent molecule can give rise to arbitrary sub-sequences, so position alone does not identify duplicates; in this case unique molecular identifiers (UMIs) should be used for deduplication.
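
To make the difference concrete, here is a minimal Python sketch of the two ideas on toy data. It is not how Picard or any UMI tool is actually implemented: the `Read` record, the `mark_duplicates` function, and the example reads are all hypothetical, and real tools use more information (for example both mates of a pair and base qualities) when deciding which read to keep. It only shows how adding a UMI to the grouping key changes which reads get flagged.

```python
from collections import namedtuple

# Hypothetical stand-in for an aligned read; a real pipeline would read these from a BAM file.
Read = namedtuple("Read", ["name", "chrom", "pos", "strand", "umi"])


def mark_duplicates(reads, use_umi=False):
    """Return the names of reads flagged as duplicates.

    The first read seen for each key is kept; later reads with the same key are flagged.
    Key is (chrom, pos, strand), plus the UMI when use_umi is True.
    """
    seen = set()
    duplicates = set()
    for read in reads:
        key = (read.chrom, read.pos, read.strand)
        if use_umi:
            key += (read.umi,)
        if key in seen:
            duplicates.add(read.name)
        else:
            seen.add(key)
    return duplicates


reads = [
    Read("r1", "chr1", 100, "+", "AACG"),
    Read("r2", "chr1", 100, "+", "AACG"),  # same position and same UMI as r1
    Read("r3", "chr1", 100, "+", "TTGC"),  # same position, different UMI
    Read("r4", "chr1", 250, "+", "AACG"),  # different position
]

positional = mark_duplicates(reads)
umi_aware = mark_duplicates(reads, use_umi=True)

print(sorted(positional))  # ['r2', 'r3']
print(sorted(umi_aware))   # ['r2']
```

With position alone, r3 is flagged even though its UMI says it came from a different molecule; with the UMI in the key, only r2 is flagged. Which grouping is appropriate depends on the library preparation, as described above.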