Deduplication using UMItools
4.9 years ago
Ati ▴ 50

I have some RNA-seq data with a high duplication rate, but the reads carry UMIs (Unique Molecular Identifiers) that are 5 bp long. I used umi_tools dedup to remove duplicates. When I then checked duplication with Picard MarkDuplicates, the reported rate was still somewhat high for some samples.

I would expect a low or even zero duplication rate after running umi_tools dedup. Is there an explanation for this?

Could the UMI length be the reason?

Thank you in advance!

RNA-Seq bam umitools duplication Picard • 2.6k views
4.9 years ago

Picard's duplication numbers should be ignored completely if you have UMIs. Picard doesn't use the UMIs and will therefore report inflated duplication rates: it calls PCR duplicates from the positions of the read ends alone, whereas umi_tools uses that information in addition to the UMI sequence. Reads that share a position but carry different UMIs are distinct molecules, yet Picard still counts them as duplicates. If you have run umi_tools dedup, the actual duplication rate is 0, regardless of what Picard may report.
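To make the difference concrete, here is a minimal toy sketch in Python. It is not how umi_tools or Picard are implemented (umi_tools dedup also groups UMIs that differ by likely sequencing errors, which this ignores), and the reads, the `duplication_rate` helper and the `dedup_by_position_and_umi` helper are all made up for illustration. It only shows why a position-only count still finds "duplicates" after a position-plus-UMI deduplication.

```python
from collections import Counter

# Hypothetical toy reads: (chrom, start, strand, umi). Not real data.
reads = [
    ("chr1", 100, "+", "ACGTA"),
    ("chr1", 100, "+", "ACGTA"),  # same position and UMI: a PCR duplicate
    ("chr1", 100, "+", "TTGCA"),  # same position, different UMI: distinct molecule
    ("chr1", 100, "+", "GGATC"),  # same position, different UMI: distinct molecule
    ("chr2", 500, "-", "ACGTA"),
]

def duplication_rate(reads, key):
    """Fraction of reads that are extra copies within groups defined by `key`."""
    groups = Counter(key(r) for r in reads)
    extra_copies = sum(n - 1 for n in groups.values())
    return extra_copies / len(reads)

def dedup_by_position_and_umi(reads):
    """Keep the first read seen for each (chrom, start, strand, umi) combination."""
    seen = set()
    kept = []
    for r in reads:
        if r not in seen:
            seen.add(r)
            kept.append(r)
    return kept

def by_position(r):
    return r[:3]          # what a position-only tool looks at

def by_position_and_umi(r):
    return r              # position plus UMI

deduped = dedup_by_position_and_umi(reads)

rate_position_only_after = duplication_rate(deduped, by_position)
rate_with_umi_after = duplication_rate(deduped, by_position_and_umi)

print("position-only rate after UMI dedup:", rate_position_only_after)
print("position+UMI rate after UMI dedup:", rate_with_umi_after)
```

In this toy set, three distinct molecules remain at chr1:100 after deduplication, so a position-only count still flags two of the four remaining reads, while the UMI-aware count is zero.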


@Devon Ryan Thank you! Does that hold even if the UMI is short (5 bp)?


yes
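As a rough back-of-the-envelope check on the UMI length, the sketch below counts how many distinct 5 bp UMIs exist and estimates how likely it is that two different molecules at the same position happen to share one. It assumes UMIs are drawn uniformly and independently, which real libraries only approximate, and `p_collision` is a hypothetical helper name. Note that such collisions would make distinct molecules look like duplicates, so a short UMI tends towards removing too many reads, not towards leaving duplicates behind.

```python
umi_length = 5
n_umis = 4 ** umi_length  # four bases per position, so 4^5 = 1024 possible UMIs

def p_collision(n_molecules, n_umis):
    """Probability that at least two of `n_molecules` at one position share a UMI.

    Assumes every UMI is equally likely and chosen independently
    (the classic birthday-problem calculation).
    """
    if n_molecules > n_umis:
        return 1.0  # more molecules than UMIs: a shared UMI is unavoidable
    p_all_unique = 1.0
    for i in range(n_molecules):
        p_all_unique *= (n_umis - i) / n_umis
    return 1.0 - p_all_unique

print("possible UMIs:", n_umis)
for n in (2, 10, 50):
    print(n, "molecules at one position:", round(p_collision(n, n_umis), 4))
```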


Thank you for your help!
