I have some RNA-seq data with a high duplication rate, but the reads carry UMIs (Unique Molecular Identifiers).
The UMI length is 5 bp.
I used umi_tools dedup to remove duplicates. When I then checked the duplication with MarkDuplicates (Picard), the rate was still fairly high for some samples.
I would expect a low or even zero duplication rate after using UMI-tools. Is there any explanation?
Could the length of the UMI be the reason?
Thank you in advance!
@Devon Ryan Thank you! Even if the UMI length is short (5 bp)?
yes
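To illustrate why a position-based duplicate check can still report duplicates after UMI deduplication, here is a toy Python sketch. It is not how either tool is implemented: the read tuples are made up, UMIs are compared by exact match only, and it assumes the position-based check looks only at mapping coordinates and ignores the UMI (which, as far as I know, is what MarkDuplicates does unless it is given a barcode tag). The last line just counts how many distinct 5 bp UMIs exist.

```python
from collections import defaultdict

# Hypothetical reads left after UMI deduplication: (chrom, start, strand, umi).
# Three distinct molecules share the same mapping position but have different UMIs.
reads = [
    ("chr1", 100, "+", "AACGT"),
    ("chr1", 100, "+", "GGTCA"),
    ("chr1", 100, "+", "TTAGC"),
    ("chr1", 250, "-", "AACGT"),
]

def count_duplicates(reads, use_umi):
    """Count reads beyond the first in each group of 'identical' reads."""
    groups = defaultdict(int)
    for chrom, start, strand, umi in reads:
        # UMI-aware grouping adds the UMI to the key; position-only grouping does not.
        key = (chrom, start, strand, umi) if use_umi else (chrom, start, strand)
        groups[key] += 1
    return sum(n - 1 for n in groups.values())

umi_aware = count_duplicates(reads, use_umi=True)
position_only = count_duplicates(reads, use_umi=False)

# Number of distinct UMIs available with a 5 bp UMI (4 bases per position).
n_possible_umis = 4 ** 5

print(umi_aware, position_only, n_possible_umis)
```

In this toy data the UMI-aware count is zero while the position-only count is not, so a non-zero rate from a position-based check does not by itself mean the UMI deduplication failed.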
Thank you for your help!