4.1 years ago
Sara
In the RNA-seq data we generated (UMIs were used), we have a very high duplicate rate: after removing UMIs and duplicates, only 10% of the reads would remain. Can you let me know what metrics I can collect to investigate the problem?
Can I ask why UMIs were used? Usually UMIs are used in situations where a large duplication rate is expected.
What do you mean by "removing UMIs"? Normally one wouldn't remove UMIs; you would move them to the read name or a BAM UMI tag.
What tool and what command have you used for deduplication?
Are you sure you are doing the deduplication paired-end and not single end?
If this is straightforward, traditional RNA-seq, then the usual reason for high UMI duplication would be too little RNA going into the library prep process.
Is the duplication truly at the UMI level? If so, it sounds like there may be an experimental issue (over-amplification?) with the samples.
Yes, at the UMI level we have a high duplication rate.
Please don't add answers unless you're answering the principal question. Use Add Comment or Add Reply instead.
In RNAseq you expect there to be duplication at the read level since there can be many copies of RNA present in sample. What is worrisome is duplication with UMI.
How much of the duplication is from UMI and how much from reads? Perhaps the duplication contribution is small from UMI side. Data would be fine to use then.
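To make the read-level versus UMI-level split concrete, here is a rough Python sketch of how the two rates could be computed. It assumes you have already pulled one `(chrom, pos, strand, umi)` tuple per aligned read out of your BAM; the function name `duplication_rates` and the tuple layout are made up for illustration. It also assumes exact UMI matching, whereas dedicated tools such as UMI-tools additionally merge UMIs that differ by sequencing errors, so treat the numbers as an approximation.

```python
def duplication_rates(reads):
    """Compare position-only duplication with UMI-aware duplication.

    reads: iterable of (chrom, pos, strand, umi) tuples, one per aligned read.
    This layout is an assumption for illustration, not a tool's output format.
    """
    reads = list(reads)
    total = len(reads)
    if total == 0:
        return {"total": 0, "position_dup_rate": 0.0, "umi_dup_rate": 0.0}

    # Reads sharing a mapping position count as duplicates when UMIs are ignored.
    unique_positions = len({(chrom, pos, strand) for chrom, pos, strand, _ in reads})
    # Reads sharing position AND UMI are treated as copies of one molecule
    # (exact UMI match only; no error correction).
    unique_molecules = len(set(reads))

    return {
        "total": total,
        "position_dup_rate": 1 - unique_positions / total,
        "umi_dup_rate": 1 - unique_molecules / total,
    }


# Toy input: four reads, three at the same position, two of those with the same UMI.
example = [
    ("chr1", 100, "+", "AAAC"),
    ("chr1", 100, "+", "AAAC"),
    ("chr1", 100, "+", "GGTT"),
    ("chr1", 250, "-", "AAAC"),
]
rates = duplication_rates(example)
print(rates)
```

If `position_dup_rate` is high but `umi_dup_rate` is low, most of the apparent duplication comes from distinct molecules of abundant transcripts. If both are high, the library itself is likely low in complexity.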