Question

UMIs and the cause of a high duplicate rate

0

4.1 years ago

Sara ▴ 260

In our RNA-seq data (UMIs were used) we have a very high duplicate rate (after removing UMI and duplicate we would have only 10 % of the reads). Can you let me know what metrics I can collect to investigate the problem?

rna-seq • 2.4k views

ADD COMMENT • link updated 4.1 years ago by swbarnes2 14k • written 4.1 years ago by Sara ▴ 260

1

Can I ask why UMIs were used? Usually UMIs are used in situations where a large duplication rate is expected.
What do you mean by "removing UMIs"? Normally one wouldn't remove UMIs; you would move them to the read name or a BAM UMI tag.
What tool and what command have you used for deduplication?
Are you sure you are doing the deduplication in paired-end mode and not single-end?
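
To illustrate what "moving the UMI to the read name" means, here is a minimal sketch in plain Python. It assumes the UMI is the first 8 bases of the read and that an underscore separates it from the read ID; both are assumptions for illustration, and the function name is hypothetical. In practice a dedicated tool would do this step, so check its documentation for the layout it expects.

```python
def move_umi_to_name(name, seq, qual, umi_len=8):
    """Trim the first umi_len bases off a read and append them to its ID.

    umi_len=8 and the 5' UMI position are assumptions for illustration.
    """
    umi = seq[:umi_len]
    trimmed_seq = seq[umi_len:]
    trimmed_qual = qual[umi_len:]
    # Keep any comment after the first space (e.g. "1:N:0:ACGT") intact.
    read_id, _, comment = name.partition(" ")
    new_name = f"{read_id}_{umi}"
    if comment:
        new_name += f" {comment}"
    return new_name, trimmed_seq, trimmed_qual


# One FASTQ record: header, sequence, qualities (20 bases each).
record = ("@read1 1:N:0:ACGT", "AACCGGTTTTTTGGGGCCCC", "I" * 20)
print(move_umi_to_name(*record))
```

The UMI is no longer part of the sequence that gets aligned, but it is still attached to the read, so a deduplication step can use it later.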

If this is straightforward, traditional RNA-seq, then the usual reason for high UMI duplication is too little RNA going into the library prep process.
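
A rough way to see why low input leads to high UMI duplication: if the library contains only M distinct molecules and you sequence N reads, you cannot get more than M unique reads. The sketch below uses the standard Poisson approximation, M * (1 - exp(-N / M)), for the expected number of distinct molecules seen. It assumes every molecule is equally likely to be sampled, which real libraries are not, so treat it as a back-of-envelope estimate only. The function name and the example numbers are made up.

```python
import math


def expected_unique_fraction(n_reads, n_molecules):
    """Expected fraction of reads left after perfect deduplication.

    Assumes uniform sampling of n_molecules distinct molecules
    (a simplification; real amplification is biased).
    """
    if n_reads <= 0 or n_molecules <= 0:
        raise ValueError("n_reads and n_molecules must be positive")
    depth = n_reads / n_molecules  # average reads per distinct molecule
    # -expm1(-x) equals 1 - exp(-x), computed accurately for small x.
    expected_unique = n_molecules * -math.expm1(-depth)
    return expected_unique / n_reads


# Hypothetical numbers: 10 million reads from 1 million distinct molecules.
print(round(expected_unique_fraction(10_000_000, 1_000_000), 3))
```

Under this simplified model, keeping only about 10% of reads corresponds to sequencing each distinct molecule roughly ten times over.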

ADD REPLY • link 4.1 years ago by i.sudbery 20k

0

after removing UMI

Is the duplication truly at the UMI level? If so, that sounds like an experimental issue with the samples (over-amplification?).

ADD REPLY • link 4.1 years ago by GenoMax 147k

0

Yes, at the UMI level we have a high duplication rate.

ADD REPLY • link 4.1 years ago by Sara ▴ 260

1

Please don't add answers unless you're answering the principal question. Use Add Comment or Add Reply instead.

ADD REPLY • link 4.1 years ago by Ram 44k

1

In RNA-seq you expect duplication at the read level, since there can be many copies of an RNA present in the sample. What is worrisome is duplication at the UMI level, i.e. reads that share both the same alignment position and the same UMI.

after removing UMI and duplicate we would have only 10 % of the reads

How much of the duplication is at the UMI level and how much at the read level? Perhaps the UMI contribution is small, in which case the data would be fine to use.
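
One way to get those two numbers, as a minimal sketch: given one `(chrom, pos, strand, umi)` tuple per aligned read, count unique positions and unique position+UMI combinations. The tuple layout and function name are assumptions for illustration, UMIs are compared by exact match (no sequencing-error correction), and paired-end reads would also need the mate position in the key.

```python
def duplication_breakdown(reads):
    """Split duplication into positional and UMI-level components.

    reads: iterable of (chrom, pos, strand, umi) tuples, one per aligned read.
    UMIs are matched exactly; no error correction is attempted.
    """
    reads = list(reads)
    total = len(reads)
    if total == 0:
        raise ValueError("no reads given")
    unique_positions = {(chrom, pos, strand) for chrom, pos, strand, _ in reads}
    unique_molecules = set(reads)  # same position AND same UMI
    return {
        "total_reads": total,
        "unique_positions": len(unique_positions),
        "unique_position_umi": len(unique_molecules),
        # Reads sharing a position, regardless of UMI.
        "positional_dup_rate": 1 - len(unique_positions) / total,
        # Reads sharing both position and UMI (likely PCR copies).
        "umi_dup_rate": 1 - len(unique_molecules) / total,
    }


# Toy input: four reads at one position (three distinct UMIs), one elsewhere.
toy = [
    ("chr1", 100, "+", "AAAA"),
    ("chr1", 100, "+", "AAAA"),
    ("chr1", 100, "+", "CCCC"),
    ("chr1", 100, "+", "GGGG"),
    ("chr2", 500, "-", "AAAA"),
]
print(duplication_breakdown(toy))
```

In this toy input, 3 of 5 reads are positional duplicates but only 1 of 5 is a UMI duplicate. A large gap between the two rates in that direction is the reassuring case; rates that are both high point to the library itself.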

ADD REPLY • link 4.1 years ago by GenoMax 147k

score 2 · Answer 1 · 2020-10-09

2

4.1 years ago

swbarnes2 14k

If you have high UMI duplication, it means the lab people did too much PCR; but perhaps they had little choice, if there wasn't enough RNA at different steps. Talk to the people who prepped the sample, find out how many cycles of PCR they did, for starters.

ADD COMMENT • link 4.1 years ago by swbarnes2 14k