BAM output of deduplication using UMI-tools

5.0 years ago

Ati ▴ 50

I have paired-end RNA-seq data with a high duplication rate. My reads contain UMIs, so after aligning with STAR I ran umi_tools dedup with the --paired option. I expected the output BAM file to contain equal numbers of read1 and read2 (as reported by samtools flagstat).

I'm a bit confused by the results: the read1 and read2 counts are equal before running umi_tools dedup, but they differ afterwards. Could anyone please clarify this for me?
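To make the before/after comparison concrete, the read1, read2 and singleton counts can be pulled out of the flagstat text. The following is only a minimal sketch: it assumes the usual `N + M label` layout of samtools flagstat lines, the function name `read_balance` is made up for illustration, and the sample text is invented, not real output from this data.

```python
import re

# One flagstat line looks like: "<QC-passed> + <QC-failed> <label>"
LINE = re.compile(r"^(\d+) \+ (\d+) (.+)$")

def read_balance(flagstat_text):
    """Return the QC-passed counts for read1, read2 and singletons."""
    wanted = {"read1", "read2", "singletons"}
    counts = {}
    for line in flagstat_text.splitlines():
        match = LINE.match(line.strip())
        if not match:
            continue
        # Drop a trailing "(…%)" annotation, e.g. "singletons (5.26% : N/A)"
        label = match.group(3).split(" (")[0]
        if label in wanted:
            counts[label] = int(match.group(1))
    return counts

# Invented example text (not real data)
after_dedup = """\
1900 + 0 in total (QC-passed reads + QC-failed reads)
1900 + 0 paired in sequencing
1000 + 0 read1
900 + 0 read2
100 + 0 singletons (5.26% : N/A)
"""

balance = read_balance(after_dedup)
difference = balance["read1"] - balance["read2"]
print(balance, "read1 - read2 =", difference)
```

Running the same function on the flagstat output from before and after deduplication shows at a glance whether the imbalance appears only after the dedup step.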

Thank you in advance!

flagstat samtools RNA-seq UMItools bam dedup • 1.6k views

ADD COMMENT • link updated 21 months ago by Ram 44k • written 5.0 years ago by Ati ▴ 50

You should not de-duplicate RNA-seq data unless you have UMIs. It is not clear whether you actually have UMIs in your reads, even though you refer to umitools. Also, umitools is not something you run only after aligning with STAR: the UMIs are normally extracted from the reads before alignment.
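For reference, the usual order of steps could be sketched roughly as below. This is an illustration under assumptions, not the poster's actual pipeline: the helper `build_pipeline`, the file names, the STAR index directory and the UMI pattern `NNNNNNNN` are all hypothetical, and the real `--bc-pattern` depends on the library prep. The function only builds the command lines; it does not run anything.

```python
def build_pipeline(r1, r2, genome_dir, bc_pattern, prefix="sample."):
    """Return the commands in the order they would be run (nothing is executed)."""
    ex1 = "extracted_R1.fastq.gz"   # hypothetical output names
    ex2 = "extracted_R2.fastq.gz"
    # STAR writes coordinate-sorted BAM output under this name
    aligned = prefix + "Aligned.sortedByCoord.out.bam"
    dedup = "dedup.bam"
    return [
        # 1. Before alignment: move the UMI from the read into the read name
        ["umi_tools", "extract", "--bc-pattern=" + bc_pattern,
         "--stdin=" + r1, "--stdout=" + ex1,
         "--read2-in=" + r2, "--read2-out=" + ex2],
        # 2. Align the UMI-stripped reads
        ["STAR", "--genomeDir", genome_dir,
         "--readFilesIn", ex1, ex2, "--readFilesCommand", "zcat",
         "--outSAMtype", "BAM", "SortedByCoordinate",
         "--outFileNamePrefix", prefix],
        # 3. dedup needs a sorted, indexed BAM
        ["samtools", "index", aligned],
        # 4. After alignment: collapse reads sharing position and UMI
        ["umi_tools", "dedup", "--paired", "-I", aligned, "-S", dedup],
        # 5. Check the read1/read2 counts
        ["samtools", "flagstat", dedup],
    ]

commands = build_pipeline("R1.fastq.gz", "R2.fastq.gz", "star_index", "NNNNNNNN")
for cmd in commands:
    print(" ".join(cmd))
```

The point is only that umi_tools appears twice: extract before STAR, dedup after it.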

ADD REPLY • link 5.0 years ago by GenoMax 148k

I have adjusted the question. What do you mean? The reads need to be aligned first before they can be deduplicated with UMItools!

ADD REPLY • link 5.0 years ago by Ati ▴ 50

So you did extract the UMIs with umitools before doing the alignments? As for the different read1 and read2 numbers, that is likely because, for some pairs, only one of the two reads is mapping (see: A: Why number of #read1 and #read2 is different in samtools flagstat output?).
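A small sketch of how that plays out at the level of SAM flags. The flag bits are the ones defined in the SAM specification; the function name `summarize` and the list of flags are toy values made up for illustration, and the sketch assumes that unmapped mates are not written to the BAM, so only mapped primary records are counted.

```python
# SAM flag bits (from the SAM specification)
PAIRED = 0x1
UNMAPPED = 0x4
MATE_UNMAPPED = 0x8
READ1 = 0x40
READ2 = 0x80
SECONDARY = 0x100
SUPPLEMENTARY = 0x800

def summarize(flags):
    """Count read1, read2 and singletons among mapped primary paired records."""
    counts = {"read1": 0, "read2": 0, "singletons": 0}
    for flag in flags:
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue  # skip unmapped and non-primary records
        if not flag & PAIRED:
            continue
        if flag & READ1:
            counts["read1"] += 1
        if flag & READ2:
            counts["read2"] += 1
        if flag & MATE_UNMAPPED:
            counts["singletons"] += 1  # this read mapped, its mate did not
    return counts

# Toy records:
#   99  = paired, proper pair, mate reverse, read1
#   147 = paired, proper pair, read reverse, read2
#   73  = paired, mate unmapped, read1 (a singleton)
toy_flags = [99, 147, 99, 147, 73]
summary = summarize(toy_flags)
print(summary)  # {'read1': 3, 'read2': 2, 'singletons': 1}
```

Each singleton adds one record to read1 or read2 without a partner on the other side, which is enough to make the two counts differ.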

ADD REPLY • link 5.0 years ago by GenoMax 148k
