BAM output of deduplication using UMI-tools
4.9 years ago
Ati ▴ 50

I have paired-end RNA-seq data with a high duplication rate. My reads contain UMIs, so after aligning with STAR I ran umi_tools dedup with the --paired option. I expected the output BAM file to contain equal numbers of read1 and read2 (as reported by samtools flagstat).

I'm a bit confused by the results: the read1 and read2 counts are equal before running umi_tools dedup, but they differ afterwards. Could anyone please clarify this for me?

Thank you in advance!

flagstat samtools RNA-seq UMItools bam dedup • 1.5k views

You should not be de-duplicating RNA-seq data unless you have UMIs. It is not clear whether you actually have UMIs in your reads, even though you refer to umi_tools. Also, umi_tools is not something you run only after aligning with STAR.


I have edited the question. What do you mean? The reads need to be aligned first before deduplicating with umi_tools!


So you did extract the UMIs with umi_tools before doing the alignments? As to why you have different read1 and read2 numbers, that is likely because, for some pairs, only one of the two reads is mapping (see: A: Why number of #read1 and #read2 is different in samtools flagstat output? ).
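To make the "only one mate maps" explanation concrete, here is a minimal Python sketch that tallies read1 and read2 records from SAM text by decoding the FLAG column. The flag bit values come from the SAM specification; the function name `count_mates`, the read names, and the example records are hypothetical and invented purely for illustration. It assumes that secondary and supplementary alignments are left out of the read1/read2 tallies, which is how I understand samtools flagstat to behave, and that the unmapped mate was not written to the file. It is not a reimplementation of flagstat or of umi_tools.

```python
# SAM flag bits, as defined in the SAM specification
FLAG_PAIRED = 0x1
FLAG_UNMAPPED = 0x4
FLAG_MATE_UNMAPPED = 0x8
FLAG_READ1 = 0x40
FLAG_READ2 = 0x80
FLAG_SECONDARY = 0x100
FLAG_SUPPLEMENTARY = 0x800


def count_mates(sam_lines):
    """Count read1/read2 records among primary, paired alignments.

    Also counts mapped reads whose mate is flagged as unmapped, since
    those are the records that can leave read1 and read2 unbalanced.
    """
    counts = {"read1": 0, "read2": 0, "mate_unmapped": 0}
    for line in sam_lines:
        # Skip blank lines and header lines (which start with '@')
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second SAM column
        # Assumption: ignore secondary and supplementary alignments
        if flag & (FLAG_SECONDARY | FLAG_SUPPLEMENTARY):
            continue
        if not flag & FLAG_PAIRED:
            continue
        if flag & FLAG_READ1:
            counts["read1"] += 1
        if flag & FLAG_READ2:
            counts["read2"] += 1
        if not flag & FLAG_UNMAPPED and flag & FLAG_MATE_UNMAPPED:
            counts["mate_unmapped"] += 1
    return counts


# Hypothetical records: pairA is a proper pair (flags 99 and 147);
# pairB has only read1 present, with its mate flagged unmapped (flag 73).
example = [
    "@HD\tVN:1.6",
    "pairA\t99\tchr1\t100",
    "pairA\t147\tchr1\t250",
    "pairB\t73\tchr1\t500",
]

result = count_mates(example)
print(result)
```

In this toy input, pairB contributes a read1 record with no matching read2 record, so the two counts differ by one. Running the same kind of tally on `samtools view` output before and after deduplication would show whether such singletons account for the difference in a real file.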
