5.0 years ago
Ati
I have paired-end RNA-seq data with a high duplication rate. My reads contain UMIs, so after aligning with `STAR` I run `umi_tools dedup` with the `--paired` option. I would expect the output BAM file to have equal numbers of read1 and read2 (as reported by `samtools flagstat`).
I'm a bit confused by the results: the numbers of read1 and read2 are equal before running `umi_tools`, but different afterwards.
Could anyone please clarify this for me?
Thank you in advance!
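To reproduce the read1/read2 comparison outside `samtools flagstat`, here is a minimal Python sketch. It assumes the BAM has been converted to SAM text (for example with `samtools view`) and, roughly like flagstat, skips secondary and supplementary alignments. Unlike flagstat, it does not split QC-passed from QC-failed reads; the function name is hypothetical.

```python
from typing import Iterable, Tuple

# SAM FLAG bits (SAM format specification)
PAIRED = 0x1
READ1 = 0x40
READ2 = 0x80
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def count_read1_read2(sam_lines: Iterable[str]) -> Tuple[int, int]:
    """Count primary paired read1/read2 records, roughly as flagstat does."""
    read1 = read2 = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        flag = int(line.split("\t", 2)[1])  # FLAG is the second column
        # Secondary/supplementary records would double-count a read.
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue
        if flag & PAIRED:
            if flag & READ1:
                read1 += 1
            if flag & READ2:
                read2 += 1
    return read1, read2
```

Running it on the BAM before and after deduplication should show the same discrepancy that flagstat reports.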
You should not be de-duplicating RNA-seq data unless you have UMIs. It is not clear whether you actually have UMIs in your reads, even though you refer to `umi_tools`. `umi_tools` is not used only after aligning with `STAR`; the UMIs first have to be extracted from the reads.
The question has been adjusted. What do you mean? The reads need to be aligned first for deduplication with UMI-tools!
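Both points in this exchange hold at once: the UMI has to be moved out of the read sequence before alignment (extraction), and deduplication needs mapping coordinates, so it runs after alignment. The minimal Python sketch below illustrates the two steps; it is not UMI-tools' implementation. Assumptions: the UMI is the first `umi_len` bases of the read (a hypothetical layout), `_` mirrors UMI-tools' default UMI separator, and grouping is by exact (chromosome, position, strand, UMI), roughly like `umi_tools dedup --method unique`. The default `directional` method additionally merges UMIs that likely differ only by sequencing errors.

```python
from typing import Dict, List, Tuple

# (read_name, chromosome, position, strand): a simplified alignment record
Alignment = Tuple[str, str, int, str]


def extract_umi(read_name: str, seq: str, umi_len: int) -> Tuple[str, str]:
    """Move the first `umi_len` bases of the read into the read name.

    Hypothetical layout: the UMI sits at the 5' end of the read.
    '_' mirrors the default UMI separator used by UMI-tools.
    """
    umi, insert = seq[:umi_len], seq[umi_len:]
    return f"{read_name}_{umi}", insert


def dedup_unique(alignments: List[Alignment], use_umi: bool = True) -> List[str]:
    """Keep one read per (chromosome, position, strand[, UMI]) group.

    use_umi=True: exact-match UMI grouping ("unique"-style).
    use_umi=False: plain position-only deduplication.
    The first read seen in each group is kept.
    """
    kept: Dict[Tuple[str, int, str, str], str] = {}
    for name, chrom, pos, strand in alignments:
        # The UMI is the text after the last '_' added by extract_umi().
        umi = name.rsplit("_", 1)[1] if use_umi else ""
        kept.setdefault((chrom, pos, strand, umi), name)
    return list(kept.values())
```

For three reads at the same position and strand carrying UMIs `ACGT`, `ACGT` and `GGCC`, the UMI-aware call keeps two reads while `use_umi=False` keeps only one. That collapse of distinct molecules is why RNA-seq data is normally not deduplicated without UMIs.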
So did you extract the UMIs with `umi_tools` before doing the alignments? As for why you have different read1 and read2 counts, that is likely because only one read of some pairs is mapping (see: A: Why number of #read1 and #read2 is different in samtools flagstat output?).
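One way to test that explanation on the deduplicated file is to list the read names for which only read1 or only read2 is present. Below is a minimal Python sketch, assuming SAM text input such as `samtools view` output. Secondary and supplementary records are ignored, so every complete pair should contribute exactly one read1 and one read2. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set

# SAM FLAG bits (SAM format specification)
PAIRED, READ1, READ2 = 0x1, 0x40, 0x80
SECONDARY, SUPPLEMENTARY = 0x100, 0x800


def find_orphans(sam_lines: Iterable[str]) -> Dict[str, List[str]]:
    """Return read names seen only as read1 or only as read2 (primary records)."""
    mates: Dict[str, Set[str]] = defaultdict(set)
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.split("\t")
        name, flag = fields[0], int(fields[1])
        # Ignore non-primary records and reads not flagged as paired.
        if (flag & (SECONDARY | SUPPLEMENTARY)) or not (flag & PAIRED):
            continue
        if flag & READ1:
            mates[name].add("read1")
        if flag & READ2:
            mates[name].add("read2")
    only1 = sorted(n for n, m in mates.items() if m == {"read1"})
    only2 = sorted(n for n, m in mates.items() if m == {"read2"})
    return {"read1_only": only1, "read2_only": only2}
```

Non-empty lists point to pairs where one mate was dropped or never written, which would account for unequal read1/read2 counts in flagstat.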