How to correctly generate alignment Stat in RNA-Seq
2
0
Entering edit mode
4.7 years ago

Hi All!!

While aligning RNA-Seq data and quantifying the actual number of reads mapped to my transcriptome, I realised that samtools flagstat does not report the actual alignied reads stats. I used RSEM for my alignment and expression calculation. I am aware the samtools counts secondary alignments thereby increasing total number of reads that pass QC and therefore influence the alignment percentage that it throws as the output. Source here

Is there any other tool that can precisely give the actual number of reads correctly mapping in pairs to the transcriptome? Somewhat like the log.out file in STAR does.

RNA-Seq sequencing alignment rna-seq • 1.7k views
ADD COMMENT
0
Entering edit mode
4.7 years ago

Just exclude secondary alignments then: samtools view -hF 256 foo.bam | samtools flagstat -

ADD COMMENT
0
Entering edit mode

Hi Devon. Used the above command and I am getting the following stats

102767366 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
97087946 + 0 mapped (94.47% : N/A)
102767366 + 0 paired in sequencing
51383683 + 0 read1
51383683 + 0 read2
97087946 + 0 properly paired (94.47% : N/A)
97087946 + 0 with itself and mate mapped
0 + 0 singletons (0.00% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

However, from the FastQC report that I have, I know my sample has 88.5 million reads (each in R1 and R2). This 2 x 88.5 million should be equal to the number of reads I get in QC Passed reads ~177 million. How should I account for the rest of the ~74 million reads?

ADD REPLY
0
Entering edit mode

They're not in the BAM file, so likely you excluded them elsewhere.

ADD REPLY
1
Entering edit mode

Thanks @Devon

Found a way around. I revisited the RSEM source code and read it. I found that the Log.final.out file is generated in the sample_temp folder which is deleted at the end. Using --keep-intermediate-files we can preserve the alignment stats and other logs too.

ADD REPLY
0
Entering edit mode
4.7 years ago
brianj.park ▴ 60

Check RSeQC: in particular bam_stat.py

ADD COMMENT

Login before adding your answer.

Traffic: 2438 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6