Can someone help me understand the RSeQC output from infer_experiment.py?
annalisa79, 3.7 years ago

I have RNA-seq data from a library constructed with TruSeq Stranded Total RNA (NEB Microbe) from a pure bacterial culture. Following some suggestions I found here on this topic, I mapped a subsample of the reads against the reference genome with HISAT2 (in unstranded mode). Here is the alignment summary:

100000 reads; of these:
  100000 (100.00%) were paired; of these:
    5111 (5.11%) aligned concordantly 0 times
    88631 (88.63%) aligned concordantly exactly 1 time
    6258 (6.26%) aligned concordantly >1 times
    ----
    5111 pairs aligned concordantly 0 times; of these:
      1127 (22.05%) aligned discordantly 1 time
    ----
    3984 pairs aligned 0 times concordantly or discordantly; of these:
      7968 mates make up the pairs; of these:
        5886 (73.87%) aligned 0 times
        1899 (23.83%) aligned exactly 1 time
        183 (2.30%) aligned >1 times
97.06% overall alignment rate
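As a sanity check, the summary is internally consistent, and the 97.06% figure matches counting individual mates rather than pairs: (200,000 mates - 5,886 unaligned mates) / 200,000. A minimal sketch, assuming that definition of the overall rate, that recomputes it from the numbers above (`hisat2_overall_rate` is a hypothetical helper, not part of HISAT2):

```python
def hisat2_overall_rate(pairs, unaligned_mates):
    """Fraction of individual mates (2 per pair) that aligned at least once.

    Assumption: this is how the paired-end "overall alignment rate" is derived;
    it reproduces the 97.06% in the summary above.
    """
    total_mates = 2 * pairs
    return (total_mates - unaligned_mates) / total_mates


# Counts copied from the HISAT2 summary above.
pairs = 100_000
concordant_0 = 5111          # pairs aligned concordantly 0 times
discordant_1 = 1127          # of those, pairs aligned discordantly once
mates_0, mates_1, mates_multi = 5886, 1899, 183

# Pairs that aligned neither concordantly nor discordantly (3984).
unpaired = concordant_0 - discordant_1
# Their mates are then reported individually (7968 mates in total).
assert 2 * unpaired == mates_0 + mates_1 + mates_multi

rate = hisat2_overall_rate(pairs, mates_0)
print(f"{rate:.2%} overall alignment rate")
```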

Here are the statistics for the resulting BAM file (bam_stat.py):

Total records:                          224985
QC failed:                              0
Optical/PCR duplicate:                  0
Non primary hits                        24985
Unmapped reads:                         5886
mapq < mapq_cut (non-unique):           12699

mapq >= mapq_cut (unique):              181415
Read-1:                                 90671
Read-2:                                 90744
Reads map to '+':                       90693
Reads map to '-':                       90722
Non-splice reads:                       180391
Splice reads:                           1024
Reads mapped in proper pairs:           177262
Proper-paired reads map to different chrom:0
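The bam_stat.py numbers also add up, which suggests the BAM itself is fine. Note that the '+' and '-' counts are almost equal; with paired-end data that is expected whatever the library type, because the two mates of a pair map to opposite strands, so strandedness can only be judged against a gene model, which is what infer_experiment.py does. A minimal sketch of the arithmetic checks, with values copied from the output above (the dictionary keys and the `check_bam_stat` helper are hypothetical names):

```python
# Values copied from the bam_stat.py output above.
stats = {
    "total_records": 224985,
    "non_primary": 24985,
    "unmapped": 5886,
    "non_unique": 12699,   # mapq < mapq_cut
    "unique": 181415,      # mapq >= mapq_cut
    "read1": 90671,
    "read2": 90744,
    "plus": 90693,
    "minus": 90722,
    "non_splice": 180391,
    "splice": 1024,
}


def check_bam_stat(s, pairs):
    """Return {description: passed} for simple consistency checks."""
    primary = s["total_records"] - s["non_primary"]
    return {
        "one primary record per mate": primary == 2 * pairs,
        "primary = unmapped + non-unique + unique":
            primary == s["unmapped"] + s["non_unique"] + s["unique"],
        "unique = Read-1 + Read-2": s["unique"] == s["read1"] + s["read2"],
        "unique = '+' + '-'": s["unique"] == s["plus"] + s["minus"],
        "unique = non-splice + splice":
            s["unique"] == s["non_splice"] + s["splice"],
    }


for name, ok in check_bam_stat(stats, pairs=100_000).items():
    print(("OK   " if ok else "FAIL ") + name)
```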

Then I ran RSeQC's infer_experiment.py and got this result:

This is PairEnd Data

Fraction of reads failed to determine: 0.6712

Fraction of reads explained by "1++,1--,2+-,2-+": 0.0732

Fraction of reads explained by "1+-,1-+,2++,2--": 0.2557
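When reading infer_experiment.py output, the two "explained by" lines are the ones to compare. "1++,1--,2+-,2-+" means read 1 maps to the same strand as the gene; "1+-,1-+,2++,2--" means read 1 maps to the opposite strand, which is the pattern generally expected from dUTP-based stranded protocols such as TruSeq Stranded. One way to look at the numbers is to rescale the two fractions over only the reads that could be assigned. This only re-expresses the reported values; it does not explain why about two thirds of the reads could not be determined. `strand_share` is a hypothetical helper name:

```python
def strand_share(frac_sense, frac_antisense):
    """Rescale the two infer_experiment.py fractions over assignable reads only.

    frac_sense:     fraction explained by "1++,1--,2+-,2-+" (read 1 same strand as gene)
    frac_antisense: fraction explained by "1+-,1-+,2++,2--" (read 1 opposite strand)
    """
    determined = frac_sense + frac_antisense
    if determined == 0:
        raise ValueError("no reads could be assigned to a strand")
    return frac_sense / determined, frac_antisense / determined


# Fractions copied from the infer_experiment.py output above.
failed, sense_frac, antisense_frac = 0.6712, 0.0732, 0.2557

# The three fractions should sum to ~1 (the printed values are rounded).
assert abs(failed + sense_frac + antisense_frac - 1) < 0.001

sense, antisense = strand_share(sense_frac, antisense_frac)
print(f"read 1 same strand as gene: {sense:.1%}")
print(f"read 1 opposite strand:     {antisense:.1%}")
```

With these numbers, roughly 78% of the assignable reads follow the read-1-antisense pattern; the three reported fractions sum to 1.0001, presumably because of rounding in the printed output.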

So I don't understand why the fraction of reads consistent with first-strand orientation is not higher, as I would expect from a TruSeq Stranded library. Any suggestions would be appreciated. Thanks in advance!

Tags: rna-seq, alignment

Hi, did you solve the problem? I have similar results.

Xin, 2.1 years ago

Hi Annalisa and Camelest, have you solved this problem? The same thing is happening to me, although I work with human samples. Any suggestions would be greatly appreciated!
