Question

Can someone help me understand the RSeQC Output from infer_experiment.py?

3.7 years ago

annalisa79 ▴ 20

I have RNA-seq data from a library constructed with TruSeq Stranded Total RNA (NEB Microbe), from a pure bacterial culture. Following some suggestions I found here on this topic, I mapped a subsample against the reference genome with HISAT2 (in unstranded mode). Below is the alignment summary:

100000 reads; of these:
  100000 (100.00%) were paired; of these:
    5111 (5.11%) aligned concordantly 0 times
    88631 (88.63%) aligned concordantly exactly 1 time
    6258 (6.26%) aligned concordantly >1 times
    ----
    5111 pairs aligned concordantly 0 times; of these:
      1127 (22.05%) aligned discordantly 1 time
    ----
    3984 pairs aligned 0 times concordantly or discordantly; of these:
      7968 mates make up the pairs; of these:
        5886 (73.87%) aligned 0 times
        1899 (23.83%) aligned exactly 1 time
        183 (2.30%) aligned >1 times
97.06% overall alignment rate
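As a quick sanity check, the overall alignment rate can be recomputed from the mate-level counts in the summary. This is a minimal sketch, assuming the rate is the fraction of all mates (2 x 100000) that aligned at least once, i.e. everything except the 5886 mates reported as "aligned 0 times"; the constant names are just labels for numbers copied from the log above.

```python
# Counts copied from the HISAT2 summary above.
TOTAL_PAIRS = 100_000
FAILED_PAIRS = 3_984  # pairs aligned 0 times concordantly or discordantly
MATES_0, MATES_1, MATES_MULTI = 5_886, 1_899, 183  # what happened to the mates of those pairs


def overall_alignment_rate(total_pairs: int, unaligned_mates: int) -> float:
    """Percentage of all mates that aligned at least once (assumed definition)."""
    total_mates = 2 * total_pairs
    return 100.0 * (total_mates - unaligned_mates) / total_mates


# The three mate categories should cover both mates of every failed pair.
assert MATES_0 + MATES_1 + MATES_MULTI == 2 * FAILED_PAIRS

rate = overall_alignment_rate(TOTAL_PAIRS, MATES_0)
print(f"{rate:.2f}% overall alignment rate")  # prints "97.06% overall alignment rate"
```

The recomputed value matches the 97.06% reported by HISAT2, so the alignment itself looks fine.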

Here are the statistics for the resulting BAM file:

Total records:                          224985
QC failed:                              0
Optical/PCR duplicate:                  0
Non primary hits                        24985
Unmapped reads:                         5886
mapq < mapq_cut (non-unique):           12699

mapq >= mapq_cut (unique):              181415
Read-1:                                 90671
Read-2:                                 90744
Reads map to '+':                       90693
Reads map to '-':                       90722
Non-splice reads:                       180391
Splice reads:                           1024
Reads mapped in proper pairs:           177262
Proper-paired reads map to different chrom:0
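These BAM statistics add up cleanly, which can be checked with a few lines of Python. The sketch below assumes (from the numbers, not from the tool's source) that the QC-failed, duplicate, non-primary, unmapped, non-unique and unique counts partition the total records, and that the unique reads are broken down three ways: Read-1/Read-2, '+'/'-' and non-splice/splice. The dictionary keys are hypothetical labels for the rows of the table.

```python
# Numbers copied from the BAM statistics above (keys are illustrative labels).
stats = {
    "total_records": 224_985,
    "qc_failed": 0,
    "duplicates": 0,
    "non_primary": 24_985,
    "unmapped": 5_886,
    "non_unique": 12_699,  # mapq < mapq_cut
    "unique": 181_415,     # mapq >= mapq_cut
    "read1": 90_671, "read2": 90_744,
    "plus": 90_693, "minus": 90_722,
    "non_splice": 180_391, "splice": 1_024,
}


def check_partitions(s: dict) -> list[str]:
    """Return the consistency checks that fail (an empty list means all pass)."""
    problems = []
    # Assumption: every record falls into exactly one of these categories.
    parts = ("qc_failed", "duplicates", "non_primary", "unmapped", "non_unique", "unique")
    if sum(s[k] for k in parts) != s["total_records"]:
        problems.append("categories do not sum to total records")
    # Each pair of rows should re-split the unique reads.
    for a, b in (("read1", "read2"), ("plus", "minus"), ("non_splice", "splice")):
        if s[a] + s[b] != s["unique"]:
            problems.append(f"{a} + {b} != unique")
    return problems


print(check_partitions(stats))                        # []
print(stats["total_records"] - stats["non_primary"])  # 200000 primary records
```

The 200000 primary records are exactly the 2 x 100000 mates from the HISAT2 run, and "Unmapped reads: 5886" matches the mates HISAT2 reported as aligned 0 times. Note that the near-even '+'/'-' split says nothing about strandedness: in paired-end data the two mates of a proper pair map to opposite strands, so these totals balance whatever the library type.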

Then I ran RSeQC's infer_experiment.py and got this result:

This is PairEnd Data
Fraction of reads failed to determine: 0.6712
Fraction of reads explained by "1++,1--,2+-,2-+": 0.0732
Fraction of reads explained by "1+-,1-+,2++,2--": 0.2557

So I don't understand why I don't see a higher fraction of reads from the first strand, as I expected from a TruSeq stranded RNA library. Thanks in advance for any suggestions!
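One way to read the infer_experiment.py output is to set the undetermined reads aside and compare only the two strand patterns. The sketch below is a toy, hypothetical model (not RSeQC's code): each read is tagged with its read number, the strand it mapped to and the strand of the gene it overlaps, and a read is "undetermined" when that gene strand is not unique (no gene, or genes on both strands). The function names and the three example reads are made up for illustration; only the two pattern strings and the final fractions come from the output above.

```python
from collections import Counter

# Strand patterns as printed by infer_experiment.py for paired-end data.
PATTERN_A = {"1++", "1--", "2+-", "2-+"}
PATTERN_B = {"1+-", "1-+", "2++", "2--"}


def classify(read_number: int, read_strand: str, gene_strands: set[str]) -> str:
    """Toy classifier: build a key like "1-+" or return "undetermined"."""
    if len(gene_strands) != 1:
        return "undetermined"
    (gene_strand,) = gene_strands
    return f"{read_number}{read_strand}{gene_strand}"


def summarize(keys: list[str]) -> dict[str, float]:
    """Fractions of undetermined, pattern-A and pattern-B reads."""
    counts = Counter(keys)
    total = sum(counts.values())
    a = sum(counts[k] for k in PATTERN_A)
    b = sum(counts[k] for k in PATTERN_B)
    return {"undetermined": counts["undetermined"] / total, "A": a / total, "B": b / total}


def share_among_determined(frac_a: float, frac_b: float) -> float:
    """Share of pattern B once the undetermined reads are set aside."""
    return frac_b / (frac_a + frac_b)


# Made-up reads: read 1 on '-' over a '+' gene, read 2 on '+' over a '+' gene,
# and read 1 over a region with genes on both strands.
keys = [classify(1, "-", {"+"}), classify(2, "+", {"+"}), classify(1, "+", {"+", "-"})]
print(summarize(keys))

# Fractions reported in the question.
print(round(share_among_determined(0.0732, 0.2557), 3))  # 0.777
```

With the reported numbers, roughly 78% of the reads whose strand could be assigned follow "1+-,1-+,2++,2--" and 22% follow "1++,1--,2+-,2-+"; the three reported fractions sum to 1.0001, i.e. to 1 up to rounding.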


Hi, did you solve the problem? I have similar results.

3.0 years ago by camelest ▴ 50

score 0 · Answer 1 · 2022-09-28

2.2 years ago

Xin • 0

Hi Annalisa and Camelest, have you solved this problem? The same thing is happening to me; I'm working with human samples. Any suggestions would be greatly appreciated!
