Question

RSeQC infer_experiment.py results interpretation for cuffdiff and featureCounts

0

9.6 years ago

tonja.r ▴ 600

I have RNA-seq data from ENCODE mouse. Before running cuffdiff and featureCounts, I first need to work out which library type I have so that I can choose the right parameters for both tools.

I ran infer_experiment.py from RSeQC and got the following result:

This is SingleEnd Data
Fraction of reads failed to determine: 0.0023
Fraction of reads explained by "++,--": 0.0160
Fraction of reads explained by "+-,-+": 0.9817

So, I need to specify --library-type for cuffdiff. If I have interpreted the output correctly, I have fr-secondstrand. Is that correct?

For featureCounts I need to specify the -s parameter. It would be -s 2, is that correct?

-s <int>      Indicate if strand-specific read counting should be performed.
                  It has three possible values:  0 (unstranded), 1 (stranded) and
                  2 (reversely stranded). 0 by default.

RNA-Seq • 6.3k views

ADD COMMENT • link updated 3.0 years ago by Ram 45k • written 9.6 years ago by tonja.r ▴ 600

Ram · Answer 1 · 2016-02-15

1

9.4 years ago

iraun 6.2k

According to the results you have shown, your library type is fr-firststrand. I'd recommend reading the post "Tophat Library-Type : Illumina Truseq Stranded Total Rna Sample Prep Kit" to clarify what is going on. If you still have questions or doubts, you are welcome to ask again :).

Ah, fr-firststrand corresponds to -s 2 in featureCounts.

ADD COMMENT • link 9.4 years ago by iraun 6.2k

2

Hi, I got the following results from paired-end data:

This is PairEnd Data
Fraction of reads failed to determine: 0.0020
Fraction of reads explained by "1++,1--,2+-,2-+": 0.0906
Fraction of reads explained by "1+-,1-+,2++,2--": 0.9073

I believe the featureCounts -s option is also 2 here.
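For anyone who wants to read these numbers programmatically rather than by eye, here is a minimal sketch that pulls the fractions out of the report text. The function name parse_infer_experiment and the regular expressions are my own assumptions based on the output pasted above; they are not part of RSeQC.

```python
import re

# Assumed line formats, copied from the report above (not an official RSeQC spec).
PATTERN_LINE = re.compile(r'Fraction of reads explained by "([^"]+)":\s*([0-9.]+)')
FAILED_LINE = re.compile(r"Fraction of reads failed to determine:\s*([0-9.]+)")


def parse_infer_experiment(text):
    """Return (failed_fraction, {strand_pattern: fraction}) from the report text."""
    failed = FAILED_LINE.search(text)
    fractions = {pattern: float(value) for pattern, value in PATTERN_LINE.findall(text)}
    return (float(failed.group(1)) if failed else None), fractions


report = '''This is PairEnd Data
Fraction of reads failed to determine: 0.0020
Fraction of reads explained by "1++,1--,2+-,2-+": 0.0906
Fraction of reads explained by "1+-,1-+,2++,2--": 0.9073
'''

failed, fractions = parse_infer_experiment(report)
# The pattern with the largest fraction tells you which orientation dominates.
dominant = max(fractions, key=fractions.get)
print(dominant, fractions[dominant])
```

The same function should also work on the single-end report, since only the quoted pattern differs ("++,--" and "+-,-+").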

ADD REPLY • link 4.3 years ago by DareDevil ★ 4.4k

0

How did you understand from the results above that the library type is fr-firststrand?

ADD REPLY • link updated 3.0 years ago by Ram 45k • written 9.4 years ago by tonja.r ▴ 600

0

So, I did ask: how did you work that out from the results?

ADD REPLY • link 9.0 years ago by tonja.r ▴ 600

score 1 · Answer 2 · 2017-01-13

1

8.5 years ago

Yuka Takemon ▴ 40

12 months too late, but I disagree with @iraun: Fraction of reads explained by "+-,-+": 0.9817 for single-end sequencing would be unstranded (fr-unstranded). If your reads were stranded, a higher percentage would be explained by "++,--". The documentation isn't the clearest, but you can figure out the pattern of results here: http://rseqc.sourceforge.net/#infer-experiment-py

ADD COMMENT • link 8.5 years ago by Yuka Takemon ▴ 40

2

Actually, this is not correct: the example given for single-end data on the RSeQC website may be a little confusing because it is the opposite of what you typically observe in Illumina stranded libraries (but there is a verbal description of what the strand codes represent).

The fractions are for reads that match the strand of the gene annotation or that have the opposite strand. An unstranded library will have a fraction close to 0.5 for both "++,--" and "+-,-+". dUTP Illumina stranded libraries (where the strand of the read is the opposite of the strand of the gene annotation) will have "+-,-+" values close to 1.
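To make that rule of thumb concrete, here is a rough sketch of the decision as a small function. The name classify_library and both cutoffs (min_dominant, unstranded_max_gap) are assumptions I picked for illustration, not values defined by RSeQC, so adjust them to your own data. The fr-secondstrand / -s 1 branch is the mirror case for libraries whose reads match the gene strand.

```python
def classify_library(same_strand, opposite_strand,
                     min_dominant=0.8, unstranded_max_gap=0.2):
    """Map infer_experiment.py fractions to strandedness settings.

    same_strand:     fraction for "++,--" (single-end) or "1++,1--,2+-,2-+" (paired-end)
    opposite_strand: fraction for "+-,-+" (single-end) or "1+-,1-+,2++,2--" (paired-end)
    Both cutoffs are assumed values for illustration only.
    """
    if opposite_strand >= min_dominant:
        # Reads are opposite to the gene strand (e.g. dUTP libraries).
        return {"library_type": "fr-firststrand", "featurecounts_s": 2}
    if same_strand >= min_dominant:
        # Reads match the gene strand.
        return {"library_type": "fr-secondstrand", "featurecounts_s": 1}
    if abs(same_strand - opposite_strand) <= unstranded_max_gap:
        # Roughly a 50/50 split: no strand information.
        return {"library_type": "fr-unstranded", "featurecounts_s": 0}
    # Neither clearly stranded nor clearly unstranded: inspect the data manually.
    return None


# Single-end result from the question, then the paired-end result posted above.
print(classify_library(0.0160, 0.9817))
print(classify_library(0.0906, 0.9073))
```

A return value of None means the fractions fall between the two cases, which is a good reason to look at the alignments directly before choosing a setting.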

You can visualize the alignment of reads to GAPDH in IGV to double-check (if you have single-end data, they will be colored by strand).

So, the answer from iraun is correct.

ADD REPLY • link 8.5 years ago by Charles Warden 8.3k

0

Oops, you're right, Charles. Thanks for correcting me.

ADD REPLY • link 8.5 years ago by Yuka Takemon ▴ 40