I ran infer_experiment.py from the RSeQC package to identify the strandedness of an aligned BAM file, so that I can set the -s option of featureCounts. I used the following command to generate the output:
infer_experiment.py -r hg38.bed -i xxy2.sort.bam
The output was:
This is PairEnd Data
Fraction of reads failed to determine: 0.0020
Fraction of reads explained by "1++,1--,2+-,2-+": 0.0906
Fraction of reads explained by "1+-,1-+,2++,2--": 0.9073
But I am confused by the result. The RSeQC documentation shows the output for paired-end, non-strand-specific data as:
This is PairEnd Data
Fraction of reads failed to determine: 0.0172
Fraction of reads explained by "1++,1--,2+-,2-+": 0.4903
Fraction of reads explained by "1+-,1-+,2++,2--": 0.4925
and for paired-end, strand-specific data as:
This is PairEnd Data
Fraction of reads failed to determine: 0.0072
Fraction of reads explained by "1++,1--,2+-,2-+": 0.9441
Fraction of reads explained by "1+-,1-+,2++,2--": 0.0487
In the paired-end, non-strand-specific case, the output is explained by both "1++,1--,2+-,2-+" and "1+-,1-+,2++,2--", since they have similar fractions.
In the paired-end, strand-specific case, the output is explained by "1++,1--,2+-,2-+", as it has the major fraction.
But my output is explained by "1+-,1-+,2++,2--".
So which strand specificity should I pass to the -s option of featureCounts?
Any help is appreciated.
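The comparison of fractions described above can be sketched in a few lines of Python. This is only an illustration, not part of RSeQC or featureCounts: the function names and the 0.8 cutoff are assumptions made for the example. It relies on the featureCounts convention that -s 0 means unstranded, -s 1 means stranded (read 1 on the same strand as the gene) and -s 2 means reversely stranded.

```python
import re

# Orientation patterns as printed by infer_experiment.py for paired-end data.
FORWARD = "1++,1--,2+-,2-+"  # read 1 maps to the same strand as the gene
REVERSE = "1+-,1-+,2++,2--"  # read 1 maps to the opposite strand


def parse_fractions(report: str) -> dict:
    """Map each quoted orientation pattern in the report to its fraction."""
    pattern = re.compile(r'explained by "([^"]+)":\s*([0-9.]+)')
    return {name: float(value) for name, value in pattern.findall(report)}


def featurecounts_strand(report: str, threshold: float = 0.8) -> int:
    """Suggest a featureCounts -s value (0, 1 or 2).

    The threshold is an assumed cutoff for this sketch. Anything that
    does not clearly favour one pattern falls back to 0, so intermediate
    splits should still be checked by hand.
    """
    fractions = parse_fractions(report)
    if fractions.get(FORWARD, 0.0) >= threshold:
        return 1  # stranded
    if fractions.get(REVERSE, 0.0) >= threshold:
        return 2  # reversely stranded
    return 0  # unstranded or ambiguous


report = """This is PairEnd Data
Fraction of reads failed to determine: 0.0020
Fraction of reads explained by "1++,1--,2+-,2-+": 0.0906
Fraction of reads explained by "1+-,1-+,2++,2--": 0.9073
"""

print(featurecounts_strand(report))  # prints 2
```

With the output posted above, where about 91% of reads follow "1+-,1-+,2++,2--", the sketch suggests -s 2 (reversely stranded).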
I understand that this data is strand-specific ("1+-,1-+,2++,2--") as: