I have single-end RNA-seq data and would like to determine its strandedness.
From the wet lab, all I know is: "Stranded cDNA library was generated by reverse transcribing the RNA molecules".
I ran infer_experiment.py (from RSeQC).
The output is:
This is SingleEnd Data
Fraction of reads failed to determine: 0.0795
Fraction of reads explained by "++,--": 0.0703
Fraction of reads explained by "+-,-+": 0.8502
So it's stranded but is it forward or reverse? I do not understand the help given here:
Does this mean that the library is reverse stranded, and that I have to use the -s 2 option in featureCounts, "reverse" strandedness in htseq-count, and --rf in StringTie?
Based on the output provided by infer_experiment.py, the strandedness can be inferred as follows:
Fraction of reads failed to determine: 0.0795
This is the fraction of sampled reads whose strand could not be determined, for example because they overlap regions where genes are annotated on both strands. At about 8% it is small, so the large majority of reads could be assigned a strand.
Fraction of reads explained by "++,--": 0.0703
This is the fraction of reads that map to the same strand as the gene they come from: a read on the "+" strand with its gene on "+" ("++"), or a read on the "-" strand with its gene on "-" ("--"). The two cases are counted together, so 7.03% of all reads fall into this category in total.
Fraction of reads explained by "+-,-+": 0.8502
This is the fraction of reads that map to the strand opposite to the gene they come from: a read on the "+" strand with its gene on "-" ("+-"), or a read on the "-" strand with its gene on "+" ("-+"). Again the two cases are counted together, so 85.02% of all reads fall into this category in total.
Based on this, the library is reverse stranded: the large majority of reads align to the strand opposite to their gene, while only a small fraction align to the same strand.
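If you want to automate this decision, here is a minimal Python sketch of the reasoning above. It assumes single-end output in exactly the format shown in the question. The function names, the `TOOL_FLAGS` table and the 0.75 cutoff are my own choices for illustration and are not part of RSeQC; only the tool options themselves (featureCounts `-s`, htseq-count `-s`, StringTie `--rf`/`--fr`) are real.

```python
import re

# Strandedness options for common tools (an empty string means: pass no flag).
TOOL_FLAGS = {
    "forward":    {"featureCounts": "-s 1", "htseq-count": "-s yes",     "StringTie": "--fr"},
    "reverse":    {"featureCounts": "-s 2", "htseq-count": "-s reverse", "StringTie": "--rf"},
    "unstranded": {"featureCounts": "-s 0", "htseq-count": "-s no",      "StringTie": ""},
}

def parse_infer_experiment(text):
    """Extract the three fractions from single-end infer_experiment.py output."""
    fractions = {}
    for line in text.splitlines():
        m = re.match(r'\s*Fraction of reads (.+?):\s*([0-9.]+)\s*$', line)
        if not m:
            continue
        label, value = m.group(1), float(m.group(2))
        if "failed to determine" in label:
            fractions["undetermined"] = value
        elif '"++,--"' in label:
            fractions["same_strand"] = value
        elif '"+-,-+"' in label:
            fractions["opposite_strand"] = value
    return fractions

def classify(fractions, threshold=0.75):
    """Call the library type from the share of determined reads.

    The threshold is an arbitrary assumption, not an RSeQC default.
    """
    same = fractions["same_strand"]
    opposite = fractions["opposite_strand"]
    determined = same + opposite
    if determined == 0:
        raise ValueError("no reads with a determined strand")
    if opposite / determined >= threshold:
        return "reverse"
    if same / determined >= threshold:
        return "forward"
    return "unstranded"

if __name__ == "__main__":
    output = '''This is SingleEnd Data
Fraction of reads failed to determine: 0.0795
Fraction of reads explained by "++,--": 0.0703
Fraction of reads explained by "+-,-+": 0.8502'''
    call = classify(parse_infer_experiment(output))
    print(call)              # reverse
    print(TOOL_FLAGS[call])
```

With the numbers from the question, about 92% of the reads with a determined strand are on the opposite strand, which is why the call is "reverse".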
The majority (85%) of your reads fall into the case described by:
+-,-+
read mapped to '+' strand indicates parental gene on '-' strand
read mapped to '-' strand indicates parental gene on '+' strand
So yes, your library is reverse stranded. As featureCounts is very fast, you can quickly confirm this by running it with both settings (-s 1 and -s 2). The correct setting will give a much lower NoFeature count (Unassigned_NoFeatures in the .summary file) than the incorrect one.
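To make that comparison less manual, here is a small Python sketch that reads the text of the two featureCounts .summary files and picks the setting with the fewest Unassigned_NoFeatures reads. It assumes the usual tab-separated summary layout with a single BAM column. The function names are my own, and the file names in the usage note are hypothetical.

```python
def no_feature_count(summary_text):
    """Return the Unassigned_NoFeatures count from a featureCounts summary."""
    for line in summary_text.splitlines():
        fields = line.split("\t")
        if fields[0] == "Unassigned_NoFeatures":
            # Assumption: one BAM file per run, so the count is in column 2.
            return int(fields[1])
    raise ValueError("Unassigned_NoFeatures row not found")

def pick_strand_setting(summaries):
    """Given {-s value: summary text}, return the -s value with the fewest
    reads left without a feature."""
    return min(summaries, key=lambda s: no_feature_count(summaries[s]))
```

Usage would look something like `pick_strand_setting({1: open("counts_s1.txt.summary").read(), 2: open("counts_s2.txt.summary").read()})`, where the two files come from runs with -s 1 and -s 2 on the same BAM. For a reverse-stranded library like yours it should return 2.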