I am trying to run StringTie on some ENCODE RNA-seq datasets, but I am unsure whether the data are stranded.
For instance, for the dataset ENCSR000BYS, the ENCODE web page states:
They are stranded PE76 Illumina GAIIx RNA-Seq libraries from rRNA-depleted Poly-A+ RNA > 200 nucleotides in size.
However, when I run infer_experiment.py on the BAM files I get the following result, which to my knowledge indicates an unstranded library:
infer_experiment.py -i ENCFF309XGT.sortedByCoord.bam -r gencode.v31.primary_assembly.annotation_transcripts.bed -s 500000
Output:
Loading SAM/BAM file ... Total 500000 usable reads were sampled
This is PairEnd Data
Fraction of reads failed to determine: 0.0461
Fraction of reads explained by "1++,1--,2+-,2-+": 0.5633
Fraction of reads explained by "1+-,1-+,2++,2--": 0.3906
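To make my reading of these numbers explicit, here is a minimal Python sketch of how I turn the two fractions into a call. The function names (`parse_fractions`, `call_strandedness`) and the 0.8 cutoff are my own assumptions for illustration, not part of RSeQC. The mapping of the "1++,1--,2+-,2-+" pattern to fr-secondstrand (StringTie `--fr`) and of "1+-,1-+,2++,2--" to fr-firststrand (StringTie `--rf`) follows the usual convention.

```python
import re

def parse_fractions(text):
    """Extract (failed, sense, antisense) fractions from infer_experiment.py output."""
    failed = float(re.search(r'failed to determine:\s*([\d.]+)', text).group(1))
    # "1++,1--,2+-,2-+": read 1 maps to the same strand as the gene
    sense = float(re.search(r'"1\+\+,1--,2\+-,2-\+":\s*([\d.]+)', text).group(1))
    # "1+-,1-+,2++,2--": read 1 maps to the opposite strand of the gene
    antisense = float(re.search(r'"1\+-,1-\+,2\+\+,2--":\s*([\d.]+)', text).group(1))
    return failed, sense, antisense

def call_strandedness(sense, antisense, cutoff=0.8):
    """Classify the library; cutoff is an assumed rule of thumb, not an RSeQC default."""
    total = sense + antisense
    if total == 0:
        return "undetermined"
    share = sense / total  # share of determinable reads in the sense pattern
    if share >= cutoff:
        return "fr-secondstrand"  # would correspond to StringTie --fr
    if share <= 1 - cutoff:
        return "fr-firststrand"   # would correspond to StringTie --rf
    return "unstranded"

# The output reported in this post
report = '''This is PairEnd Data
Fraction of reads failed to determine: 0.0461
Fraction of reads explained by "1++,1--,2+-,2-+": 0.5633
Fraction of reads explained by "1+-,1-+,2++,2--": 0.3906
'''

failed, sense, antisense = parse_fractions(report)
print(call_strandedness(sense, antisense))
```

With the assumed cutoff this reports "unstranded", although the split between the two patterns is roughly 59/41 rather than an even 50/50, which is part of why I am unsure.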
Any help is appreciated!
Where is the BED file coming from? You could try GUESSmyLT; the result may be clearer.
Thanks, the BED file is just extracted from the GENCODE GTF. I will give GUESSmyLT a try.
The "Specific protocol for library ENCLB555AYX" section contains a complete wet-lab protocol, and the library does indeed seem to be unstranded. ENCODE is quite old, so that is not really a surprise.
The issue is that in that section they reference the paper that describes strand-specific sequencing. Thank you for the reply; I will treat it as unstranded.