Strandness of ENCODE RNA-seq data

4.1 years ago

husensofteng ▴ 410

I am trying to run StringTie on some ENCODE RNA-seq datasets, but I am unsure whether the data are stranded.

For instance, for the dataset ENCSR000BYS, the ENCODE web page states:

They are stranded PE76 Illumina GAIIx RNA-Seq libraries from rRNA-depleted Poly-A+ RNA > 200 nucleotides in size.

However, when I run infer_experiment.py on the BAM files I get the following result, which to my knowledge indicates an unstranded library:

infer_experiment.py -i ENCFF309XGT.sortedByCoord.bam -r gencode.v31.primary_assembly.annotation_transcripts.bed -s 500000

Output:

Loading SAM/BAM file ... Total 500000 usable reads were sampled

This is PairEnd Data

Fraction of reads failed to determine: 0.0461

Fraction of reads explained by "1++,1--,2+-,2-+": 0.5633

Fraction of reads explained by "1+-,1-+,2++,2--": 0.3906

Any help is appreciated!

RNA-Seq ENCODE Assembly • 1.2k views

Where does the BED file come from? You could also try GUESSmyLT; the result may be clearer.

4.1 years ago by Juke34 8.9k

Thanks, the BED file was simply extracted from the GENCODE GTF. I will give GUESSmyLT a try.

4.1 years ago by husensofteng ▴ 410

The "Specific protocol for library ENCLB555AYX" section contains a complete wet-lab protocol, and the library does indeed seem to be unstranded. ENCODE is quite old, so that is not really a surprise.

4.1 years ago by ATpoint 85k

The issue is that, in that section, they reference the paper that describes strand-specific sequencing. Thank you for the reply; I will treat the library as unstranded.

4.1 years ago by husensofteng ▴ 410
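
For anyone reading the infer_experiment.py fractions quoted in the question, here is a minimal sketch of one way to turn them into a strandness call and a matching StringTie flag. The function name `classify_strandness`, the `STRINGTIE_FLAG` table, and the 0.8 cutoff are illustrative assumptions, not part of RSeQC or StringTie; the only facts relied on are that "1++,1--,2+-,2-+" is the forward (fr-secondstrand) orientation, "1+-,1-+,2++,2--" is the reverse (fr-firststrand) orientation, and that StringTie accepts `--fr` and `--rf` for these two cases.

```python
def classify_strandness(frac_fwd, frac_rev, threshold=0.8):
    """Classify a paired-end library from infer_experiment.py fractions.

    frac_fwd:  fraction explained by "1++,1--,2+-,2-+" (fr-secondstrand)
    frac_rev:  fraction explained by "1+-,1-+,2++,2--" (fr-firststrand)
    threshold: assumed cutoff on the share of determined reads; 0.8 is an
               arbitrary illustrative value, not an RSeQC recommendation
    """
    determined = frac_fwd + frac_rev
    if determined <= 0:
        raise ValueError("no reads could be assigned to either orientation")
    # Ignore the "failed to determine" reads and compare the two orientations.
    share_fwd = frac_fwd / determined
    if share_fwd >= threshold:
        return "forward"
    if share_fwd <= 1 - threshold:
        return "reverse"
    return "unstranded"


# StringTie strand options: --fr for fr-secondstrand, --rf for fr-firststrand.
# An unstranded library needs no strand flag.
STRINGTIE_FLAG = {"forward": "--fr", "reverse": "--rf", "unstranded": ""}

# Values taken from the output posted in the question.
call = classify_strandness(0.5633, 0.3906)
print(call, STRINGTIE_FLAG[call] or "(no strand flag)")
```

With the posted values, about 59% of the determined reads fall in the first orientation, well short of what a cleanly stranded library would be expected to show, so under the assumed cutoff this sketch agrees with treating the library as unstranded.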
