Question

Finding orientation of single-ended reads (R or F)

0

6.7 years ago

Moneeb Bajwa ▴ 10

Hello,

How can I find out whether my single-end reads are forward or reverse? Here are the SRA reads I used: https://www.ncbi.nlm.nih.gov/sra?linkname=bioproject_sra_all&from_uid=434667.

Thank you!

EDIT: Never mind, the linked page gives this under "spot descriptor".

sequence assembly RNA-Seq next-gen • 2.9k views

ADD COMMENT • link updated 6.6 years ago by Arindam Ghosh ▴ 540 • written 6.7 years ago by Moneeb Bajwa ▴ 10

0

What specifically do you want to know? Whether the read comes from the forward or the reverse DNA strand? Or whether the read is the forward or the reverse read in the sequencing protocol?

You can't determine the former from the read itself; the latter should always be forward (a reverse read exists only in paired-end mode).

ADD REPLY • link 6.7 years ago by lieven.sterck 15k

0

So would I put F (forward) when Trinity asks for --SS_lib_type?

ADD REPLY • link 6.7 years ago by Moneeb Bajwa ▴ 10

1

If you want to know which setting to use with Trinity:

1) Map some reads with STAR (you will need some 32-34 GB of memory). The index has to be built with an annotation, or you can provide an annotation at run time. You don't need to output a BAM file, just the gene counts:

  STAR --outFileNamePrefix temp. --genomeDir /path/to/STAR_index \
    --outSAMtype None --readFilesIn file.fastq.gz --readFilesCommand zcat \
    --quantMode GeneCounts

2) The gene counts output will be called temp.ReadsPerGene.out.tab and will have four columns: one with gene names and three with counts:

column 2:  counts for unstranded RNA-seq
column 3:  counts for the 1st read strand aligned with RNA (htseq-count option -s yes)
column 4:  counts for the 2nd read strand aligned with RNA (htseq-count option -s reverse)

You already know your data are stranded, so inspect columns 3 and 4 to decide which one is correct. The correct one will have lots of counts, whilst the other will have very few.

3) If the counts accumulate in the 3rd column, use --SS_lib_type F with Trinity; if they accumulate in the 4th column, use --SS_lib_type R.
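The comparison in steps 2 and 3 can be sketched in Python as follows. This is only an illustration, not part of STAR or Trinity: the function names and the 0.8 cutoff are my own assumptions, and the sketch assumes the file begins with STAR's four summary rows (N_unmapped, N_multimapping, N_noFeature, N_ambiguous), which it skips so that only gene rows are summed.

```python
import csv
import io

# Summary rows STAR writes at the top of ReadsPerGene.out.tab (not genes).
SUMMARY_ROWS = {"N_unmapped", "N_multimapping", "N_noFeature", "N_ambiguous"}


def summarize_strand_counts(lines):
    """Return (column 3 total, column 4 total) summed over gene rows."""
    col3_total = 0
    col4_total = 0
    for row in csv.reader(lines, delimiter="\t"):
        # Skip blank or short lines and the summary rows.
        if len(row) < 4 or row[0] in SUMMARY_ROWS:
            continue
        col3_total += int(row[2])
        col4_total += int(row[3])
    return col3_total, col4_total


def suggest_ss_lib_type(col3_total, col4_total, min_fraction=0.8):
    """Suggest 'F' or 'R' for single-end reads, or None if unclear.

    min_fraction is an arbitrary illustrative cutoff, not a STAR or
    Trinity setting.
    """
    total = col3_total + col4_total
    if total == 0:
        return None
    if col3_total / total >= min_fraction:
        return "F"  # counts accumulate in column 3
    if col4_total / total >= min_fraction:
        return "R"  # counts accumulate in column 4
    return None  # neither column dominates; library may be unstranded


# Toy data for illustration only (not real counts).
toy_table = io.StringIO(
    "N_unmapped\t10\t10\t10\n"
    "N_multimapping\t5\t5\t5\n"
    "N_noFeature\t3\t20\t150\n"
    "N_ambiguous\t2\t1\t0\n"
    "geneA\t100\t95\t5\n"
    "geneB\t50\t48\t2\n"
)
col3, col4 = summarize_strand_counts(toy_table)
print(col3, col4, suggest_ss_lib_type(col3, col4))
```

For a real run, replace the toy table with open("temp.ReadsPerGene.out.tab") and pass the file handle to summarize_strand_counts.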

ADD REPLY • link 6.7 years ago by h.mon 35k

0

OK, thank you. I actually found the strandedness in the link I provided in the question.

ADD REPLY • link 6.7 years ago by Moneeb Bajwa ▴ 10

score 0 · Answer 1 · 2018-07-19

0

6.7 years ago

h.mon 35k

Sometimes the information is given on each sample's page. For example, the first and last samples (I didn't look at the others) list:

Construction protocol: Agilent SureSelect Strand Specific RNA

ADD COMMENT • link 6.7 years ago by h.mon 35k

0

But I can't find the manual for that kit; the only one I can find is for paired-end reads.

ADD REPLY • link 6.7 years ago by Moneeb Bajwa ▴ 10

0

OK, then you know it is strand-specific sequencing, but you still wouldn't know whether it is a forward or a reverse (genomic) strand, no?

ADD REPLY • link 6.7 years ago by lieven.sterck 15k

score 0 · Answer 2 · 2018-07-26

0

6.6 years ago

Arindam Ghosh ▴ 540

If you are looking for strandedness, use infer_experiment.py from RSeQC.

Refer to http://rseqc.sourceforge.net/ and https://chipster.csc.fi/manual/library-type-summary.html
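As a rough sketch of how the infer_experiment.py report could be turned into a Trinity setting for single-end data: the code below assumes the report contains lines of the form Fraction of reads explained by "++,--": 0.9669, as shown in the RSeQC documentation. The function names and the 0.8 cutoff are my own assumptions, and the sample report is made up for illustration.

```python
import re

# Matches lines such as: Fraction of reads explained by "++,--": 0.9669
FRACTION_LINE = re.compile(r'explained by "([^"]+)":\s*([0-9.]+)')


def parse_infer_experiment(report_text):
    """Map each strand pattern (e.g. '++,--') to its reported fraction."""
    return {
        match.group(1): float(match.group(2))
        for match in FRACTION_LINE.finditer(report_text)
    }


def single_end_lib_type(fractions, min_fraction=0.8):
    """Suggest Trinity --SS_lib_type for single-end reads, or None.

    '++,--' means reads map to the same strand as the gene (sense, F);
    '+-,-+' means reads map to the opposite strand (antisense, R).
    min_fraction is an arbitrary illustrative cutoff.
    """
    if fractions.get("++,--", 0.0) >= min_fraction:
        return "F"
    if fractions.get("+-,-+", 0.0) >= min_fraction:
        return "R"
    return None  # no clear majority; library may be unstranded


# Made-up report text for illustration only.
sample_report = """This is SingleEnd Data
Fraction of reads failed to determine: 0.0170
Fraction of reads explained by "++,--": 0.9669
Fraction of reads explained by "+-,-+": 0.0161
"""
print(single_end_lib_type(parse_infer_experiment(sample_report)))
```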

ADD COMMENT • link 6.6 years ago by Arindam Ghosh ▴ 540