Question

how htseq-count counts unstranded RNA-seq data

0

3.6 years ago

wiscoyogi ▴ 40

preliminary

Say I have some unstranded RNA-seq data aligned to the human reference genome, and I'm counting reads per gene with htseq-count (--stranded=no).

My understanding (biologically) was that for a given protein_coding gene, reading DNA in the sense strand gives the protein_coding transcript, reading the gene in the opposite direction gives the non-coding version of the gene.

questions

For reads mapping to a gene annotated as protein_coding, is a given read counted towards the protein_coding gene irrespective of the genome strand it aligns to, or is it considered noncoding? In other words, how does htseq-count take read alignment direction into account for unstranded RNA-seq data when assigning counts to a given gene?
Say I am counting unstranded RNA-seq data aligned only to the exonic portion of the human genome. Do only reads mapping in the sense direction count? Does the exon FASTA preserve directionality, or does it just have genomic coordinates? Would reads that align in the non-protein-coding direction to an exonic portion of the genome then not be counted as protein_coding?

htseq-count RNA-seq stranded • 1.7k views

updated 3.6 years ago by swbarnes2 14k • written 3.6 years ago by wiscoyogi ▴ 40

score 1 · Answer 1 · 2021-08-16

1

3.6 years ago

swbarnes2 14k

reading DNA in the sense strand gives the protein_coding transcript, reading the gene in the opposite direction gives the non-coding version of the gene.

It depends on the library prep. In some stranded RNA-seq preps the reads run towards the beginning of the transcript (antisense to it); in others they run towards the end (same sense as the transcript). You have to find out which prep was used so you can analyze your data in the proper context.

I don't think HTSeq-count gives a fig whether or not a feature is designated protein coding.

When run unstranded, reads will count no matter which strand they align to. That's the point of telling the software your prep is unstranded.

3.6 years ago by swbarnes2 14k

0

This doesn't answer my question. I'm wondering what a given read will count towards: the protein_coding or the noncoding annotation of the gene?

3.6 years ago by wiscoyogi ▴ 40

1

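Reads that overlap two or more features will not be counted for any of them; htseq-count tallies them separately as __ambiguous. But if the prep is stranded, you pass the matching --stranded setting, and your features run in opposite directions, the read will count for the feature in the right direction.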
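
To make that concrete, here is a minimal Python sketch of the assignment logic. It is not HTSeq's actual code. It assumes single-end reads, treats each gene as a single interval instead of a set of exons, and mimics the default union overlap mode. The names `Gene`, `Read`, `assign`, and the two example genes are made up for illustration; only the `__no_feature` and `__ambiguous` labels come from htseq-count's output.

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    """Hypothetical stand-in for a gene record from the annotation."""
    gene_id: str
    start: int    # 0-based, inclusive
    end: int      # 0-based, exclusive
    strand: str   # "+" or "-"
    biotype: str  # carried along, but never consulted when counting


class Read(NamedTuple):
    """Hypothetical stand-in for one aligned single-end read."""
    start: int
    end: int
    strand: str


def assign(read: Read, genes: List[Gene], stranded: str = "no") -> str:
    """Return the gene_id the read counts towards, or a special category."""
    # For a "reverse" prep, the read must lie on the strand opposite the gene.
    if stranded == "reverse":
        wanted = "-" if read.strand == "+" else "+"
    else:
        wanted = read.strand
    hits = {
        g.gene_id
        for g in genes
        if g.start < read.end and read.start < g.end       # intervals overlap
        and (stranded == "no" or g.strand == wanted)       # strand filter
    }
    if not hits:
        return "__no_feature"
    if len(hits) > 1:
        return "__ambiguous"
    return next(iter(hits))


# Two made-up genes that overlap on opposite strands.
genes = [
    Gene("GENE_A", 100, 200, "+", "protein_coding"),
    Gene("GENE_B", 150, 300, "-", "lncRNA"),
]

antisense_read = Read(110, 140, "-")  # overlaps GENE_A only, on the other strand
overlap_read = Read(160, 190, "+")    # overlaps both genes

print(assign(antisense_read, genes, "no"))   # GENE_A
print(assign(antisense_read, genes, "yes"))  # __no_feature
print(assign(overlap_read, genes, "no"))     # __ambiguous
print(assign(overlap_read, genes, "yes"))    # GENE_A
print(assign(overlap_read, genes, "reverse"))  # GENE_B
```

In this sketch the biotype field never enters the decision: with `stranded="no"` a read counts for whichever single gene it overlaps, whatever strand it is on, and a read in a region shared by a coding gene and an antisense noncoding gene ends up as ambiguous instead of being split between them.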

3.6 years ago by swbarnes2 14k