preliminary
Say I have some unstranded RNA-seq data that I'm aligning to the human reference genome, and I'm counting reads per gene with htseq-count (--stranded=no).
My understanding (biologically) was that, for a given protein_coding gene, reading the DNA on the sense strand gives the protein_coding transcript, while reading the gene in the opposite direction gives the non-coding version of the gene.
questions
- For reads mapping to a gene whose biotype is protein_coding (irrespective of which genome strand the read aligns to), is a given read counted towards the protein_coding gene or considered noncoding? In other words, how does htseq-count treat read alignment direction for unstranded RNA-seq data when assigning counts to a given gene?
- Say I am counting unstranded RNA-seq data aligned only to an exon-only version of the human genome. Do only reads mapping in the sense direction of the genome get counted? Does the exon-only human genome FASTA preserve directionality, or does it only carry genomic coordinates? That is, would reads that align in the non-protein-coding direction over an exonic region not be counted as protein_coding?
This doesn't answer my question. I'm wondering what a given read will count towards: the protein_coding or the noncoding annotation of the gene?
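Reads that overlap two or more features are not counted towards any of them (htseq-count tallies them under __ambiguous instead). But if the library prep is stranded, and your features run in opposite directions, the read will be counted for the feature on the matching strand (as set by --stranded=yes or --stranded=reverse).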
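A minimal sketch of the assignment rule described above, assuming htseq-count's default --mode=union and --nonunique=none. This is a toy model, not the HTSeq library API: the Feature, Read, strand_ok, and assign names are hypothetical, each gene is treated as a single interval, and splicing and paired-end mate handling are ignored. It shows why, with --stranded=no, a read on either strand counts for a gene, and why a read overlapping two opposite-strand genes becomes ambiguous.

```python
from dataclasses import dataclass

# Hypothetical toy model of htseq-count's default "union" mode.
# Not the HTSeq API: features are single intervals, reads are single-end
# and unspliced, and coordinates are half-open [start, end).


@dataclass(frozen=True)
class Feature:
    gene_id: str
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"


@dataclass(frozen=True)
class Read:
    chrom: str
    start: int
    end: int
    strand: str  # strand the read aligned to


def strand_ok(read_strand: str, feature_strand: str, stranded: str) -> bool:
    """Mimic the three --stranded choices: yes, no, reverse."""
    if stranded == "no":
        return True  # strand is ignored entirely
    if stranded == "yes":
        return read_strand == feature_strand
    if stranded == "reverse":
        return read_strand != feature_strand
    raise ValueError(f"unknown stranded mode: {stranded}")


def assign(read: Read, features: list, stranded: str = "yes") -> str:
    """Return the gene ID the read counts for, or a special counter name."""
    hits = {
        f.gene_id
        for f in features
        if f.chrom == read.chrom
        and f.start < read.end
        and read.start < f.end
        and strand_ok(read.strand, f.strand, stranded)
    }
    if not hits:
        return "__no_feature"
    if len(hits) > 1:
        return "__ambiguous"  # overlaps 2+ genes: counted for none of them
    return hits.pop()


# Two hypothetical genes that overlap on opposite strands.
features = [
    Feature("GENE_A", "chr1", 100, 500, "+"),
    Feature("GENE_B", "chr1", 300, 800, "-"),
]
read = Read("chr1", 350, 400, "+")  # falls in the overlap

print(assign(read, features, stranded="no"))       # __ambiguous
print(assign(read, features, stranded="yes"))      # GENE_A
print(assign(read, features, stranded="reverse"))  # GENE_B
# Unstranded: a minus-strand read inside GENE_A only still counts for GENE_A.
print(assign(Read("chr1", 150, 200, "-"), features, stranded="no"))  # GENE_A
```

In this model, as in htseq-count, a count goes to a gene ID; whether that gene is annotated as protein_coding plays no part in the assignment.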