Hi all,
I am working with Illumina HiSeq 2000 100 bp single-end RNA-seq data. Some of my samples originate from unstranded libraries and some from stranded libraries. I'm trying to understand the best way to do read summarisation for these libraries using featureCounts for eventual DGE analysis. To date I have treated all datasets as unstranded for mapping (TopHat) and counting (featureCounts).
However, I am concerned that read counts for my unstranded libraries will be biased for genes that have antisense transcripts, since reads originating from the antisense transcript will be merged into the counts for the sense-strand gene wherever the two features overlap. So what is the recommended course of action here? I'm not interested in antisense transcripts, so should I continue to treat everything as unstranded for the featureCounts run? I have seen some other threads here that suggest incorporating strandedness into the DGE calculation as a factor in a multi-factor design, but I was hoping for a more thorough explanation of why this is the better workaround for this problem.
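To make the concern concrete, here is a minimal toy sketch in Python. It is not how featureCounts works internally; the `Read` and `Gene` types, the `count_reads` helper, and all coordinates are hypothetical and invented purely for illustration. It only shows how reads from an overlapping antisense transcript end up in the sense gene's count when strand is ignored.

```python
from collections import namedtuple

# Hypothetical toy types; coordinates are made up for illustration only.
Read = namedtuple("Read", "start end strand")
Gene = namedtuple("Gene", "name start end strand")


def count_reads(reads, gene, stranded):
    """Count reads overlapping `gene`.

    If `stranded` is True, only reads on the gene's own strand are counted.
    If False, strand is ignored, as in an unstranded counting run.
    """
    n = 0
    for r in reads:
        overlaps = r.start < gene.end and r.end > gene.start
        if not overlaps:
            continue
        if stranded and r.strand != gene.strand:
            continue
        n += 1
    return n


gene = Gene("geneA", 100, 1000, "+")
reads = [
    Read(150, 250, "+"),    # sense read
    Read(300, 400, "+"),    # sense read
    Read(800, 900, "+"),    # sense read
    Read(500, 600, "-"),    # read from an overlapping antisense transcript
    Read(700, 800, "-"),    # read from an overlapping antisense transcript
    Read(2000, 2100, "+"),  # outside the gene, never counted
]

unstranded_count = count_reads(reads, gene, stranded=False)
stranded_count = count_reads(reads, gene, stranded=True)
print(unstranded_count, stranded_count)
```

In this toy case the unstranded count includes the two antisense reads and the stranded count does not. In a real run the corresponding choice is made with the featureCounts `-s` option (0 for unstranded, 1 for stranded, 2 for reversely stranded), and which of 1 or 2 applies depends on the library protocol.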
Thank you in advance.
+1 for your answer. By the way, I think the interaction term [batch:condition] is really needed here since antisense transcripts usually have opposite expression dynamics than their sense counterparts. That is, in a condition, if a gene is overexpressed, there is a good chance that its antisense will be underexpressed. So the batch effect is expected to vary across conditions, especially for the genes you are interested in, i.e. those that are differentially expressed across conditions.
Hi, I do not think these statements are true: "since antisense transcripts usually have opposite expression dynamics than their sense counterparts" and "in a condition, if a gene is overexpressed, there is a good chance that its antisense will be underexpressed". If you are talking about natural antisense transcripts (NATs) or non-coding antisense transcripts, anti-correlated expression is not a general phenomenon. The concordance of expression between sense and antisense is context dependent (tissue, cell type, etc.).
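For anyone unsure what adding [batch:condition] means in practice, here is a minimal Python sketch of the design matrix such a model implies. The function names and the 0/1 coding (batch 0 = unstranded, 1 = stranded; condition 0 = control, 1 = treated) are assumptions made for this example, not the API of any DGE package. It also checks one practical point: the interaction can only be estimated if every batch contains samples from every condition.

```python
def design_rows(batch, condition):
    """Build rows of a design matrix with columns:
    intercept, batch, condition, batch:condition (0/1 coding assumed)."""
    return [[1, b, c, b * c] for b, c in zip(batch, condition)]


def interaction_estimable(batch, condition):
    """For two 2-level factors, the interaction is estimable only if
    all four (batch, condition) combinations occur in the samples."""
    return len(set(zip(batch, condition))) == 4


# Hypothetical balanced layout: both library types cover both conditions.
batch = [0, 0, 0, 0, 1, 1, 1, 1]
condition = [0, 0, 1, 1, 0, 0, 1, 1]
X = design_rows(batch, condition)
for row in X:
    print(row)
print(interaction_estimable(batch, condition))

# Hypothetical confounded layout: all controls unstranded, all treated stranded.
confounded_batch = [0, 0, 1, 1]
confounded_condition = [0, 0, 1, 1]
print(interaction_estimable(confounded_batch, confounded_condition))
```

The last column is non-zero only for stranded, treated samples, so its coefficient captures how the library-type effect differs between conditions. In the confounded layout neither the interaction nor the library-type effect can be separated from the condition effect.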
Examples:
The landscape of antisense gene expression in human cancers
A cautionary tale of sense-antisense gene pairs: independent regulation despite inverse correlation of expression
Genome-wide Identification and Characterization of Natural Antisense Transcripts
Genome-wide analysis of expression modes and DNA methylation status at sense–antisense transcript loci in mouse
Sense-Antisense lncRNA Pair Encoded by Locus 6p22.3 Determines Neuroblastoma Susceptibility
Conserved expression of natural antisense transcripts in mammals
This is not an answer to the main question; rather, it is a reply to the statement made in this post.
Well, the situation is perhaps more complex in higher eukaryotes, but I think that in simpler systems the anti-correlation between sense and antisense transcription is rather well established. There is, for instance, this recent paper:
Native elongating transcript sequencing reveals global anti-correlation between sense and antisense nascent transcription in fission yeast.
Hi,
My point was that there is evidence for both positive and negative correlation, backed by good publications. So there is no general rule that sense and antisense are globally anti-correlated or positively correlated. Many factors contribute to this (sometimes it is species dependent too).
Sorry, one more reference: Antisense Transcription in the Mammalian Transcriptome