Hi All,
Can somebody explain to me how differential promoter use is inferred from RNA-seq data? I'm asking specifically about the way it is implemented in cuffdiff (I followed the Tuxedo pipeline). The documentation says:
This tab delimited file lists, for each gene, the amount of overloading detected among its primary transcripts, i.e. how much differential promoter use exists between samples. Only genes producing two or more distinct primary transcripts (i.e. multi-promoter genes) are listed here.
What is considered a primary transcript? Is it equivalent to pre-mRNA? My library is poly(A)-enriched, so it shouldn't be rich in pre-mRNAs, yet I still get a result (test status OK) for some of the genes. I understand that poly(A) enrichment is not perfect, so some reads could have been derived from pre-mRNAs, but in that case the signal would be dominated by stochastic noise, which would make comparisons between conditions challenging. Finally, why would differential levels of pre-mRNA indicate differential promoter usage, and why are only genes producing two or more distinct primary transcripts included in the analysis while the rest are not?
Thanks
Crossposted to SeqAnswers: http://seqanswers.com/forums/showthread.php?t=61397
I asked the question in both forums.
Yes you did, and it's generally considered bad form to do so, because you split the answers between two forums.
I didn't intend anything bad. I simply thought I would be more likely to get an answer with wider exposure.
One of the mechanisms by which a gene produces different isoforms is the use of different promoters. In Cufflinks, a "primary transcript" is a group of isoforms that share a transcription start site (TSS), not unspliced pre-mRNA. Hence only genes with two or more primary transcripts are reported, because only those genes can show a shift in promoter usage between samples. You can read a few papers on CAGE-seq, a protocol designed to study multiple promoter usage, and then relate those concepts to the cuffdiff output.
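To make the idea concrete, here is a rough sketch of the kind of comparison involved. As far as I know, cuffdiff reports a `sqrt(JS)` column in `promoters.diff`, i.e. the square root of the Jensen-Shannon divergence between the relative abundances of a gene's TSS groups in the two samples. The code below is not Cufflinks' implementation (it ignores the variance modelling and significance testing entirely); the function names and the FPKM values are made up purely for illustration.

```python
import math


def tss_proportions(fpkm_by_tss):
    """Turn per-TSS-group abundances into relative proportions.

    fpkm_by_tss maps a (hypothetical) TSS group id to the summed FPKM
    of all isoforms starting at that TSS.
    """
    total = sum(fpkm_by_tss.values())
    if total <= 0:
        raise ValueError("gene is not expressed in this sample")
    # Sort by TSS id so both samples yield proportions in the same order.
    return [fpkm_by_tss[tss] / total for tss in sorted(fpkm_by_tss)]


def entropy(p):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def js_distance(p, q):
    """Square root of the Jensen-Shannon divergence between p and q."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    jsd = entropy(m) - (entropy(p) + entropy(q)) / 2
    return math.sqrt(max(jsd, 0.0))  # guard against tiny negative rounding


# Toy gene with two TSS groups (invented numbers).
cond_a = tss_proportions({"TSS1": 90.0, "TSS2": 10.0})
cond_b = tss_proportions({"TSS1": 45.0, "TSS2": 5.0})   # half the expression, same mix
cond_c = tss_proportions({"TSS1": 10.0, "TSS2": 90.0})  # promoter switch

print(js_distance(cond_a, cond_b))  # 0.0: overall expression changed, promoter usage did not
print(js_distance(cond_a, cond_c))  # > 0: relative promoter usage shifted
```

Note that a gene with a single TSS group always has proportions `[1.0]` in every sample, so the distance is always zero. That is why single-promoter genes are not listed.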
Hi Geek, does "differential promoter use" mean differential TSS usage? Thanks
Figure 4 in this paper explains everything.