Hi All,
I have a question regarding transcripts that encode proteins and those that encode other products, as defined by the Ensembl biotype category (e.g. retained intron, CDS not defined, nonsense-mediated decay, etc.). I have included an example of what I am referring to from useast.ensembl.org
I have used an RNA-seq time course to identify the set of circadian transcripts expressed above a certain background level in my cells, and I am trying to tell a story about what these transcript products are doing to affect the circadianly regulated metabolic properties of my cells. A circadian transcript is defined as a transcript that oscillates over the circadian day with a period of ~24h and a BH-adjusted p-value below a specific cutoff. When looking at the biotypes of all transcripts for a gene, I've noticed that a gene will often have several transcripts/isoforms of different biotypes (e.g. retained intron, CDS not defined, nonsense-mediated decay, etc.). In an attempt to limit my analysis to transcripts whose products are likely to be functionally relevant in my cells, I have restricted it to transcripts that encode a protein. I have, however, noticed that (1) there is usually more than one circadian transcript per gene that encodes a protein, and those proteins are different sizes, which makes sense given alternative splicing and the possibility that the proteins have different functions depending on which domains they contain or are missing; and (2) there are many circadian transcripts that are highly expressed but have a non-protein-coding biotype.
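To make the filtering above concrete, here is a minimal Python sketch of how circadian transcripts could be grouped by gene and biotype before deciding which biotypes to keep. Everything specific in it is an assumption for illustration only: the record fields (`id`, `gene`, `biotype`, `period`, `p`), the 20-28h period window, the 0.05 cutoff, and the toy transcript and gene names. The BH adjustment is the standard Benjamini-Hochberg step-up procedure written out by hand, not the output of any particular rhythmicity tool.

```python
from collections import defaultdict


def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the
    # adjusted values stay monotone in the rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def circadian_by_biotype(transcripts, q_cutoff=0.05, period_range=(20.0, 28.0)):
    """Group circadian transcripts as {gene: {biotype: [transcript ids]}}.

    A transcript counts as circadian here if its BH-adjusted p-value is
    below q_cutoff and its period falls inside period_range (both are
    assumed thresholds, not values from the original analysis).
    """
    qvalues = bh_adjust([t["p"] for t in transcripts])
    low, high = period_range
    grouped = defaultdict(lambda: defaultdict(list))
    for t, q in zip(transcripts, qvalues):
        if q < q_cutoff and low <= t["period"] <= high:
            grouped[t["gene"]][t["biotype"]].append(t["id"])
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {gene: dict(by_type) for gene, by_type in grouped.items()}


# Toy records with made-up identifiers and values.
transcripts = [
    {"id": "txA1", "gene": "geneA", "biotype": "protein_coding", "period": 24.1, "p": 0.001},
    {"id": "txA2", "gene": "geneA", "biotype": "retained_intron", "period": 23.5, "p": 0.002},
    {"id": "txA3", "gene": "geneA", "biotype": "nonsense_mediated_decay", "period": 24.0, "p": 0.6},
    {"id": "txB1", "gene": "geneB", "biotype": "protein_coding", "period": 12.0, "p": 0.001},
]

result = circadian_by_biotype(transcripts)
print(result)
```

Keeping every biotype in the grouped output, rather than dropping the non-coding ones up front, makes it easy to see which genes have both a circadian protein-coding isoform and a circadian non-coding isoform.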
My questions are: HOW should I investigate the mechanism of regulation of these different circadian transcripts (e.g. differential promoter usage?), and WHY are transcripts that do not encode a protein being circadianly regulated? What is the point? What are they doing? Are they useless? Are they an evolutionary artifact that is no longer needed? It seems hard to believe that these transcripts are doing nothing. I know that some of these may be lncRNAs with actual functions. I am mostly asking about the less intuitive categories.
I know this question is a little open-ended. I would just like to hear people's thoughts on this to inform how I should proceed.
Just to make sure that I understand correctly: in your RNA-seq experiment, some of the reads mapped to transcripts that do not code for any protein. Is this correct? I am curious to know what these might be. Assuming there are no artefacts in the pipeline, the following would be my guesses (as someone who is a novice in transcriptomics and comes more from the genomics end):
Was just thinking out loud :)