Let's make sure we have our definitions straight: is the transcriptome length the sum of all RNA-coding sequences?
Also specify if you only want to count the longest transcript. Otherwise adding up all possible transcript variations can lead to a number that may no longer make sense.
Transcriptome sizes for common model organisms can be found here (page 27). There are no databases for this as such. Relative sizes can be found in research articles.
Yes. I suppose the transcriptome length is the sum of all RNA-coding sequences. I agree that adding up all splice variants doesn't make sense. Counting only the longest transcript may not be optimal either. Instead, I would go for the length of the union of all variants of a gene.
@Prasad I have come across those slides before, and they are always referenced when someone asks about transcriptome length, but I very much doubt their accuracy.
So if I understood correctly, a solution would be to take a GTF/GFF file for your organism of interest, merge the overlapping intervals (e.g. with bedtools), and sum the total covered interval length. That way you can also decide whether to include lncRNAs, expressed pseudogenes, etc.
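For what it's worth, the merge-and-sum idea can be sketched in plain Python without bedtools. This is only a rough sketch under a few assumptions, not a vetted tool: the function name `union_length` and its parameters are made up for illustration, the input is assumed to be a standard 9-column tab-separated GTF/GFF, only `exon` features are counted by default, and the optional biotype filter assumes the attribute is called `gene_biotype` (Ensembl style) or `gene_type` (GENCODE style). Strand is ignored unless you ask otherwise, so overlapping genes on opposite strands are counted once.

```python
import re
from collections import defaultdict

# Assumption: biotype is stored as gene_biotype "..." (Ensembl) or gene_type "..." (GENCODE).
BIOTYPE_RE = re.compile(r'gene_(?:bio)?type "([^"]+)"')


def union_length(gtf_lines, feature="exon", biotypes=None, stranded=False):
    """Total number of bases covered by at least one `feature` interval.

    gtf_lines: iterable of GTF/GFF text lines (e.g. an open file handle).
    biotypes:  optional set of biotypes to keep; None keeps everything.
    stranded:  if True, merge the two strands separately.
    """
    spans_by_seq = defaultdict(list)
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header/comment and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != feature:
            continue
        if biotypes is not None:
            match = BIOTYPE_RE.search(fields[8])
            if match is None or match.group(1) not in biotypes:
                continue
        key = (fields[0], fields[6]) if stranded else fields[0]
        # GTF/GFF coordinates are 1-based and inclusive.
        spans_by_seq[key].append((int(fields[3]), int(fields[4])))

    total = 0
    for spans in spans_by_seq.values():
        spans.sort()
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end:
                # Overlaps the current merged block: extend it.
                cur_end = max(cur_end, end)
            else:
                # Gap: close the current block and start a new one.
                total += cur_end - cur_start + 1
                cur_start, cur_end = start, end
        total += cur_end - cur_start + 1
    return total
```

You would call it along the lines of `union_length(open("annotation.gtf"), biotypes={"protein_coding", "lncRNA"})`, where `annotation.gtf` is a placeholder for whatever annotation file you downloaded. The number you get still depends entirely on which features and biotypes you choose to include.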
I was hoping this information would be available on one of the standard databases or websites rather than me having to do this for every organism. But thanks anyhow.
I'm not sure, and it will depend a lot on how you define "transcriptome", "coding", etc. Doing it yourself this way, you at least know exactly what you are counting.