Hey everyone,
I just realized that until GENCODE v41 the so-called "main annotation file for most users" was "Comprehensive gene annotation" for CHR. Since v42 it is "Basic gene annotation" for CHR. Why the change, and would it make a difference if I only need the exon regions?
Thanks for that. But release 23 still shows the comprehensive file as "main annotation file for most users".
The change happened when going from release 22 ( https://www.gencodegenes.org/human/release_22.html ) to 23.
Ah, I see what you mean. So from 22 to 23 the basic version in general was introduced. But as I see it, from version 42 to 43 GENCODE decided that basic should be the new default.
I think that is just a recommendation for most users. If you are not interested in haplotypes and other complexities, then use the basic.
Hmm, what I am currently trying to do is extract the total length of all non-overlapping exons (see here). However, it makes a difference whether I use the basic annotations or the comprehensive ones. I find 62703 total genes with their corresponding exon lengths (these are the same as the ones I find with the GenomicFeatures R package). However, 19862 of them differ depending on whether I use the basic or the comprehensive annotations.
Are you accounting for the haplotypes?
Not really. Actually, I am just looking for a good way to calculate TPM, since for this I need the gene length. The real (genomic) gene length doesn’t make sense though, since it’s mRNA. And since I don’t know which transcript isoform I am actually sequencing, I think the most fitting would be to take the total length of all available exons. I know this is not completely correct, but I can’t think of a better way.
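For what it's worth, the approach described above can be sketched in plain Python. This is only a minimal illustration, not the GenomicFeatures implementation: it assumes a GTF with 1-based inclusive coordinates, a `gene_id` attribute on every exon line, and that each `gene_id` occurs on a single contig. The function names `union_exon_lengths` and `tpm` are made up for this example.

```python
from collections import defaultdict


def union_exon_lengths(gtf_lines):
    """Return {gene_id: total length of the merged (non-overlapping) exons}.

    Assumes GTF conventions: tab-separated, feature type in column 3,
    1-based inclusive start/end in columns 4 and 5, attributes in column 9.
    """
    exons = defaultdict(list)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        start, end = int(fields[3]), int(fields[4])
        gene_id = None
        for attr in fields[8].split(";"):
            attr = attr.strip()
            if attr.startswith("gene_id "):
                gene_id = attr.split(" ", 1)[1].strip('"')
                break
        if gene_id is not None:
            exons[gene_id].append((start, end))

    lengths = {}
    for gene_id, intervals in exons.items():
        intervals.sort()
        total = 0
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:
                # Overlaps the current merged block: extend it.
                cur_end = max(cur_end, end)
            else:
                # Gap: close the current block and start a new one.
                total += cur_end - cur_start + 1
                cur_start, cur_end = start, end
        total += cur_end - cur_start + 1
        lengths[gene_id] = total
    return lengths


def tpm(counts, lengths):
    """Convert raw read counts to TPM using the given per-gene lengths.

    Genes without a positive length are skipped.
    """
    rates = {g: counts[g] / lengths[g] for g in counts if lengths.get(g)}
    scale = sum(rates.values())
    if scale == 0:
        return {g: 0.0 for g in rates}
    return {g: rate / scale * 1e6 for g, rate in rates.items()}
```

Calling `union_exon_lengths(open("annotation.gtf"))` (file name hypothetical) on the basic and on the comprehensive GTF and comparing the two dictionaries should show which genes change length. Since the comprehensive file lists additional transcripts, the merged exon length of a gene can only stay the same or grow relative to basic, which would explain part of the differences.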