Why did GENCODE change the main annotation file from Comprehensive to Basic?
17 months ago
gernophil ▴ 90

Hey everyone, I just realized that until GENCODE v41 the so-called "main annotation file for most users" was the Comprehensive gene annotation for CHR. Since v42 it has been the Basic gene annotation for CHR. Why the change, and would it make a difference if I only need the exon regions?

GENCODE • 1.4k views
17 months ago
GenoMax 147k

Looks like the change appeared back with release 23: https://www.gencodegenes.org/human/release_23.html (i.e., the basic and comprehensive split).

According to GENCODE FAQ:

The transcripts tagged as "basic" form part of a subset of representative transcripts for each gene. This subset prioritises full-length protein coding transcripts over partial or non-protein coding transcripts within the same gene, and intends to highlight those transcripts that will be useful to the majority of users.
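
As a rough illustration of what "tagged as basic" means in practice, here is a minimal Python sketch that keeps only the records carrying `tag "basic"` in the attribute column of a GTF. It assumes the GENCODE GTF convention of tab-separated fields with attributes like `tag "basic";` in column 9; the function names `has_basic_tag` and `basic_records` are made up for this example and are not part of any GENCODE tooling.

```python
import re

# Matches the attribute `tag "basic"` as a whole key/value pair.
BASIC_TAG = re.compile(r'(?:^|\s)tag "basic"(?:;|$)')

def has_basic_tag(attributes: str) -> bool:
    """Return True if a GTF attribute column contains tag "basic"."""
    return BASIC_TAG.search(attributes) is not None

def basic_records(gtf_lines):
    """Yield the split fields of every record tagged "basic".

    Assumption: header lines start with '#' and data lines have 9
    tab-separated columns, as in GENCODE GTF files.
    """
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        if has_basic_tag(fields[8]):
            yield fields

# Tiny made-up example: two exon records, only the first is "basic".
example = [
    "##description: toy example\n",
    'chr1\tHAVANA\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1"; tag "basic";\n',
    'chr1\tHAVANA\texon\t150\t300\t.\t+\t.\tgene_id "G1"; transcript_id "T2"; tag "CCDS";\n',
]
kept = list(basic_records(example))
```

In this toy input only the `T1` exon survives the filter, which is roughly the difference between the two annotation files at the transcript level.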


Thanks for that. But release 23 still shows the comprehensive file as "main annotation file for most users".


The change happened when going from release 22 (https://www.gencodegenes.org/human/release_22.html) to 23.


Ah, I see what you mean. So the basic version itself was introduced going from 22 to 23. But as I see it, from version 42 to 43 GENCODE decided that basic should be the new default.


I think that is just a recommendation for most users. If you are not interested in haplotypes and other complexities, then use the basic set.


Hmm, what I am currently trying to do is extract the total length of all non-overlapping exons (see here). However, it makes a difference whether I use the basic annotation or the comprehensive one. I find 62703 genes in total with their corresponding exon lengths (these are the same as the ones I find with the GenomicFeatures R package). However, 19862 of them differ depending on whether I use the basic or the comprehensive annotation.
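
For reference, the "total length of all non-overlapping exons" per gene can be sketched in Python as a merge of overlapping exon intervals. This is only a minimal sketch under stated assumptions: GTF coordinates are 1-based and inclusive, exons are grouped by `gene_id` alone (so a gene ID is assumed to live on a single chromosome), and the function name `union_exon_lengths` and the toy records are invented for the example.

```python
import re
from collections import defaultdict

GENE_ID = re.compile(r'gene_id "([^"]+)"')

def union_exon_lengths(gtf_lines):
    """Return {gene_id: total length of the union of its exons}."""
    exons = defaultdict(list)
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = GENE_ID.search(fields[8])
        if match is None:
            continue
        exons[match.group(1)].append((int(fields[3]), int(fields[4])))

    lengths = {}
    for gene, intervals in exons.items():
        intervals.sort()
        total = 0
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:
                # Overlaps the current block: extend it.
                cur_end = max(cur_end, end)
            else:
                # Gap: close the current block (inclusive coordinates).
                total += cur_end - cur_start + 1
                cur_start, cur_end = start, end
        total += cur_end - cur_start + 1
        lengths[gene] = total
    return lengths

def exon(start, end, transcript):
    """Build a toy GTF exon line for a hypothetical gene G1."""
    attrs = f'gene_id "G1"; transcript_id "{transcript}";'
    return f"chr1\tHAVANA\texon\t{start}\t{end}\t.\t+\t.\t{attrs}\n"

# Toy "basic" set: one transcript. Toy "comprehensive" set: one extra transcript.
basic = [exon(100, 200, "T1"), exon(400, 450, "T1")]
comprehensive = basic + [exon(150, 300, "T2")]

basic_len = union_exon_lengths(basic)["G1"]
comprehensive_len = union_exon_lengths(comprehensive)["G1"]
```

In this toy case the extra transcript extends the first exon block, so the union length grows from 152 to 252. Any gene with transcripts that are only in the comprehensive set and that add exonic bases would differ in the same way.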


Are you accounting for the haplotypes?


Not really. Actually, I am just looking for a good way to calculate TPM, since for this I need the gene length. The real gene length doesn't make sense, though, since what I sequence is mRNA. And since I don't know which transcript isoform I am actually sequencing, I think the most fitting approach would be to take the total length of all available exons. I know this is not completely correct, but I can't think of a better way.
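
To make the role of the gene length concrete, here is a minimal Python sketch of the usual TPM calculation: divide each count by the length in kilobases, then rescale so the values sum to one million. The function name `tpm`, the gene names, and the numbers are made up for illustration; the lengths would be whatever per-gene length one settles on, for example the union exon length.

```python
def tpm(counts, lengths):
    """Compute TPM from read counts and gene lengths (in bases).

    counts and lengths are dicts keyed by gene; every gene in counts
    is assumed to have a positive length in lengths.
    """
    # Reads per kilobase for each gene.
    rates = {gene: counts[gene] / (lengths[gene] / 1000) for gene in counts}
    scale = sum(rates.values())
    if scale == 0:
        return {gene: 0.0 for gene in counts}
    # Rescale so that all values sum to one million.
    return {gene: rate / scale * 1e6 for gene, rate in rates.items()}

# Two hypothetical genes with equal counts but different lengths.
toy_counts = {"geneA": 100, "geneB": 100}
toy_lengths = {"geneA": 1000, "geneB": 2000}
toy_tpm = tpm(toy_counts, toy_lengths)
```

With equal counts, the gene that is twice as long ends up with half the TPM, which is why a length that changes between the basic and comprehensive annotation also changes the resulting TPM values.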
