Why are missing Gencode genes present in T2T-CHM13 genome annotation GTF file?
16 months ago
Petesview ▴ 10

Dear Bioinformaticians,

I have two naive questions regarding T2T-CHM13 which I struggle to fully understand.

For the newly released complete human haploid genome T2T-CHM13, I believe a total of 99 novel protein-coding genes were identified, each with a closest matching GENCODE ID in GRCh38, which would make them paralogs of the GRCh38 genes. At the same time, some genes and transcripts that were found in GRCh38 are no longer present in CHM13.

However, when I look at the RefSeq genome annotation for T2T-CHM13, the missing genes are still present in the GTF file, for example FKBP4P2. What is the reason for this, or did I use the wrong GTF file?

Also, what is the difference between a missing gene and a missing transcript? I'm guessing a particular gene can encode multiple transcripts, but I find it hard to visualise how a specific transcript can be missing, while the gene is not. Thanks for your help!
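To make my guess concrete, here is a minimal sketch of how I picture the two cases. It assumes "missing" simply means that an ID appears in one annotation but not in the other, which may not be how the T2T authors define it. The gene and transcript IDs and the `missing_features` helper are hypothetical, made up for illustration.

```python
# Toy gene -> transcript maps; every ID here is hypothetical.
reference_annotation = {
    "GENE_A": {"TX_A1", "TX_A2"},
    "GENE_B": {"TX_B1"},
}
target_annotation = {
    "GENE_A": {"TX_A1"},  # gene kept, but one transcript is absent
}


def missing_features(reference, target):
    """Return (missing genes, missing transcripts of genes that are still present)."""
    # A gene is "missing" when its ID is absent from the target altogether.
    missing_genes = sorted(set(reference) - set(target))
    # A transcript is "missing" when its gene survives but the transcript ID does not.
    missing_transcripts = sorted(
        tx
        for gene, transcripts in reference.items()
        if gene in target
        for tx in transcripts - target[gene]
    )
    return missing_genes, missing_transcripts


genes, transcripts = missing_features(reference_annotation, target_annotation)
print("missing genes:", genes)
print("missing transcripts:", transcripts)
```

In this toy case GENE_B would be a missing gene, while TX_A2 would be a missing transcript of a gene that is otherwise still present.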

T2T-CHM13 • 739 views

Providers such as NCBI produce their own annotations, so differences like this can appear depending on the source of the annotation.

FKBP4P2 is marked as a pseudogene in the RefSeq annotation.
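If you want to check how a gene is labelled in your own GTF, something along these lines may help. It is only a sketch: it assumes the gene records carry `gene_id` and `gene_biotype` attributes in the ninth column, which you should confirm against your file. The sample lines, their coordinates, and the `gene_biotypes` helper are made up for illustration and are not copied from the real annotation.

```python
import re

# Hypothetical GTF lines for illustration only; coordinates, sources and
# attribute values are made up, not taken from the real RefSeq file.
SAMPLE_GTF = """\
chr1\tBestRefSeq\tgene\t100\t900\t.\t+\t.\tgene_id "GENE_A"; gene_biotype "protein_coding";
chr1\tBestRefSeq\ttranscript\t100\t900\t.\t+\t.\tgene_id "GENE_A"; transcript_id "TX_A1";
chr2\tCurated Genomic\tgene\t500\t1500\t.\t-\t.\tgene_id "FKBP4P2"; gene_biotype "pseudogene"; pseudo "true";
"""

# Matches attribute pairs of the form: key "value"
ATTRIBUTE_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_attributes(field):
    """Turn the ninth GTF column into a dict of attribute name -> value."""
    return dict(ATTRIBUTE_RE.findall(field))


def gene_biotypes(gtf_text, gene_id):
    """Return the gene_biotype of every 'gene' record with the given gene_id."""
    hits = []
    for line in gtf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        columns = line.split("\t")
        if len(columns) != 9 or columns[2] != "gene":
            continue  # only look at gene-level records
        attributes = parse_attributes(columns[8])
        if attributes.get("gene_id") == gene_id:
            hits.append(attributes.get("gene_biotype", "unknown"))
    return hits


print(gene_biotypes(SAMPLE_GTF, "FKBP4P2"))
```

For a real file you would read the text from disk instead of using `SAMPLE_GTF`. An empty list means no gene-level record with that ID was found.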


In this case, should I just disregard these missing genes, even though they may show expression?
