I am trying to retrieve exon coordinates for all genes in Agilent's Clinical Research Exome via BioMart. Some genes are missing from the results. For example, AK6 is not found. Ensembl BioMart seems to mistake AK6 for TAF9. GeneCards (http://www.genecards.org/cgi-bin/carddisp.pl?gene=AK6) does the same, while Entrez says differently:
Entrez Gene summary for AK6 Gene:
This gene encodes a protein that belongs to the adenylate kinase family of enzymes. The protein has a nuclear localization and contains Walker A (P-loop) and Walker B motifs and a metal-coordinating residue. The protein may be involved in regulation of Cajal body formation. In human, AK6 and TAF9 (GeneID: 6880) are two distinct genes that share 5' exons. Alternative splicing results in multiple transcript variants. (provided by RefSeq, Sep 2013)
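For anyone reproducing this, here is a minimal sketch of how an exon-coordinate query could be built for BioMart's XML interface. It filters by Ensembl gene ID rather than by gene symbol, which might sidestep a mislabelled symbol. The endpoint URL, the dataset name and the attribute and filter names are my assumptions about the current Ensembl mart and should be checked against the BioMart web interface; `build_exon_query` and `build_request_url` are hypothetical helper names.

```python
from urllib.parse import urlencode
from xml.sax.saxutils import quoteattr

# Assumed BioMart endpoint; verify against the Ensembl site you are using.
MART_URL = "http://www.ensembl.org/biomart/martservice"

# Assumed attribute names for exon coordinates in the gene mart.
EXON_ATTRIBUTES = [
    "ensembl_gene_id",
    "external_gene_name",
    "ensembl_exon_id",
    "chromosome_name",
    "exon_chrom_start",
    "exon_chrom_end",
    "strand",
]


def build_exon_query(gene_ids, dataset="hsapiens_gene_ensembl"):
    """Return a BioMart XML query asking for exon coordinates of the given gene IDs."""
    parts = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        "<!DOCTYPE Query>",
        '<Query virtualSchemaName="default" formatter="TSV" header="1" '
        'uniqueRows="1" datasetConfigVersion="0.6">',
        f'<Dataset name={quoteattr(dataset)} interface="default">',
        # Filter on stable gene IDs, not symbols (comma-separated list).
        f'<Filter name="ensembl_gene_id" value={quoteattr(",".join(gene_ids))}/>',
    ]
    parts += [f"<Attribute name={quoteattr(name)}/>" for name in EXON_ATTRIBUTES]
    parts += ["</Dataset>", "</Query>"]
    return "".join(parts)


def build_request_url(gene_ids):
    """Return the full GET URL with the URL-encoded XML query."""
    return MART_URL + "?" + urlencode({"query": build_exon_query(gene_ids)})


# The two gene IDs discussed in this thread.
query_xml = build_exon_query(["ENSG00000085231", "ENSG00000273841"])
request_url = build_request_url(["ENSG00000085231", "ENSG00000273841"])
```

The resulting `request_url` can be fetched with any HTTP client; the sketch stops short of the network call so that it can be inspected before anything is sent.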
@Emily_Ensembl: I actually believe otherwise. In Ensembl, TAF9 has two ENSG IDs, ENSG00000085231 and ENSG00000273841, while HGNC has one ID for each gene:
AK6: ENSG00000085231; Entrez:102157402; HGNC:49151; Link: http://www.genenames.org/cgi-bin/gene_symbol_report?hgnc_id=HGNC:49151
TAF9: ENSG00000273841; Entrez: 6880; HGNC:11542; Link: http://www.genenames.org/cgi-bin/gene_symbol_report?hgnc_id=HGNC:11542
So something happened on the Ensembl side!
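The duplicate labelling described above can be checked mechanically on any exported symbol-to-gene-ID table. This is only a sketch: `symbols_with_multiple_ids` is a hypothetical helper, and the input is assumed to be (symbol, Ensembl gene ID) pairs, here just the two TAF9 rows mentioned in this comment.

```python
from collections import defaultdict


def symbols_with_multiple_ids(rows):
    """Return {symbol: sorted gene IDs} for symbols attached to more than one gene ID."""
    ids_by_symbol = defaultdict(set)
    for symbol, gene_id in rows:
        ids_by_symbol[symbol].add(gene_id)
    return {
        symbol: sorted(ids)
        for symbol, ids in ids_by_symbol.items()
        if len(ids) > 1
    }


# Rows as Ensembl currently reports them, per this thread.
rows = [
    ("TAF9", "ENSG00000085231"),
    ("TAF9", "ENSG00000273841"),
]
duplicated = symbols_with_multiple_ids(rows)
print(duplicated)
```

Any symbol that shows up in the result is worth cross-checking against HGNC before trusting a lookup by name.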
I think it has something to do with mapping interval/coordinates back to gene names.
What's happened is our links to HGNC come in via RefSeq and their links to RefSeq are wrong, so we've pulled in the wrong HGNCs. As I said, we're on the case.
Got it. I read too fast, sorry.
Thanks for the update @Emily_Ensembl. I am currently in urgent need of GTF files for the GRCh37 and Rat Rn6 releases. Could you point out how I can get or make them without the possible HGNC data problem, and before release 81?
Thanks