Dear friends,
I have a set of gene symbols. When I convert these symbols to the corresponding Ensembl gene IDs, some symbols give me several different gene IDs instead of one gene ID per symbol. Why does this happen?
The gene you're looking at, AGPAT1, is found in a haplotypic region. Haplotypes are regions of the genome that exist in two or more versions, each of which is found in full in different individuals. These versions may have the same genes in a different order, or even different genes. We have a help video explaining this here.
AGPAT1 is found in the haplotypic MHC region, which has nine versions in the genome assembly, and AGPAT1 is present in seven of those nine. You can see all the possible Ensembl IDs for the different versions of AGPAT1 here.
In the current database there are 661 of these sets. Some have only two members; others, like AGPAT1, have many. One haplotype set has 36 different versions on chromosome 19.
At the moment, the current human genome assembly, GRCh38, only has haplotypes, but the GRC has already started making patches to repair misassembled or gapped genomic regions. We will bring these in and annotate them, so we'll be seeing more duplicate genes. However, in the case of patches, the gene on the patch is good and the gene on the primary assembly is dodgy. This is different from haplotypes, where all genes are equally valid.
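If you want a single ID per symbol for downstream work, one option is to also export the chromosome/scaffold name next to each gene ID and keep only the genes on primary-assembly chromosomes. The sketch below assumes that haplotype (and patch) copies are reported on non-primary scaffold names; the symbols, IDs and scaffold names are hypothetical placeholders, not real Ensembl records. Because a patch copy can be the better model, as noted above, treat this filter as a starting point rather than a rule.

```python
# Primary-assembly chromosome names (assumption: human, Ensembl-style naming).
PRIMARY = {str(n) for n in range(1, 23)} | {"X", "Y", "MT"}


def primary_ids(rows):
    """Keep only gene IDs annotated on primary-assembly chromosomes.

    rows: iterable of (symbol, gene_id, chromosome_or_scaffold_name).
    Copies on any other scaffold (haplotypes, patches) are dropped.
    """
    result = {}
    for symbol, gene_id, chrom in rows:
        if chrom in PRIMARY:
            result.setdefault(symbol, []).append(gene_id)
    return result


# Hypothetical BioMart-style export; all values are placeholders.
rows = [
    ("GENE_A", "ENSG_PRIMARY_1", "6"),
    ("GENE_A", "ENSG_HAP_1", "HAP_SCAFFOLD_1"),
    ("GENE_A", "ENSG_HAP_2", "HAP_SCAFFOLD_2"),
    ("GENE_B", "ENSG_PRIMARY_2", "19"),
]
print(primary_ids(rows))
# {'GENE_A': ['ENSG_PRIMARY_1'], 'GENE_B': ['ENSG_PRIMARY_2']}
```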
Dear Emily,
I need one more explanation. I extracted the first intron of a gene that falls in a haplotype region. The gene has seven haplotype versions, so I got seven first-intron sequences. Comparing their lengths, 4 of the 7 had the same length, but the rest had different lengths. Can I consider those latter sequences in such a haplotype region to be different genes?
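One way to see how the seven first-intron copies relate is to group them both by length and by exact sequence, since two copies with the same length can still differ in sequence. A minimal sketch, assuming the sequences are already loaded into a dict keyed by hypothetical haplotype labels; the toy sequences are invented for illustration. This only shows which copies differ; it does not by itself decide whether they should count as different genes.

```python
from collections import defaultdict


def group_by_length(seqs):
    """Map each sequence length to the labels of the copies with that length."""
    groups = defaultdict(list)
    for label, seq in seqs.items():
        groups[len(seq)].append(label)
    return dict(groups)


def group_by_sequence(seqs):
    """Map each distinct sequence (case-insensitive) to the labels carrying it."""
    groups = defaultdict(list)
    for label, seq in seqs.items():
        groups[seq.upper()].append(label)
    return dict(groups)


# Toy first-intron sequences; labels and sequences are hypothetical.
introns = {
    "primary": "GTAAGTACTT",
    "hap1": "GTAAGTACTT",
    "hap2": "GTAAGTACTA",  # same length as primary, different sequence
    "hap3": "GTAAGTTT",    # shorter copy
}
print(group_by_length(introns))
# {10: ['primary', 'hap1', 'hap2'], 8: ['hap3']}
print(group_by_sequence(introns))
# {'GTAAGTACTT': ['primary', 'hap1'], 'GTAAGTACTA': ['hap2'], 'GTAAGTTT': ['hap3']}
```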
You may be receiving IDs from other species, like NCBI's BRCA1 example. It's impossible to tell without more information.
Thank you for the reply.
I used HGNC gene symbols. For example, I converted the gene symbol AGPAT1 to Ensembl gene IDs using the online BioMart tool. It gave me seven different Ensembl gene IDs, as follows.
HGNC symbol Ensembl Gene ID
AGPAT1 ENSG00000228892
AGPAT1 ENSG00000235758
AGPAT1 ENSG00000227642
AGPAT1 ENSG00000204310
AGPAT1 ENSG00000236873
AGPAT1 ENSG00000226467
AGPAT1 ENSG00000206324
This is quite possible. For example, the gene name Y_RNA has many different ENSG IDs, located on several different chromosomes (chr1, 3, 4, 12, 14, 20, X).
That's why, whenever you start an analysis, you should pick one transcript/gene annotation (for example, GENCODE or Ensembl) and stick to it. Also, use ENSG IDs as your reference IDs until the end of the analysis, to avoid redundant identifiers such as gene names/symbols.
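Before switching to ENSG IDs as keys, it helps to list which symbols are ambiguous so they are not silently merged. A minimal sketch, assuming you have (symbol, gene ID) pairs from a conversion such as BioMart; the IDs below are hypothetical placeholders, not real Ensembl records.

```python
from collections import defaultdict


def find_ambiguous_symbols(pairs):
    """Return {symbol: sorted gene IDs} for symbols mapping to more than one ID."""
    ids_by_symbol = defaultdict(set)
    for symbol, gene_id in pairs:
        ids_by_symbol[symbol].add(gene_id)  # a set ignores repeated rows
    return {s: sorted(ids) for s, ids in ids_by_symbol.items() if len(ids) > 1}


# Placeholder conversion output; IDs are illustrative only.
pairs = [
    ("Y_RNA", "ENSG_Y_RNA_chr1"),
    ("Y_RNA", "ENSG_Y_RNA_chr3"),
    ("GENE_C", "ENSG_GENE_C"),
    ("GENE_C", "ENSG_GENE_C"),  # duplicate row, same ID: not ambiguous
]
print(find_ambiguous_symbols(pairs))
# {'Y_RNA': ['ENSG_Y_RNA_chr1', 'ENSG_Y_RNA_chr3']}
```

Downstream steps can then be keyed on the gene ID rather than the symbol, with the ambiguous symbols reviewed by hand.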
Can you give us an example please?
Thank you for the reply. Here is an example.
I used the gene symbol "AGPAT1" and converted it to Ensembl gene IDs using the online BioMart tool. It gave me seven different Ensembl gene IDs, as follows.
Yes, friend, I did select the taxon carefully. The problem is that I need to extract intron sequences from a set of genes, and once I converted the gene symbols to Ensembl gene IDs, some symbols ended up with several different Ensembl gene IDs.
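For the intron-extraction step itself, the first intron of a transcript can be derived from its exon coordinates, taking strand into account. A minimal sketch, assuming 1-based, inclusive exon coordinates (as Ensembl reports them) for one chosen transcript per gene ID, and strand given as 1 or -1; the function name is hypothetical. Which transcript you pick changes which intron counts as "first".

```python
def first_intron(exons, strand):
    """Return (start, end) of the first intron, 1-based inclusive, or None.

    exons: list of (start, end) tuples for one transcript, in any order.
    strand: 1 for forward, -1 for reverse. "First" means transcription order.
    """
    if len(exons) < 2:
        return None  # single-exon transcript: no intron
    # Order exons by transcription: ascending on +, descending on -.
    ordered = sorted(exons, key=lambda e: e[0], reverse=(strand == -1))
    ex1, ex2 = ordered[0], ordered[1]
    if strand == 1:
        return (ex1[1] + 1, ex2[0] - 1)  # after exon 1, before exon 2
    return (ex2[1] + 1, ex1[0] - 1)      # on -, exon 2 lies upstream of exon 1


exons = [(100, 200), (300, 400), (500, 600)]
print(first_intron(exons, 1))   # (201, 299)
print(first_intron(exons, -1))  # (401, 499)
```

On the reverse strand, a sequence fetched from these forward-strand coordinates still needs to be reverse-complemented.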