8.4 years ago
Tanvir Ahamed
I want to map all Ensembl gene IDs from hg38 to hg19. Any help will be appreciated. Thanks!
Example:
Loading library
library(biomaRt)
List of miRNA from hg38
grch38 <- useMart("ensembl",dataset="hsapiens_gene_ensembl")
# A single filter takes a single vector of values, so the extra TRUE is not needed
miRNA38 <- getBM(attributes=c("ensembl_gene_id","transcript_biotype"),
                 filters="transcript_biotype", values="miRNA", mart=grch38)
Result: 4555 Ensembl gene IDs in total
List of miRNA from hg19
grch37 <- useMart(biomart="ENSEMBL_MART_ENSEMBL", host="grch37.ensembl.org",
path="/biomart/martservice",dataset="hsapiens_gene_ensembl")
# Same query against the GRCh37 mart
miRNA37 <- getBM(attributes=c("ensembl_gene_id","transcript_biotype"),
                 filters="transcript_biotype", values="miRNA", mart=grch37)
Result: 3411 Ensembl gene IDs in total
Extract hg38/GRCh38 ensembl_gene_id from hg19/GRCh37
en_id_hg38 <- miRNA38$ensembl_gene_id
# Look up the GRCh38 miRNA gene IDs in the GRCh37 mart
miRNA38_19 <- getBM(attributes=c("ensembl_gene_id","transcript_biotype"),
                    filters="ensembl_gene_id", values=en_id_hg38, mart=grch37)
Result: 2802 Ensembl gene IDs in total. The remaining 1753 (4555-2802) hg38 Ensembl gene IDs are not found in hg19.
How can I map these 1753 hg38 Ensembl IDs to hg19?
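To see which IDs are missing, and to have a few concrete examples to post, the two ID vectors can be compared with setdiff(). This is a minimal sketch that runs offline: the two vectors below are made-up stand-ins for miRNA38$ensembl_gene_id and miRNA38_19$ensembl_gene_id, and the name found_in_37 is only illustrative.

```r
# Toy stand-ins for miRNA38$ensembl_gene_id and miRNA38_19$ensembl_gene_id.
# These IDs are invented for illustration and are not real Ensembl genes.
en_id_hg38  <- c("ENSG00000000001", "ENSG00000000002",
                 "ENSG00000000003", "ENSG00000000002")
found_in_37 <- c("ENSG00000000001")

# De-duplicate first, in case the same gene ID appears in more than one row.
en_id_hg38  <- unique(en_id_hg38)
found_in_37 <- unique(found_in_37)

# GRCh38 IDs that the GRCh37 query did not return.
unmapped <- setdiff(en_id_hg38, found_in_37)

length(unmapped)  # how many IDs are missing
head(unmapped)    # a few examples to inspect or post
```

With the real data, the two assignments at the top would be replaced by the columns returned from getBM().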
Unless there have been changes in the gene structure, the Ensembl gene ID should be the same across releases or assemblies, e.g. the Ensembl gene ID for BRCA2 in both GRCh38 and GRCh37 is ENSG00000139618. However, even minor changes, for example in the UTR, can result in a different ID being assigned. Do you have a gene (or list of genes), and do you know what changed between the assemblies?
Example added to the main question!
There was a very useful post:
Converting Genome Coordinates From One Genome Version To Another (Ucsc Liftover, Ncbi Remap, Ensembl Api)
As far as I understand, the OP would like to convert IDs, not coordinates, though.
That's true, my fault.
There could be two things going on here. Firstly, some of the IDs found in GRCh38 but not in GRCh37 may simply belong to loci that were not annotated in GRCh37 at all, only in GRCh38. The other possibility is that the loci are in GRCh37 but got a different ENSG ID in GRCh38 because the gene models changed. Perhaps you could get the latest GTF files from the Ensembl FTP sites and compare them (GRCh38 and GRCh37).
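One way to tell these two cases apart from the GTF files is sketched below in base R. It assumes that both files carry gene_id, gene_name and gene_biotype in the attribute column (older GRCh37 releases may store the biotype differently) and that the gene name is a usable key between assemblies, which will not always hold. The helper names gtf_attr and read_mirna_genes, and all IDs and gene names in the toy lines, are made up for illustration.

```r
# Extract one attribute (e.g. gene_id) from the 9th GTF column.
gtf_attr <- function(attr_col, key) {
  pattern <- paste0('^.*\\b', key, ' "([^"]*)".*$')
  out <- rep(NA_character_, length(attr_col))
  hit <- grepl(pattern, attr_col, perl = TRUE)
  out[hit] <- sub(pattern, "\\1", attr_col[hit], perl = TRUE)
  out
}

# Keep only "gene" features whose gene_biotype is miRNA.
read_mirna_genes <- function(gtf_lines) {
  gtf_lines <- gtf_lines[!grepl("^#", gtf_lines)]   # drop header comments
  fields  <- strsplit(gtf_lines, "\t", fixed = TRUE)
  feature <- vapply(fields, `[`, character(1), 3)
  attrs   <- vapply(fields, `[`, character(1), 9)
  genes <- data.frame(gene_id   = gtf_attr(attrs, "gene_id"),
                      gene_name = gtf_attr(attrs, "gene_name"),
                      biotype   = gtf_attr(attrs, "gene_biotype"),
                      stringsAsFactors = FALSE)
  genes[feature == "gene" & genes$biotype %in% "miRNA",
        c("gene_id", "gene_name")]
}

# Toy GTF lines; every ID and name here is invented.
gtf38 <- c(
  '#!genome-build GRCh38',
  '1\tensembl\tgene\t100\t180\t.\t+\t.\tgene_id "ENSG00000000011"; gene_name "MIRTOY1"; gene_biotype "miRNA";',
  '1\tensembl\tgene\t500\t590\t.\t-\t.\tgene_id "ENSG00000000099"; gene_name "MIRTOY2"; gene_biotype "miRNA";',
  '2\tensembl\tgene\t900\t970\t.\t+\t.\tgene_id "ENSG00000000013"; gene_name "MIRTOY3"; gene_biotype "miRNA";',
  '2\tensembl\tgene\t50\t4000\t.\t+\t.\tgene_id "ENSG00000000014"; gene_name "TOYPC"; gene_biotype "protein_coding";'
)
gtf37 <- c(
  '1\tensembl\tgene\t120\t200\t.\t+\t.\tgene_id "ENSG00000000011"; gene_name "MIRTOY1"; gene_biotype "miRNA";',
  '1\tensembl\tgene\t520\t610\t.\t-\t.\tgene_id "ENSG00000000012"; gene_name "MIRTOY2"; gene_biotype "miRNA";'
)

m38 <- read_mirna_genes(gtf38)
m37 <- read_mirna_genes(gtf37)

# Left join on gene name: every GRCh38 miRNA gene, with its GRCh37 ID if any.
cmp <- merge(m38, m37, by = "gene_name", all.x = TRUE,
             suffixes = c("_38", "_37"))
cmp$status <- ifelse(is.na(cmp$gene_id_37), "not annotated in GRCh37",
              ifelse(cmp$gene_id_38 == cmp$gene_id_37, "same ID", "ID changed"))
print(cmp)
```

For the real files, gtf38 and gtf37 would be the lines read with readLines() from the downloaded GTFs. Matching on gene name is only a heuristic; comparing coordinates after a liftover would be a stricter check.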
Thanks for your time and reply.
I have also tried the GTF files from the Ensembl FTP site for both GRCh37 and GRCh38, and tried to map ENSG IDs from GRCh38 to GRCh37 for miRNA (biotype), but could not figure out a working solution. :(
Post a few examples of IDs that do not map so @Denise can figure out what is going on.
Or better still, email the Ensembl helpdesk as they can find out if this is a feature or a bug, if you know what I mean.