Hi! I'm trying to get the location (chromosome and band) of a list of Entrez Gene IDs I got using the Homo.sapiens Bioconductor package:
indx <- subsetByOverlaps(genes(TxDb.Hsapiens.UCSC.hg19.knownGene), mycoords.gr)
Since my original data (mycoords.gr) are mapped to the GRCh37/hg19 genome version, I tried using BioMart to get the locations for that version of the genome:
ensembl <- useMart(biomart = "ENSEMBL_MART_ENSEMBL", host = "grch37.ensembl.org", path = "/biomart/martservice", dataset = "hsapiens_gene_ensembl")
my.symbols <- indx$gene_id
my.regions <- getBM(c("entrezgene", "hgnc_symbol", "chromosome_name", "band"),
                    filters = "entrezgene",
                    values = my.symbols,
                    mart = ensembl)
I noticed, however, that some of the Entrez IDs on my list were missing from my.regions. When I tried using the current version of the genome, those IDs were present but others were missing...
Is there a difference in Entrez IDs between assemblies? I also tried retrieving all of the Entrez IDs in Ensembl, and some of them were missing there too...
mapping <- getBM(attributes = c("entrezgene", "hgnc_symbol"), mart = ensembl)
I don't understand this... Is there an alternative to this method?
Thanks in advance!
Can you give some examples of IDs that were in the wrong locations or missing, please?
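If it helps, here is a minimal sketch of how the missing IDs could be listed. The values below are toy stand-ins (my assumption, not your data) for my.symbols and for a getBM() result; both sides are compared as character because TxDb gene IDs are strings while BioMart may return integers.

```r
# Toy stand-ins (hypothetical values) for the objects in the question:
# my.symbols is the query vector, my.regions mimics a getBM() result.
my.symbols <- c("10002", "100033416", "7157")
my.regions <- data.frame(
  entrezgene      = 7157L,
  hgnc_symbol     = "TP53",
  chromosome_name = "17",
  band            = "p13.1",
  stringsAsFactors = FALSE
)

# IDs that were queried but did not come back from BioMart
missing.ids <- setdiff(as.character(my.symbols),
                       as.character(my.regions$entrezgene))
print(missing.ids)
```

Running the same setdiff() on the real my.symbols and my.regions should give the list of IDs to post here.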
Yes, some of the missing ones were 100033416 (hg18) and 10002 (hg19), and 100033416 in both.
I am able to detect the one that you have tagged as hg19:
  hgnc_symbol entrezgene
1      SNHG14  100033416
This is what I get when I run the same code you wrote ...
We have a problem...
Did you mean hg18/NCBI36 or did you mean GRCh38?
I think it is GRCh38
Someone will be along with a BioMart answer, but if you can post a few Entrez IDs we can see whether an Entrez Direct solution is feasible.
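In the meantime, one possible alternative (a sketch only, not tested against your list) is to look the IDs up in the org.Hs.eg.db annotation package, which the Homo.sapiens package builds on. It assumes org.Hs.eg.db and AnnotationDbi are installed and that the "MAP" column holds the cytogenetic location. As far as I know that package follows the NCBI Gene annotation rather than a particular Ensembl release, so treat the bands as current NCBI values rather than hg19-specific ones. split_map() is a hypothetical helper written just for this example.

```r
# Split a cytogenetic location such as "15q11.2" into chromosome and band.
# split_map() is a hypothetical helper written for this sketch. Strings that
# do not match the pattern are returned unchanged in both columns.
split_map <- function(map) {
  pattern <- "^([0-9XY]+)([pq].*)$"
  data.frame(
    chromosome = sub(pattern, "\\1", map),
    band       = sub(pattern, "\\2", map),
    stringsAsFactors = FALSE
  )
}

# The two IDs mentioned in this thread
ids <- c("10002", "100033416")

# Only run the lookup if the annotation package is available
if (requireNamespace("org.Hs.eg.db", quietly = TRUE)) {
  ann <- AnnotationDbi::select(org.Hs.eg.db::org.Hs.eg.db,
                               keys = ids,
                               columns = c("SYMBOL", "MAP"),
                               keytype = "ENTREZID")
  print(cbind(ann, split_map(ann$MAP)))
}
```

IDs that come back with NA in SYMBOL or MAP would be the ones worth checking by hand at NCBI.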