I was annotating my dataset with biomart, filtering by chromosomal region, and was surprised by the genes I got, so I took a closer look at PRAMENP (ENSG00000197549).
According to biomart its positions are:
  chromosome_name start_position end_position strand ensembl_gene_id hgnc_symbol
1              22       21991099     22043934     -1 ENSG00000197549     PRAMENP
But if I look at the genome browser I get the following:
GENCODE Transcript Annotation ENST00000337471.4 (PRAMENP)
                  Transcript                Gene
Gencode id        ENST00000337471.4         ENSG00000197549.5
HAVANA manual id  OTTHUMT00000320276.2      OTTHUMG00000150836.3
Position          chr22:22345497-22398332   chr22:22345497-22398332
Because of those differences, biomart returns lots of genes that are far away from my dataset (SNPs) according to the genome browser, while the genes that really are close to them (according to the genome browser) do not appear in biomart.
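A quick sanity check on the two sets of PRAMENP coordinates may help here. The sketch below, in base R, uses only the numbers quoted above (the variable names are illustrative) and checks whether start and end are shifted by the same amount. A constant shift with an unchanged gene length would be consistent with the two sources using different genome assemblies, although it does not prove it.

```r
# Coordinates quoted above for PRAMENP (ENSG00000197549)
biomart_pos <- c(start = 21991099, end = 22043934)  # from biomart
browser_pos <- c(start = 22345497, end = 22398332)  # from the genome browser

# Per-coordinate shift between the two sources
offset <- browser_pos - biomart_pos
print(offset)

# If the shift is constant, the gene length is identical in both sources
same_length <- unname(diff(biomart_pos) == diff(browser_pos))
print(same_length)
```

Here both coordinates differ by 354398 bp and the gene length is the same, which points towards a coordinate-system difference rather than a different gene model.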
library(biomaRt)
ensembl = useMart(biomart="ENSEMBL_MART_ENSEMBL", host="www.ensembl.org",
                  path="/biomart/martservice", dataset="hsapiens_gene_ensembl")
filterlist = list("22:21815836:22006492")
attributes.1 = c("chromosome_name", "start_position", "end_position", "strand", "ensembl_gene_id", "hgnc_symbol")
results.1 = getBM(attributes = attributes.1, filters = c("chromosomal_region"), values = filterlist, mart = ensembl)
> unique(results.1$hgnc_symbol)
[1] "PRAMENP" "MAPK1" "" "TOP3B" "PPM1F"
But according to the genome browser (coordinates: chr22:21,815,836-22,006,492) I should have got UBE2L3, YDJC, PI4KAP2 and some more, but not those identified by biomart.
I guess the biomart dataset is built on hg38, and I am viewing hg19 in the genome browser. Is it possible to get hsapiens_gene_ensembl in hg19?
Switch the genome browser to the older build and see if the retrieved sequences are the same; that might confirm your suspicion.
I did that already. The result is that the biomart dataset is built on hg38 and the genome browser is on hg19, but all my data is on hg19, so I want to be consistent. Is there a biomart dataset on hg19?
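One option that may work: Ensembl keeps a GRCh37 (hg19) archive mart, and biomaRt's useEnsembl() takes a GRCh argument to select it. The sketch below is a minimal example under the assumption that the archive is reachable and that the installed biomaRt version supports GRCh = 37. It reuses the region and attributes from the question; the object names ensembl.hg19 and results.hg19 are illustrative.

```r
library(biomaRt)

# Assumption: the GRCh37 (hg19) archive mart is online; GRCh = 37 asks
# biomaRt to connect to it instead of the current GRCh38 release.
ensembl.hg19 <- useEnsembl(biomart = "ensembl",
                           dataset = "hsapiens_gene_ensembl",
                           GRCh = 37)

# Same attributes and region as in the original query
attributes.1 <- c("chromosome_name", "start_position", "end_position",
                  "strand", "ensembl_gene_id", "hgnc_symbol")
results.hg19 <- getBM(attributes = attributes.1,
                      filters = "chromosomal_region",
                      values = "22:21815836:22006492",
                      mart = ensembl.hg19)

# Gene symbols overlapping the region in hg19 coordinates
unique(results.hg19$hgnc_symbol)
```

If the archive is not an option, converting the SNP positions to hg38 with a liftover tool and querying the current mart would be the alternative route; either way, the data and the annotation should end up on the same assembly.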