How can I determine which dbSNP build (e.g. 150 or 151) the variation annotations are based on? For GRCh37, I can see that the variation mart is Ensembl Variation 94. The SNP attributes state that the source is dbSNP, but not which build:
library(biomaRt)
grch37 = useMart(biomart="ENSEMBL_MART_ENSEMBL", host="grch37.ensembl.org", path="/biomart/martservice")
listMarts(grch37)
biomart version
1 ENSEMBL_MART_ENSEMBL Ensembl Genes 94
2 ENSEMBL_MART_SNP Ensembl Variation 94
3 ENSEMBL_MART_FUNCGEN Ensembl Regulation 94
variation = useMart(biomart="ENSEMBL_MART_SNP", host="grch37.ensembl.org", path="/biomart/martservice")
listDatasets(variation)[12,]
dataset
12 hsapiens_snp
description
12 Human Short Variants (SNPs and indels excluding flagged variants) (GRCh37.p13)
version
12 GRCh37.p13
snps = useMart(biomart="ENSEMBL_MART_SNP", host="grch37.ensembl.org", path="/biomart/martservice",dataset="hsapiens_snp")
getBM(attributes = c("refsnp_id", "refsnp_source", "refsnp_source_description",
                     "chr_name", "chrom_start", "chrom_end",
                     "minor_allele", "minor_allele_freq", "minor_allele_count",
                     "consequence_allele_string"),
      filters = "snp_filter", values = "rs123", mart = snps)
refsnp_id refsnp_source
1 rs123 dbSNP
refsnp_source_description chr_name chrom_start
1 Variants (including SNPs and indels) imported from dbSNP 7 24966446
chrom_end minor_allele minor_allele_freq minor_allele_count
1 24966446 C 0.292133 1463
consequence_allele_string
1 C/A
1) Is this information available programmatically through biomaRt? 2) If not, is http://grch37.ensembl.org/info/genome/variation/species/sources_documentation.html the best place to find it? Unfortunately, the documentation link on that page (http://grch37.ensembl.org/info/genome/variation/prediction/sources_phenotype_documentation.html) currently appears to be broken.
An additional source of confusion: looking up an individual SNP on the GRCh37 Ensembl website (e.g. http://grch37.ensembl.org/Homo_sapiens/Variation/Explore?r=7:24965946-24966946;v=rs123;vdb=variation;vf=119) gives information from dbSNP build 150: "Original source: Variants (including SNPs and indels) imported from dbSNP (release 150)", even though "Ensembl GRCh37 release 94" is stated at the bottom of the page. So perhaps we can't assume that http://grch37.ensembl.org/info/genome/variation/species/sources_documentation.html holds for all Ensembl 94/GRCh37 data?
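To make the difference concrete: the website string carries the build as "(release 150)", while the refsnp_source_description returned by getBM() above has no such suffix. Below is a minimal sketch in base R of pulling the build number out of such a string. It assumes the "(release N)" format quoted from the website, and extract_dbsnp_release is a hypothetical helper name of my own, not a biomaRt function.

```r
# Hypothetical helper (not part of biomaRt): extract the dbSNP build number
# from a source description string, assuming it ends in "(release <digits>)".
extract_dbsnp_release <- function(description) {
  # regexec() returns the full match plus the captured digits for each string.
  m <- regmatches(description, regexec("\\(release ([0-9]+)\\)", description))
  # A non-matching string yields character(0), which we map to NA.
  vapply(m,
         function(x) if (length(x) == 2) as.integer(x[2]) else NA_integer_,
         integer(1))
}

# Description as shown on the GRCh37 website for rs123.
web_desc <- "Variants (including SNPs and indels) imported from dbSNP (release 150)"
# Description as returned by getBM() in the session above.
mart_desc <- "Variants (including SNPs and indels) imported from dbSNP"

extract_dbsnp_release(c(web_desc, mart_desc))
```

The helper gives 150 for the website string and NA for the biomaRt string, which is exactly the gap I am asking about: the build number would have to come from somewhere other than this attribute.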
Tagging: Emily_Ensembl