You should add the consequence_type_tv attribute to your query; it gives the location category of a SNP relative to a gene. The variation documentation has the full list of possible values.
This tells you where the SNP falls relative to gene features and lets you filter out SNPs you might not be interested in, such as upstream or downstream ones. Upstream and downstream mean within 5 kb on either side of the gene.
Here's the query at Ensembl BioMart, or with R biomaRt:
    library("biomaRt")
    # Connect to the Ensembl variation mart, human SNP dataset
    snp <- useMart("snp", dataset="hsapiens_snp")
    # Gene(s) of interest, as Ensembl gene IDs
    ensemblids <- c("ENSG00000204296")
    # Fetch SNPs for the gene, including the consequence type
    out <- getBM(attributes=c("refsnp_id","chr_name","chrom_start",
                              "ensembl_gene_stable_id","validated",
                              "consequence_type_tv"),
                 filters=c("ensembl_gene"), values=c(ensemblids), mart=snp)
    head(out)
      refsnp_id chr_name chrom_start ensembl_gene_stable_id               validated
    1  rs517922        6    32258836        ENSG00000204296                  hapmap
    2 rs3117133        6    32313653        ENSG00000204296 cluster,freq,1000Genome
    3 rs6621681        6    32292217        ENSG00000204296
    4 rs6621681        6    32292217        ENSG00000204296
    5 rs6621682        6    32292221        ENSG00000204296
    6 rs6621682        6    32292221        ENSG00000204296
        consequence_type_tv
    1            DOWNSTREAM
    2              INTRONIC
    3              INTRONIC
    4 NON_SYNONYMOUS_CODING
    5              INTRONIC
    6     SYNONYMOUS_CODING
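To drop the flanking SNPs once you have the result, a plain data frame filter is enough. This is only a minimal sketch: out_demo is a hypothetical stand-in built from the six rows printed above so the snippet runs without a BioMart connection, and the names flanking and in_gene are mine. With a real query you would filter out itself.

```r
# Hypothetical stand-in for `out`, copied from the head(out) rows shown above
out_demo <- data.frame(
  refsnp_id = c("rs517922", "rs3117133", "rs6621681",
                "rs6621681", "rs6621682", "rs6621682"),
  consequence_type_tv = c("DOWNSTREAM", "INTRONIC", "INTRONIC",
                          "NON_SYNONYMOUS_CODING", "INTRONIC",
                          "SYNONYMOUS_CODING"),
  stringsAsFactors = FALSE
)

# Consequence types that mark SNPs flanking the gene rather than inside it
flanking <- c("UPSTREAM", "DOWNSTREAM")

# Keep only rows whose consequence type is not a flanking category
in_gene <- out_demo[!out_demo$consequence_type_tv %in% flanking, ]
print(in_gene)
```

Note that the same SNP can still appear on several rows (one per transcript consequence), so you may want unique(in_gene$refsnp_id) if you only need the IDs.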
Edit for followup question:
The rule is that upstream is within 5kb of the transcript start, and downstream is within 5kb of the transcript end. Since genes can have multiple transcripts, you will want to look at the transcript in question to verify that a SNP is assigned within the documented distance.
For your example, rs76596671 is assigned to 7 alternative transcripts of ENSG00000204296. It is upstream of transcript ENST00000305725, which lies on the reverse strand of chromosome 6 at 32,260,758-32,338,274. Because the transcript is on the reverse strand, its start is the higher coordinate, 32,338,274. rs76596671 is located at 32,339,357, so it is 32,339,357 - 32,338,274 = 1,083 bp upstream of the transcript start, well within the 5 kb limit.
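The strand logic above can be checked with a few lines of R. This is a sketch under my own assumptions: upstream_distance is a hypothetical helper (not part of biomaRt), strand is coded as 1 for forward and -1 for reverse, and tx_start/tx_end are the lower and higher genomic coordinates of the transcript.

```r
# Hypothetical helper: distance from a SNP to the transcript's 5' end.
# A positive value means the SNP lies upstream of the transcript start.
upstream_distance <- function(snp_pos, tx_start, tx_end, strand) {
  stopifnot(strand %in% c(1, -1))
  if (strand == 1) {
    tx_start - snp_pos   # forward strand: 5' end is the lower coordinate
  } else {
    snp_pos - tx_end     # reverse strand: 5' end is the higher coordinate
  }
}

# Coordinates from the example: rs76596671 vs ENST00000305725 (reverse strand)
d <- upstream_distance(32339357, 32260758, 32338274, -1)
print(d)  # 1083

# Apply the documented 5 kb rule
is_upstream <- d > 0 && d <= 5000
print(is_upstream)
```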
Just a correction: the definition of gene regions doesn't come from BioMart itself, but from the dataset used; I suppose that is the latest Ensembl release.
In my case the definition should come from dbSNP, but from BioMart I am getting SNPs more than 40 kb from the genes that I'm interested in, and dbSNP doesn't associate them with the gene in GeneView. For example, look at rs77507878 in the comment below, where the gene of interest is C6orf10.
Sorry, the SNP I was talking about is rs76596671 NOT rs77507878.