Dear Lorenzo,
There are several ways to do this; one is to use biomaRt in R:
library(biomaRt)

# Connect to the Ensembl variation (SNP) mart for human
ensembl <- useMart("ENSEMBL_MART_SNP", dataset = "hsapiens_snp")

# Retrieve position, strand, alleles, frequency and clinical data for rs6025
getBM(attributes = c(
        "refsnp_id", "chr_name", "chrom_start", "chrom_end", "chrom_strand",
        "allele", "mapweight", "validated", "allele_1", "minor_allele",
        "minor_allele_freq", "minor_allele_count", "clinical_significance",
        "synonym_name", "ensembl_gene_stable_id"),
      filters = "snp_filter", values = "rs6025",
      mart = ensembl, uniqueRows = TRUE)
refsnp_id chr_name chrom_start chrom_end chrom_strand allele mapweight
rs6025 1 169549811 169549811 1 C/T 1
validated
1000Genomes,Cited,ESP,ExAC,Frequency,gnomAD,HapMap,Phenotype_or_Disease,TOPMed
allele_1 minor_allele minor_allele_freq minor_allele_count
C TRUE 0.00599042 30
clinical_significance
benign,pathogenic,drug response,risk factor
synonym_name ensembl_gene_stable_id
17284 ENSG00000198734
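The value of chrom_strand will be 1 (plus/forward strand) or -1 (minus/reverse strand). Note that minor_allele shows TRUE only because R reads the allele T as the logical value TRUE.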
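Fields such as validated and clinical_significance come back as single comma-separated strings. A minimal sketch for splitting them, assuming a comma is the only separator; split_field() and has_term() are hypothetical helpers, not part of biomaRt:

```r
# Hypothetical helpers (not part of biomaRt) for comma-separated getBM()
# fields such as "validated" or "clinical_significance".

# Split each string into a character vector of trimmed terms
split_field <- function(x, sep = ",") {
  lapply(strsplit(as.character(x), sep, fixed = TRUE), trimws)
}

# TRUE for each element of x whose terms include `term`
has_term <- function(x, term, sep = ",") {
  vapply(split_field(x, sep), function(v) term %in% v, logical(1))
}

split_field("benign,pathogenic,drug response,risk factor")
has_term("benign,pathogenic,drug response,risk factor", "pathogenic")
```

If res holds a getBM() result, res[has_term(res$clinical_significance, "pathogenic"), ] would keep only the rows flagged as pathogenic.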
------------------------------------
To look up multiple records at the same time, pass a vector of rs IDs to getBM() as the values parameter, for example:
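# Vector of rs IDs to query (..., add further rs IDs as needed)
snps <- c("rs6025", "rs499479")
getBM(attributes = c(
        "refsnp_id", "chr_name", "chrom_start", "chrom_end", "chrom_strand"),
      filters = "snp_filter", values = snps,
      mart = ensembl, uniqueRows = TRUE)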
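getBM() only returns rows for the IDs it finds, so an rs ID it does not recognise simply produces no row. A minimal sketch for checking the result against the input vector, assuming the result includes the refsnp_id column; check_snp_result() is a hypothetical helper:

```r
# Hypothetical helper: compare requested rs IDs with a getBM() result
# that includes the "refsnp_id" column.
check_snp_result <- function(snps, res) {
  ids <- as.character(res$refsnp_id)
  counts <- table(ids)
  list(
    # requested IDs with no row in the result
    missing = setdiff(snps, ids),
    # IDs returned on more than one row (e.g. one row per gene
    # when "ensembl_gene_stable_id" is requested)
    multi_row = names(counts)[counts > 1]
  )
}
```

Typical use: store the getBM() result as res, then call check_snp_result(snps, res).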
Regarding strand orientation in dbSNP, see the dbSNP FAQ, starting from the section "Strand Orientation".
Kevin
I noticed by chance that the dbSNP BED files contain the strand information.
So one option is to first find out which chromosome each of your SNPs is on, and then query the corresponding BED file for it.
fin swimmer
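A minimal sketch of that BED lookup in R, assuming a standard BED6 layout (chrom, start, end, name = rs ID, score, strand) with optional track/browser header lines; strand_from_bed() is a hypothetical helper, and reading a whole chromosome file this way needs enough memory:

```r
# Hypothetical helper: strand of selected rs IDs from a dbSNP-style BED file.
# Assumes BED6 columns: chrom, start, end, name (rs ID), score, strand.
strand_from_bed <- function(bed_file, rs_ids) {
  # readLines() also reads gzip-compressed (.bed.gz) files transparently
  lines <- readLines(bed_file)
  # drop "track"/"browser" header lines and comments
  lines <- lines[!grepl("^(track|browser|#)", lines)]
  bed <- read.table(text = lines, quote = "", comment.char = "",
                    colClasses = "character")[, 1:6]
  names(bed) <- c("chrom", "start", "end", "name", "score", "strand")
  hits <- bed[bed$name %in% rs_ids, ]
  # BED starts are 0-based; add 1 to compare with 1-based positions
  # such as chrom_start from biomaRt
  hits$pos <- as.integer(hits$start) + 1L
  hits[, c("name", "chrom", "pos", "strand")]
}
```

Usage would look like strand_from_bed(path_to_chromosome_bed, c("rs6025")), where the path points to the BED file for the chromosome found in the first step.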
Aren't all dbSNP variants given on the forward (+) strand of the reference genome, regardless of gene/transcript?
No, a proportion of them are given on the reverse (-) strand; I am not entirely sure why. For example, rs499479.
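When a variant is reported on the reverse (-) strand, its alleles can be expressed on the forward strand by taking the reverse complement of each allele. A minimal sketch, assuming an allele string such as "C/T" given relative to a strand coded as 1/-1 (as in biomaRt's chrom_strand); to_forward_strand() and revcomp() are hypothetical helpers, and characters other than ACGT (such as "-") are left unchanged:

```r
# Hypothetical helper: reverse complement of each DNA string in a vector
revcomp <- function(seq) {
  comp <- chartr("ACGTacgt", "TGCAtgca", seq)
  vapply(strsplit(comp, ""), function(x) paste(rev(x), collapse = ""),
         character(1))
}

# Hypothetical helper: express "A1/A2" allele strings on the forward strand.
# strand is 1 (forward) or -1 (reverse); NA strands are left untouched.
to_forward_strand <- function(allele, strand) {
  out <- allele
  for (i in which(strand == -1)) {
    parts <- strsplit(allele[i], "/", fixed = TRUE)[[1]]
    out[i] <- paste(revcomp(parts), collapse = "/")
  }
  out
}

to_forward_strand(c("C/T", "A/G", "AC/-"), c(1, -1, -1))
# → "C/T" "T/C" "GT/-"
```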