I have a list of 390 SNPs that I need to retrieve rsIDs for on the hg19 build.
My SNP file is in the following bed format (chromosome start stop).
chr1 207679307 207679307
chr1 207684192 207684192
chr1 207685786 207685786
chr1 207685965 207685965
chr1 207692049 207692049
biomaRt has been recommended for this, and I have managed to retrieve rsIDs for individual loci using this code:
library(biomaRt)
# Connect to the GRCh37 (hg19) Ensembl variation mart
snp_mart <- useMart(biomart="ENSEMBL_MART_SNP", host="grch37.ensembl.org", path="/biomart/martservice", dataset="hsapiens_snp")
test <- getBM(attributes = c('refsnp_id','allele','chrom_start','chrom_strand'), filters = c('chr_name','start','end'), values = list(7,37844263,37844263), mart = snp_mart)
However, I'm not sure of the best way to quickly run all 390 SNPs through this code. Is this possible?
I suspect this may not be the best way to retrieve rsIDs. I have searched for other posts on this (e.g. see here), but ideally I'm looking for a solution in R or Linux.
Many thanks
Your list is not in BED format. BED uses 0-based, half-open coordinates, so the start and end coordinates of a single-base feature cannot be identical. If, say, you look at a single bp on a genome browser, e.g. chr14:14000, then the BED entry must be chr14-13999-14000. Be sure to reformat correctly in order to retrieve the correct data!
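For what it's worth, here is a minimal sketch of that conversion with awk. It assumes the input is whitespace-separated and holds 1-based positions in columns 2 and 3, as in the question. The file names snps.txt and snps.bed are placeholders, and the first three lines of the question are recreated only so the example is self-contained.

```shell
# Hypothetical file names: snps.txt (input) and snps.bed (output).
# Recreate the first three lines from the question as sample input.
printf '%s\n' \
  'chr1 207679307 207679307' \
  'chr1 207684192 207684192' \
  'chr1 207685786 207685786' > snps.txt

# Shift the start down by one to get 0-based, half-open BED intervals,
# and write tab-separated columns.
awk 'BEGIN { OFS = "\t" } NF >= 3 { print $1, $2 - 1, $3 }' snps.txt > snps.bed
cat snps.bed
```

Each output line keeps the original position as the end coordinate and uses position minus one as the start.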
@OP: I don't think you need to use biomaRt at all. Format your data as BED, as ATpoint suggested above, and then use bedtools to intersect it with the latest dbSNP VCF for your build (hg19). That should give you rsIDs and additional information from dbSNP.
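A rough sketch of the intersect step, under some loud assumptions: bedtools is installed, and the tiny VCF written below is only a toy stand-in for a real dbSNP file (its ID, REF and ALT values are made-up placeholders, not real dbSNP data). Chromosome names have to match between the two files. This toy VCF uses "1" rather than "chr1", so the chr prefix is stripped from the BED file first. Check the naming in whichever dbSNP file you actually download.

```shell
# Toy stand-in for a dbSNP VCF. The ID, REF and ALT values are placeholders,
# NOT real dbSNP data; a real run would use a downloaded hg19/GRCh37 dbSNP VCF.
{
  printf '##fileformat=VCFv4.1\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf '1\t207679307\tplaceholder_id\tA\tG\t.\t.\t.\n'
} > toy_dbsnp.vcf

# One SNP from the question as a BED interval (start = position - 1).
printf 'chr1\t207679306\t207679307\n' > toy.bed

# Make chromosome names match the toy VCF by dropping the "chr" prefix.
sed 's/^chr//' toy.bed > toy.nochr.bed

if command -v bedtools >/dev/null 2>&1; then
  # -wa reports each VCF record that overlaps an interval in the BED file;
  # columns 1-3 of a VCF record are chromosome, position and ID.
  bedtools intersect -wa -a toy_dbsnp.vcf -b toy.nochr.bed | cut -f 1-3 > hits.txt
  cat hits.txt
else
  echo "bedtools not found on PATH; skipping the intersect step" >&2
fi
```

With a real dbSNP VCF in place of toy_dbsnp.vcf, the third column of the output would hold the rsIDs.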
Why don't you save all 390 SNPs in an object (would a data frame work?) and direct values = to that?