I know that for a small number of queries you can use BioMart, but I have hundreds of thousands of SNPs/indels. How can I get their positions in the fastest way possible?
One way is via the NIH Clinical Table Search Service, for two SNPs rs12345 and rs334:
curl "https://clinicaltables.nlm.nih.gov/api/snps/v3/search?terms={rs12345,rs334}"
That will return one JSON array per SNP, printed back to back:
[10000,["rs12345","rs1234501417","rs1234501474","rs1234502483","rs1234504365","rs1234505460","rs1234507061"],null,[["rs12345","22","25459491","G/A, G/C","CRYBB2P1"],["rs1234501417","1","82537618","C/G",""],["rs1234501474","1","76903116","A/G","ST6GALNAC5"],["rs1234502483","1","50243066","T/C","LOC105378711"],["rs1234504365","1","56084547","A/C","LOC105378741"],["rs1234505460","1","123981172","G/A",""],["rs1234507061","1","4067600","/CACTC",""]]][933,["rs334","rs33465","rs334771","rs334164","rs334217","rs334015","rs334597"],null,[["rs334","11","5227001","T/A, T/C, T/G","HBB"],["rs33465","3","42364792","G/A, G/C","LYZL4"],["rs334771","3","3129338","A/G","TRNT1 LOC107986006"],["rs334164","4","172318502","C/T","GALNTL6"],["rs334217","4","142332614","G/A","INPP4B"],["rs334015","2","178252192","G/C","OSBPL6"],["rs334597","2","178313657","T/C","OSBPL6"]]]
The fields are explained in the API documentation (https://clinicaltables.nlm.nih.gov/apidoc/snps/v3/doc.html); the second and third fields of each record are the chromosome and the chromosome position in GRCh38.
By default you get 7 results per query SNP, of which the first one is probably the one you're looking for. You can only ask for 500 SNPs at a time, after which you have to use the pagination feature.
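Since each query returns near-matches as well as the SNP you asked for, it can help to filter the response down to exact rsID matches. Here is a minimal Python sketch of that, assuming the response layout shown above (total count, list of IDs, null, then one record per SNP with rsID, chromosome and position first). The function name exact_positions is made up for illustration, and the sample input is a trimmed copy of the output above.

```python
import json

def exact_positions(raw, wanted):
    """Return {rsID: (chrom, pos)} for exact rsID matches.

    `raw` may hold several JSON arrays printed back to back,
    which is what curl produces for a {a,b} URL pattern.
    """
    decoder = json.JSONDecoder()
    hits = {}
    idx = 0
    while idx < len(raw):
        # raw_decode does not skip leading whitespace, so do it here
        if raw[idx].isspace():
            idx += 1
            continue
        response, idx = decoder.raw_decode(raw, idx)
        # assumed layout: [total, ids, extra, records]
        for rs_id, chrom, pos, *_ in response[3]:
            if rs_id in wanted and pos:
                hits[rs_id] = (chrom, int(pos))
    return hits

# trimmed copy of the response shown above
raw = ('[10000,["rs12345","rs1234501417"],null,'
       '[["rs12345","22","25459491","G/A, G/C","CRYBB2P1"],'
       '["rs1234501417","1","82537618","C/G",""]]]'
       '[933,["rs334","rs33465"],null,'
       '[["rs334","11","5227001","T/A, T/C, T/G","HBB"],'
       '["rs33465","3","42364792","G/A, G/C","LYZL4"]]]')

print(exact_positions(raw, {"rs12345", "rs334"}))
```

Matching on the rsID itself rather than taking the first record avoids relying on the result order.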
Edit: if you want to use R instead, the rsnps package has an ncbi_snp_query() function that takes a vector of SNP IDs and returns, among many other things, the position in the GRCh38 assembly: https://docs.ropensci.org/rsnps/reference/ncbi_snp_query.html
Ah yeah, for thousands of SNPs I'd download the entire dbSNP table and parse that. The hg38 database is here: https://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/
I believe this should be your file: https://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/snp141.txt.gz
The accompanying .sql file has the field descriptions
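Parsing the dump can be done in one streaming pass, without loading the file into memory. A rough Python sketch follows, assuming the usual UCSC snp table column order (bin, chrom, chromStart, chromEnd, name, ...); check that against the .sql file before relying on it. The function name lookup_positions is made up for illustration.

```python
import gzip

def lookup_positions(dump_path, wanted):
    """Yield (rsID, chrom, chromStart, chromEnd) for every row whose rsID is in `wanted`.

    Streams the gzipped, tab-separated dump line by line.
    """
    with gzip.open(dump_path, "rt") as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            # assumed columns: bin, chrom, chromStart, chromEnd, name, ...
            chrom, start, end, name = fields[1], fields[2], fields[3], fields[4]
            if name in wanted:
                # chromStart is 0-based; for a single-base SNP the
                # 1-based position is chromEnd
                yield name, chrom, int(start), int(end)

# usage, with your own file of rsIDs (one per line):
# wanted = {line.strip() for line in open("my_rsids.txt")}
# for row in lookup_positions("snp141.txt.gz", wanted):
#     print(*row, sep="\t")
```

Keeping the rsIDs in a set makes each lookup constant time, so the run time is dominated by decompressing the file. The same rsID may show up on more than one row (for example on alternate contigs), so expect the occasional duplicate.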
How To Get Chromosome Position Given Rs Number?
How To Get The Chromosome Physical Position From Rs Numbers?
Get Snp Position From A Python Interface.
how to find SNP positions (for non-bioinformaticians)
Query genomic position based on rsID in BiomaRt
Obtain chromosome, position, and alleles based on a list of SNP names
get build 37 positions from dbSNP rsIDs
Most of these are either outdated or only applicable to a small number of SNPs.