I know that for a small number of queries you can use BioMart, but I have hundreds of thousands of SNPs/indels. What is the fastest way to get their positions?
Thanks!
One way is via the NIH Clinical Table Search Service. For example, for the two SNPs rs12345 and rs334:
curl "https://clinicaltables.nlm.nih.gov/api/snps/v3/search?terms={rs12345,rs334}"
That will return one JSON array per SNP:
[10000,["rs12345","rs1234501417","rs1234501474","rs1234502483","rs1234504365","rs1234505460","rs1234507061"],null,[["rs12345","22","25459491","G/A, G/C","CRYBB2P1"],["rs1234501417","1","82537618","C/G",""],["rs1234501474","1","76903116","A/G","ST6GALNAC5"],["rs1234502483","1","50243066","T/C","LOC105378711"],["rs1234504365","1","56084547","A/C","LOC105378741"],["rs1234505460","1","123981172","G/A",""],["rs1234507061","1","4067600","/CACTC",""]]][933,["rs334","rs33465","rs334771","rs334164","rs334217","rs334015","rs334597"],null,[["rs334","11","5227001","T/A, T/C, T/G","HBB"],["rs33465","3","42364792","G/A, G/C","LYZL4"],["rs334771","3","3129338","A/G","TRNT1 LOC107986006"],["rs334164","4","172318502","C/T","GALNTL6"],["rs334217","4","142332614","G/A","INPP4B"],["rs334015","2","178252192","G/C","OSBPL6"],["rs334597","2","178313657","T/C","OSBPL6"]]]
The fields are explained in the API documentation (https://clinicaltables.nlm.nih.gov/apidoc/snps/v3/doc.html); in each result, the second and third fields are the chromosome and the position on that chromosome in GRCh38.
By default you get 7 results per query SNP, of which the first one is probably the one you're looking for. You can only ask for 500 SNPs at a time, after which you have to use the pagination feature.
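If you would rather script this than call curl, here is a minimal Python sketch of the same lookup. It assumes the response layout shown above (the fourth element is the list of result rows, each row being rsID, chromosome, position, alleles, gene) and only uses the `terms` parameter from the curl example. The function names `exact_hit` and `fetch_position` are made up for illustration. It sends one request per SNP, so treat it as a way to spot-check IDs, not as a bulk solution.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://clinicaltables.nlm.nih.gov/api/snps/v3/search"


def exact_hit(response, rs_id):
    """Return (chromosome, position) for the row whose rsID equals rs_id.

    `response` is the decoded JSON array for one query. The search also
    returns IDs that merely start with the query (rs334 also gives rs33465),
    so only an exact match on the first field is accepted.
    """
    rows = response[3] or []
    for row in rows:
        if row[0] == rs_id:
            position = int(row[2]) if row[2].isdigit() else None
            return row[1], position
    return None


def fetch_position(rs_id, timeout=30):
    """Query the service for a single rsID (hypothetical helper)."""
    url = API_URL + "?" + urllib.parse.urlencode({"terms": rs_id})
    with urllib.request.urlopen(url, timeout=timeout) as reply:
        return exact_hit(json.load(reply), rs_id)
```

Calling `fetch_position("rs334")` should give the chromosome and position from the first row of the rs334 output above, as long as the service still returns the same record.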
Edit: if you want to use R instead, the rsnps package has an ncbi_snp_query() function that takes a vector of SNP IDs and returns, among many other things, the position in the GRCh38 assembly: https://docs.ropensci.org/rsnps/reference/ncbi_snp_query.html
Ah yeah, for thousands of SNPs I'd download the entire dbSNP table and parse that. The hg38 database is here: https://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/
I believe this should be your file: https://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/snp141.txt.gz
The accompanying .sql file has the field descriptions.
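Parsing the downloaded table could look roughly like the Python sketch below. It assumes the usual UCSC snpNNN column order (bin, chrom, chromStart, chromEnd, name, ...), so chromosome, start, end and rsID are columns 2 to 5; please check that against the .sql file before relying on it. The function name `lookup_positions` is just for illustration. The file is streamed line by line, so memory use depends on the size of your rsID list and not on the size of the table.

```python
import gzip


def lookup_positions(snp_table_path, rs_ids):
    """Scan a gzipped UCSC snp table once and collect positions for rs_ids.

    Returns a dict mapping rsID -> list of (chrom, chromStart, chromEnd).
    A list is used because an rsID can appear on more than one row,
    for example on alternate haplotype sequences.
    """
    wanted = set(rs_ids)  # set membership keeps the per-line check cheap
    found = {}
    with gzip.open(snp_table_path, "rt") as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 5:
                continue  # skip malformed or truncated lines
            name = fields[4]  # assumed column order, see the .sql file
            if name in wanted:
                hit = (fields[1], int(fields[2]), int(fields[3]))
                found.setdefault(name, []).append(hit)
    return found
```

For example, `lookup_positions("snp141.txt.gz", my_ids)` with `my_ids` read from your own file of rs numbers. Note that UCSC tables use 0-based start coordinates, so for a single-base SNP the 1-based position is chromEnd.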
duplicate:
How To Get Chromosome Position Given Rs Number?
How To Get The Chromosome Physical Position From Rs Numbers?
Get Snp Position From A Python Interface.
how to find SNP positions (for non-bioinformaticians)
Query genomic position based on rsID in BiomaRt
Obtain chromosome, position, and alleles based on a list of SNP names
get build 37 positions from dbSNP rsIDs
Most of these are either outdated or only applicable to a small number of SNPs.