Get position in GRCh38 from rsid

I know that for a small number of queries you can use BioMart, but I have hundreds of thousands of SNPs/indels. What is the fastest way to get their GRCh38 positions?

Thanks!

snp rsid location • 607 views

One way is via the NIH Clinical Table Search Service. For example, for the two SNPs rs12345 and rs334:

curl 'https://clinicaltables.nlm.nih.gov/api/snps/v3/search?terms={rs12345,rs334}'

That returns one JSON array per query SNP, printed back to back:

[10000,["rs12345","rs1234501417","rs1234501474","rs1234502483","rs1234504365","rs1234505460","rs1234507061"],null,[["rs12345","22","25459491","G/A, G/C","CRYBB2P1"],["rs1234501417","1","82537618","C/G",""],["rs1234501474","1","76903116","A/G","ST6GALNAC5"],["rs1234502483","1","50243066","T/C","LOC105378711"],["rs1234504365","1","56084547","A/C","LOC105378741"],["rs1234505460","1","123981172","G/A",""],["rs1234507061","1","4067600","/CACTC",""]]][933,["rs334","rs33465","rs334771","rs334164","rs334217","rs334015","rs334597"],null,[["rs334","11","5227001","T/A, T/C, T/G","HBB"],["rs33465","3","42364792","G/A, G/C","LYZL4"],["rs334771","3","3129338","A/G","TRNT1 LOC107986006"],["rs334164","4","172318502","C/T","GALNTL6"],["rs334217","4","142332614","G/A","INPP4B"],["rs334015","2","178252192","G/C","OSBPL6"],["rs334597","2","178313657","T/C","OSBPL6"]]]

The fields are explained in the API documentation. In each result row, the second and third fields are the chromosome and the chromosome position in GRCh38: https://clinicaltables.nlm.nih.gov/apidoc/snps/v3/doc.html

By default you get 7 results per query SNP, of which the first one is probably the one you're looking for. You can only ask for 500 SNPs at a time, after which you have to use the pagination feature.
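Since the response mixes the exact match with prefix matches (rs12345 also brings back rs1234501417 and so on), it can help to filter on the rsID rather than rely on row order. Below is a minimal Python sketch of that filtering. The function names are hypothetical, the sample string is a shortened copy of the output shown above, and the row layout (rsID, chromosome, position, alleles, gene) is taken from that output rather than from any guarantee in the API.

```python
import json


def parse_concatenated(text):
    """Split back-to-back JSON documents (as printed by curl) into a list."""
    decoder = json.JSONDecoder()
    docs, pos = [], 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it by hand.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        doc, pos = decoder.raw_decode(text, pos)
        docs.append(doc)
    return docs


def positions_for(text, wanted):
    """Return {rsID: (chromosome, position)} for exact rsID matches only."""
    wanted = set(wanted)
    found = {}
    for response in parse_concatenated(text):
        rows = response[3]  # fourth element holds the result rows
        for row in rows:
            rsid, chrom, pos = row[0], row[1], row[2]
            # Skip prefix matches and rows without a numeric position.
            if rsid in wanted and pos.isdigit():
                found[rsid] = (chrom, int(pos))
    return found


# Shortened copy of the response shown above (two rows kept per SNP).
sample = (
    '[10000,["rs12345","rs1234501417"],null,'
    '[["rs12345","22","25459491","G/A, G/C","CRYBB2P1"],'
    '["rs1234501417","1","82537618","C/G",""]]]'
    '[933,["rs334","rs33465"],null,'
    '[["rs334","11","5227001","T/A, T/C, T/G","HBB"],'
    '["rs33465","3","42364792","G/A, G/C","LYZL4"]]]'
)

print(positions_for(sample, ["rs12345", "rs334"]))
```

Matching on the rsID keeps the result correct even if the exact hit is not the first row of a response.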

Edit: if you want to use R instead, the rsnps package has an ncbi_snp_query() function that takes a vector of SNP IDs and returns, among many other things, the position in the GRCh38 assembly: https://docs.ropensci.org/rsnps/reference/ncbi_snp_query.html


Both work well. However, they are slow for hundreds of thousands of SNPs. Do you have an alternative for that many SNPs? With the rsnps package I get this error: "Error: parse error: premature EOF", very likely due to the large number of variants.


Ah yeah, for that many SNPs I'd download the entire dbSNP table and parse it locally. The hg38 database is here: https://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/

I believe this should be your file: https://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/snp141.txt.gz

The accompanying .sql file has the field descriptions.
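Parsing the dump could look roughly like the Python sketch below, which streams the gzipped table once and keeps only the rows whose name is in your rsID list. It assumes the usual UCSC snp table layout (tab-separated, with chrom, chromStart and name as the 2nd, 3rd and 5th columns, and a 0-based chromStart), so please check that against the .sql file before trusting the output. The function names are hypothetical.

```python
import gzip


def lookup_positions(lines, wanted):
    """Collect (chrom, 1-based position) for every row whose name is wanted.

    Assumed column layout (verify against the .sql file):
    bin, chrom, chromStart, chromEnd, name, ...
    """
    wanted = set(wanted)  # set membership keeps the scan fast
    hits = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # skip malformed or empty lines
        chrom, chrom_start, name = fields[1], fields[2], fields[4]
        if name in wanted:
            # chromStart is assumed 0-based; add 1 for a 1-based position.
            hits.setdefault(name, []).append((chrom, int(chrom_start) + 1))
    return hits


def lookup_in_dump(path, wanted):
    """Stream a gzipped table dump from disk without loading it into memory."""
    with gzip.open(path, "rt") as handle:
        return lookup_positions(handle, wanted)


# Example call, assuming the file has been downloaded to the working directory:
# positions = lookup_in_dump("snp141.txt.gz", ["rs12345", "rs334"])
```

Each rsID maps to a list because the same ID can appear on more than one row, for example on alternate haplotype sequences. The +1 conversion suits single-nucleotide variants; for insertions (where chromStart equals chromEnd) you may want a different convention.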
