Entering edit mode
2.4 years ago
Kevin Blighe
88k
Hola, just passing by to say 'hi'.
Please post bugs / suggestions as comments to this tutorial.
Automated dbSNP rsID position lookup
rsID to position
GRCh38
cat rsids.list
rs1296488112
rs1226262848
rs1225501837
rs1484860612
rs1235553513
rs1424506967
cat rsids.list | while read rsid ;
do
pos=$(curl -sX GET "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=snp&id=$rsid&retmode=text&rettype=text" | sed 's/<\//\n/g' | grep -o -P '\<CHRPOS\>.{0,15}' | cut -f2 -d">") ;
echo -e "${pos}""\t""${rsid}" ;
done ;
1:13616 rs1296488112
1:13621 rs1226262848
1:13623 rs1225501837
1:13634 rs1484860612
1:13646 rs1235553513
1:13647 rs1424506967
rsID to position
GRCh37
This is the same as above, except that the tag returned that contains the GRCh37 co-ordinates is CHRPOS_PREV_ASSM
, not CHRPOS
:
cat rsids.list
rs1296488112
rs1226262848
rs1225501837
rs1484860612
rs1235553513
rs1424506967
cat rsids.list | while read rsid ;
do
pos=$(curl -sX GET "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=snp&id=$rsid&retmode=text&rettype=text" | sed 's/<\//\n/g' | grep -o -P '\<CHRPOS_PREV_ASSM\>.{0,15}' | cut -f2 -d">") ;
echo -e "${pos}""\t""${rsid}" ;
done ;
1:13616 rs1296488112
1:13621 rs1226262848
1:13623 rs1225501837
1:13634 rs1484860612
1:13646 rs1235553513
1:13648 rs1424506967
Position to rsID
GRCH38
Here, we have to be wary that more than a single rsID can map to a single position.
cat grch38.bed
chr1 13616 13616
chr1 13621 13621
chr1 13623 13623
chr1 13634 13634
chr1 13646 13646
chr1 13647 13647
sed 's/^chr//g' grch38.bed | cut -f1,2 | while read chr pos ;
do
rsid=$(curl -sX GET "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=snp&term=$pos%5BPOSITION%5D+AND+$chr%5BCHR%5D" | sed 's/<\//\n/g' | grep -e "<Id>" | cut -f2 -d">" | awk '{print "rs"$0}' | tr '\n' ',' | sed 's/,$//g') ;
echo -e "${chr}"":""${pos}""\t""${rsid}" ;
done ;
1:13616 rs1296488112
1:13621 rs1226262848
1:13623 rs1225501837
1:13634 rs1639595957,rs1484860612
1:13646 rs1235553513
1:13647 rs1424506967
GRCH37
cat grch37.bed
chr1 13616 13616
chr1 13621 13621
chr1 13623 13623
chr1 13634 13634
chr1 13646 13646
chr1 13648 13648
sed 's/^chr//g' grch38.bed | cut -f1,2 | while read chr pos ;
do
rsid=$(curl -sX GET "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=snp&term=$pos%5BPOSITION_GRCH37%5D+AND+$chr%5BCHR%5D" | sed 's/<\//\n/g' | grep -e "<Id>" | cut -f2 -d">" | awk '{print "rs"$0}' | tr '\n' ',' | sed 's/,$//g') ;
echo -e "${chr}"":""${pos}""\t""${rsid}" ;
done ;
1:13616 rs1296488112
1:13621 rs1226262848
1:13623 rs1225501837
1:13634 rs1639595957,rs1484860612
1:13646 rs1235553513
1:13648 rs1424506967
.
Automated genome build lift-over
For automated lift-over in R via rtracklayer, please try this:
# chain file for hg19 to hg38
download.file('http://hgdownload.soe.ucsc.edu/goldenPath/hg19/liftOver/hg19ToHg38.over.chain.gz', 'hg19ToHg38.over.chain.gz')
R.utils::gunzip('hg19ToHg38.over.chain.gz')
# chain file for hg38 to hg19
download.file('http://hgdownload.soe.ucsc.edu/goldenPath/hg38/liftOver/hg38ToHg19.over.chain.gz', 'hg38ToHg19.over.chain.gz')
R.utils::gunzip('hg38ToHg19.over.chain.gz')
# import variant positions
positions <- read.table('grch37.bed', sep = '\t', header = FALSE, stringsAsFactors = FALSE)
# perform liftover, assuming hg19 to hg38
library(rtracklayer)
grObject <- GRanges(
seqnames = positions[,1],
ranges = IRanges(start = positions[,2], end = positions[,2]))
chainObject <- import.chain('hg19ToHg38.over.chain')
as.data.frame(liftOver(grObject, chainObject))[,c('seqnames','start','end','width')]
seqnames start end width
1 chr1 13616 13616 1
2 chr1 13621 13621 1
3 chr1 13623 13623 1
4 chr1 13634 13634 1
5 chr1 13646 13646 1
6 chr1 13647 13647 1
Gracias,
Kevin
And for those looking for a python solution:
When an rsID you are trying to lookup does not exist:
dbSNP Entrez does have a rate limit of 3 queries per ??? seconds, so:
To search by position, entrez does not seem to have a direct search link. So you have to do a URL based form search in their user web interface:
Hi,
Thanks for this useful script!
It works as expected most of the time, but I've notice that sometimes it comes up empty, I guess this has to do with internet speed at the moment of query.
For example, I wanted to find the HG38 postion of a list of SNPs (rs IDs), which contained rs17728338. The position for this SNP came back empty in the list, but when I looked it up manually, it did find the position. Maybe there is a way to specify that it should re-try if it comes up empty at first?
Thanks, Annique
Hey Annique, it could be that the server is becoming overloaded with requests, and blocks some of the requests. Can you add the
sleep
command to thewhile do
loops: For example:Hello,
Than you for sharing this! I was trying your rsID to position codes and it seems like the link might be broken? I was able to run through but no annotation was produced. Would greatly appreciate it if you can take a look at it. Thank you!