How to get info (i.e. chromosome, location) for millions of target SNPs in an easy way?
6.5 years ago
Huichen03 ▴ 40

Hi all,

I am trying to get SNP locations. I downloaded the dbSNP database from Ensembl and ran grep -f snps.txt homo_sapiens_snp.vcf > matches.vcf to extract the records for my target SNPs. There are about two million SNPs in snps.txt, and all the dbSNP records are stored in the file homo_sapiens_snp.vcf. The problem is that the process gets Killed, so I cannot get the SNP locations. Does anyone know how to fix this?

Thank you in advance!


Try the suggestions from the thread "How to get a subset of vcf file for specific SNPs".

Please tell us whether those work. If they don't, also tell us how much memory you have available.

6.5 years ago

One way to do this is via the command line. You could download SNP annotations via wget. For example:

$ wget -qO- ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b151_GRCh37p13/VCF/common_all_20180423.vcf.gz | gunzip -c | convert2bed --input=vcf --output=bed --sort-tmpdir=${PWD} - > hg19.snp151.bed
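If convert2bed (part of BEDOPS) is not available, the first four BED columns can be approximated with awk. This is only a rough sketch, not a replacement for convert2bed: it assumes single-base variants, does not sort the output, and drops the remaining VCF fields. The file names demo.vcf and demo_from_vcf.bed are made up for illustration.

```shell
# Hypothetical one-record VCF for demonstration only.
printf '##fileformat=VCFv4.0\n#CHROM\tPOS\tID\tREF\tALT\n1\t13550\trs554008981\tG\tA\n' > demo.vcf

# Skip header lines; BED start is 0-based, so it is the VCF position minus 1.
awk -F'\t' -v OFS='\t' '!/^#/ { print $1, $2 - 1, $2, $3 }' demo.vcf > demo_from_vcf.bed

cat demo_from_vcf.bed
```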

Filter via grep for the SNP of interest. For example, to search on a single SNP ID:

$ grep -F rs554008981 hg19.snp151.bed
1       13549   13550   rs554008981     .       G       A       .       RS=554008981;RSPOS=13550;dbSNPBuildID=142;SSR=0;SAO=0;VP=0x050000000005000026000100;GENEINFO=DDX11L1:100287102;WGT=1;VC=SNV;ASP;KGPhase3;CAF=0.9966,0.003395,.;COMMON=1;TOPMED=0.99221139143730886,0.00778064475025484,0.00000796381243628

To search on a file of IDs, e.g. a list of SNP IDs in rsIDs.txt:

$ grep -fF rsIDs.txt hg19.snp151.bed > matches.bed
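With around two million IDs, pattern matching against every line can be slow or memory-hungry, and a plain substring match can also report unintended hits (rs2 matches rs20). One possible alternative, sketched here under the assumption that the rsID sits in column 4 of the BED file as in the output above, is to load the IDs into an awk array and test each line's ID column with a hash lookup. The demo files below are invented for illustration; substitute rsIDs.txt and hg19.snp151.bed, or use column 3 when reading a VCF directly.

```shell
# Hypothetical demo inputs; replace with rsIDs.txt and hg19.snp151.bed.
printf 'rs554008981\nrs2\n' > demo_ids.txt
printf '1\t13549\t13550\trs554008981\t.\tG\tA\n1\t20\t21\trs20\t.\tC\tT\n1\t30\t31\trs2\t.\tA\tG\n' > demo.bed

# First file: store each ID as an array key.
# Second file: print chromosome, start, end and ID when column 4 is a stored key.
awk -F'\t' -v OFS='\t' 'NR == FNR { ids[$1]; next } ($4 in ids) { print $1, $2, $3, $4 }' \
    demo_ids.txt demo.bed > demo_matches.bed

cat demo_matches.bed
```

Memory use here grows with the number of IDs rather than with the size of the SNP file, which is streamed line by line.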

Hi, thank you for your advice. When searching on a file of IDs, should the command be $ grep -f? If I use $ grep -fF, I get the error grep: F: No such file or directory.


You need a file of IDs, stored in the file rsIDs.txt (for example).
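On the flag order: -f takes the next argument as the pattern file, so in -fF grep reads F as the file name, which would explain the error quoted above. Putting -F first, or writing the options separately, avoids this. A small sketch with made-up demo files follows; the -w option is an addition here, not part of the original answer, and restricts matches to whole words so that rs2 does not match rs20.

```shell
# Hypothetical demo inputs; replace with rsIDs.txt and hg19.snp151.bed.
printf 'rs2\n' > demo_ids2.txt
printf '1\t30\t31\trs2\n1\t20\t21\trs20\n' > demo2.bed

# -F: treat patterns as fixed strings; -w: whole-word matches only;
# -f FILE: read the patterns from FILE (the file name must follow -f directly).
grep -F -w -f demo_ids2.txt demo2.bed > demo_matches2.bed

cat demo_matches2.bed
```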

6.5 years ago

Hello,

I asked a similar question some time ago. You could try one of the answers there. In particular, the approach I found on the htslib GitHub and reported in that thread looks promising to me.

fin swimmer
