Hi,
I have a dataset of summary statistics; however, lots (millions) of the SNP names/rsIDs are missing and I want to add them, using the chr / pos / a1 / a0 information that I have.
I know I need to use an existing database to do this, much like in the post "how to query ensembl sql database - to check if a snp (name = rs...) is in an intron ?". However, I do not have the SNP names, which seems to be the obstacle: most posts are about filling in SNP information when you already have the name.
I downloaded the latest dbSNP build from https://ftp.ncbi.nih.gov/snp/latest_release/ however, this is the whole database and I think it is too big for me to work with.
I also looked at the R package rsnps; however, it needs the SNP names first. This has already been raised as a bug: https://github.com/ropensci/rsnps/issues/122
Is there an easier way to do this? Can biomaRt do it? I have seen the post "Dbsnp : Best Way To Obtain Data On Snps" and started running it; however, I do not have the SNP names.
Thanks!
If you want to annotate your data with dbSNP, I would suggest using tools such as bedtools or bcftools to annotate it with the dbSNP VCF.
Hi, thank you. However, my other file (not the dbSNP one) is not a VCF, so I cannot use these tools yet. Do you know of a way to convert it, on the command line or otherwise? I keep seeing the use of:
mv file.txt file.vcf
or
cp file.txt file.vcf
suggested as a conversion, but this does not work for me! Thanks!
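Renaming with mv or cp only changes the file extension; the contents stay a plain table. A minimal Python sketch of an actual conversion is given here, assuming a delimited text file with a header row. The column names (chr, pos, a0, a1), the choice of a0 as the reference allele, and the function name table_to_vcf_lines are all assumptions made for illustration, not details taken from the real file.

```python
# Hypothetical column names; change them to match the real header.
# Assumption: a0 is the reference allele and a1 the alternate allele.
def table_to_vcf_lines(rows, chr_col="chr", pos_col="pos", ref_col="a0", alt_col="a1"):
    """Turn dict rows (e.g. from csv.DictReader) into minimal VCF text lines."""
    header = [
        "##fileformat=VCFv4.2\n",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    ]
    # Sort by chromosome label, then by numeric position, so that records
    # are in position order within each chromosome.
    ordered = sorted(rows, key=lambda r: (r[chr_col], int(r[pos_col])))
    body = [
        "\t".join([
            r[chr_col],
            str(int(r[pos_col])),
            ".",                    # ID is unknown; this is what gets annotated later
            r[ref_col].upper(),
            r[alt_col].upper(),
            ".", ".", ".",          # QUAL, FILTER, INFO left empty
        ]) + "\n"
        for r in ordered
    ]
    return header + body

# Example use (file names are placeholders):
# import csv
# with open("file.txt", newline="") as fh, open("file.vcf", "w") as out:
#     out.writelines(table_to_vcf_lines(csv.DictReader(fh, delimiter="\t")))
```

Tools such as bcftools typically also want the result bgzip-compressed and indexed, and the chromosome names have to match those used in the dbSNP VCF, which to my knowledge are RefSeq accessions such as NC_000001.11 rather than 1 or chr1.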
If you could post some entries from the text file, that would help in understanding the issue.
The .txt file is in a very standard format:
And the .vcf is from the dbSNP latest release: https://ftp.ncbi.nih.gov/snp/latest_release/VCF/
And then more vcf lines until
So I would like to use the columns chr / pos / ref allele / alt allele to 'fill in' or 'impute' the rsIDs in the .txt file above. I have been told dbSNP is the best resource for this, but I am really struggling to work out how to combine these data.
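One way to combine the two files without loading all of dbSNP is to keep only the variants that still need an rsID in memory and stream the dbSNP VCF past them once. The sketch below is a rough illustration under several assumptions: the column names (chr, pos, a1, a0, snp) and all function names are hypothetical, a missing rsID is taken to be any value not starting with "rs", and the allele pair is matched as an unordered set because it is not clear which of a1 / a0 is the reference allele. The contig_to_chr mapping is needed because the dbSNP release VCF labels chromosomes with RefSeq accessions (for example NC_000001.11 for chromosome 1 on GRCh38).

```python
# Hypothetical column names; adjust to the real header of the summary statistics.
CHR, POS, A1, A0, SNP = "chr", "pos", "a1", "a0", "snp"

def variant_key(chrom, pos, allele_x, allele_y):
    """Orientation-free key: chromosome, position and an unordered allele pair."""
    chrom = str(chrom)
    if chrom.startswith("chr"):
        chrom = chrom[3:]
    return (chrom, int(pos), frozenset((allele_x.upper(), allele_y.upper())))

def find_rsids(vcf_lines, wanted, contig_to_chr):
    """Stream VCF lines once; return {key: rsid} for the keys in `wanted`."""
    found = {}
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        contig, pos, rsid, ref, alts = line.rstrip("\n").split("\t", 5)[:5]
        chrom = contig_to_chr.get(contig, contig)
        for alt in alts.split(","):  # dbSNP records can list several ALT alleles
            key = variant_key(chrom, pos, ref, alt)
            if key in wanted:
                found.setdefault(key, rsid)  # keep the first match
    return found

def fill_rsids(rows, vcf_lines, contig_to_chr):
    """Return the rows with missing rsIDs filled in where dbSNP has a match."""
    rows = list(rows)
    missing = [r for r in rows if not r[SNP].startswith("rs")]
    wanted = {variant_key(r[CHR], r[POS], r[A1], r[A0]) for r in missing}
    found = find_rsids(vcf_lines, wanted, contig_to_chr)
    for r in missing:
        key = variant_key(r[CHR], r[POS], r[A1], r[A0])
        r[SNP] = found.get(key, r[SNP])  # unchanged when nothing matches
    return rows

# Example use (file names are placeholders; the mapping shows one contig only):
# import csv, gzip
# with open("file.txt", newline="") as fh, gzip.open("dbsnp.vcf.gz", "rt") as vcf:
#     filled = fill_rsids(csv.DictReader(fh, delimiter="\t"), vcf,
#                         {"NC_000001.11": "1"})
```

The positions only line up if the summary statistics and the dbSNP VCF are on the same genome build, so that is worth checking first. Indels may also fail to match if the two files represent them differently, since VCF writes indels with a leading anchor base.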
Thanks!