I have chromosome numbers and SNP positions (about a million of them).
How can I convert this information to SNP IDs?
Grab the SNPs and convert them to a sorted BED file. Then convert your positions to sorted BED as well, and use a BEDOPS bedmap operation to map SNP IDs to the positions they overlap.
For example, here is a way to download dbSNP v150 for hg19 and convert it to BED with BEDOPS vcf2bed:
$ wget -qO- ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b150_GRCh37p13/VCF/All_20170710.vcf.gz | gunzip -c - | vcf2bed --sort-tmpdir=${PWD} --max-mem=2G - > hg19.dbSNP150.bed
You'd modify this for your reference genome, if you're not working with hg19.
Then convert your positions to a sorted BED file, using awk and BEDOPS sort-bed:
$ awk -vOFS="\t" '{ print "chr"$1, ($2 - 1), $2; }' positions.txt | sort-bed - > positions.bed
This assumes that the chromosome name in positions.txt is strictly numerical (i.e., Ensembl format, and not UCSC format), so we add a chr prefix to it. It also assumes 1-based positions: BED intervals are 0-based and half-open, which is why the start is written as ($2 - 1). The chromosome names in positions.bed must match the chromosome names in hg19.dbSNP150.bed, so check which naming convention each file uses and modify the awk command accordingly.
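If positions.txt mixes naming styles or carries a header line, a small variation on the awk step can add the prefix only where it is missing. This is a minimal sketch, assuming whitespace-separated chromosome and 1-based position in the first two columns; the function name to_bed is made up for illustration:

```shell
# to_bed: hypothetical helper. Reads "chrom pos" lines (1-based) on stdin and
# writes 0-based, half-open BED intervals with a "chr" prefix on stdout.
to_bed() {
    awk -v OFS="\t" '
        NF >= 2 && $2 ~ /^[0-9]+$/ {                 # skip headers and blank lines
            chrom = ($1 ~ /^chr/) ? $1 : "chr" $1    # add the prefix only if missing
            print chrom, $2 - 1, $2
        }'
}

# Example call with two positions, one in each naming style
printf '1 17571\nchr1 17594\n' | to_bed
```

The output is not sorted, so it would still be piped through sort-bed before being used with bedmap. If the SNP file uses unprefixed names, drop the prefix logic instead.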
Finally, you can map positions to SNP IDs:
$ bedmap --echo --echo-map-id --delim '\t' positions.bed hg19.dbSNP150.bed > answer.bed
The file answer.bed will have the positions in the first three columns, and the SNP rs-ID in the fourth and last column.
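When several SNP records overlap one position, bedmap joins their IDs in that fourth column with a semicolon by default. A minimal sketch for expanding such rows into one row per ID, assuming tab-delimited bedmap output as produced above; the function name split_ids and the IDs in the example call are placeholders, not real data:

```shell
# split_ids: hypothetical helper. Expands "chr start end id1;id2" rows into
# one row per ID. Rows with an empty ID column are passed through unchanged.
split_ids() {
    awk -F'\t' -v OFS='\t' '{
        n = split($4, ids, ";")
        if (n == 0) { print; next }                  # no overlapping SNP
        for (i = 1; i <= n; i++) print $1, $2, $3, ids[i]
    }'
}

# Example call with a made-up row that carries two placeholder IDs
printf 'chr1\t10\t11\trsAAA;rsBBB\n' | split_ids
```

In practice this would be run as something like `split_ids < answer.bed`.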
Dear Alex Reynolds,
Thanks for such an efficient method! But I still have some doubts. This method matches only on the chr:start:end information, so multiple rsIDs can be assigned to the same variant. It would be more accurate to also match on the ref:alt information. Is there a way to take ref:alt into account as well?
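One possible way to include alleles, offered as a rough sketch rather than a tested pipeline, is to skip the BED step and join directly against the VCF on chromosome, position, REF and ALT with awk. It assumes a tab-separated query file with chrom, 1-based pos, ref and alt columns, an uncompressed VCF, and matching chromosome names in both files; match_alleles and the file names are hypothetical:

```shell
# match_alleles: hypothetical helper.
#   $1 = query variants, tab-separated: chrom, pos (1-based), ref, alt
#   $2 = uncompressed VCF (CHROM, POS, ID, REF, ALT, ...)
# Prints chrom, pos, ref, alt, rsID for every exact allele match.
match_alleles() {
    awk -F'\t' -v OFS='\t' '
        NR == FNR { want[$1 FS $2 FS $3 FS $4] = 1; next }   # load query variants
        /^#/ { next }                                        # skip VCF header lines
        {
            n = split($5, alts, ",")                         # ALT may list several alleles
            for (i = 1; i <= n; i++)
                if (($1 FS $2 FS $4 FS alts[i]) in want)
                    print $1, $2, $4, alts[i], $3
        }' "$1" "$2"
}

# Usage (file names are placeholders):
#   match_alleles variants.tsv dbsnp.vcf > answer.tsv
```

Only the query variants are held in memory, so about a million of them should be manageable while the VCF is streamed once. Alleles are compared as plain strings, so indels that are represented differently in the two files would not match.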
In addition to describing your data, you may want to post example data to get better suggestions.
$ bedtools intersect -a test.txt -b dbsnp_mini.vcf -wa -wb
example records:
$ cat test.txt
chrom from to
1 17571 17571
1 17594 17594
1 17571 17571 1 17571 rs557947346 C T . . RS=557947346;RSPOS=17571;dbSNPBuildID=142;SSR=0;SAO=0;VP=0x0500000a0005000000000100;WGT=1;VC=SNV;INT;R5;ASP
1 17594 17594 1 17594 rs377698370 C T . . RS=377698370;RSPOS=17594;dbSNPBuildID=138;SSR=0;SAO=0;VP=0x0500000a0005000002000100;WGT=1;VC=SNV;INT;R5;ASP;OTHERKG
1 17614 17614 1 17614 rs201057270 G A . . RS=201057270;RSPOS=17614;dbSNPBuildID=137;SSR=0;SAO=0;VP=0x050000020005000002000100;WGT=1;VC=SNV;R5;ASP;OTHERKG
Sorry @Emily_Ensembl, I was trying to annotate somatic copy number variation in VCF format with VEP, but I got this error.
Could you please help me with that?
Thank you
-------------------- EXCEPTION --------------------
ERROR: Forked process(es) died: read-through of cross-process communication detected
STACK Bio::EnsEMBL::VEP::Runner::_forked_buffer_to_output /nfs/public/release/ensweb/latest/live/grch37/www_95/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:554
STACK Bio::EnsEMBL::VEP::Runner::next_output_line /nfs/public/release/ensweb/latest/live/grch37/www_95/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:361
STACK Bio::EnsEMBL::VEP::Runner::run /nfs/public/release/ensweb/latest/live/grch37/www_95/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:202
STACK EnsEMBL::Web::RunnableDB::VEP::run /nfs/public/release/ensweb/latest/live/grch37/www_95/public-plugins/tools_hive/modules/EnsEMBL/Web/RunnableDB/VEP.pm:87
STACK (eval) /nfs/public/release/ensweb/latest/live/grch37/www_95/ensembl-hive//modules/Bio/EnsEMBL/Hive/Process.pm:140
STACK Bio::EnsEMBL::Hive::Process::life_cycle /nfs/public/release/ensweb/latest/live/grch37/www_95/ensembl-hive//modules/Bio/EnsEMBL/Hive/Process.pm:127
STACK (eval) /nfs/public/release/ensweb/latest/live/grch37/www_95/ensembl-hive//modules/Bio/EnsEMBL/Hive/Worker.pm:681
STACK Bio::EnsEMBL::Hive::Worker::run_one_batch /nfs/public/release/ensweb/latest/live/grch37/www_95/ensembl-hive//modules/Bio/EnsEMBL/Hive/Worker.pm:652
STACK Bio::EnsEMBL::Hive::Worker::run /nfs/public/release/ensweb/latest/live/grch37/www_95/ensembl-hive//modules/Bio/EnsEMBL/Hive/Worker.pm:500
STACK main::main /nfs/public/release/ensweb/latest/live/grch37/www_95/ensembl-hive//scripts/runWorker.pl:141
STACK toplevel /nfs/public/release/ensweb/latest/live/grch37/www_95/ensembl-hive//scripts/runWorker.pl:22
Date (localtime) = Thu Mar 28 15:54:32 2019
Ensembl API version = 95
