I grabbed the first few columns from my whole-genome VCF file; they are shown below. Are there any tools (web-based, Python, or R) that I could use to get SNP identifiers (e.g. rs429358 or rs7412) for all of my SNPs that are in a particular database? I'm very new to working with VCF files, and I want to figure out my blood type. I'm very comfortable coding in both Python and R, if there are any packages available for these languages. If possible, I would like to avoid depositing my sequences with a third party that could use my information for its own gain, but I am not opposed to keeping such services as a fallback in case I can't find any other option.
#CHROM POS ID REF ALT QUAL
chrM 64 . C T 3070.00
chrM 73 . A G 3070.00
chrM 146 . T C 3070.00
chrM 153 . A G 3070.00
chrM 263 . A G 3070.00
chrM 310 . T C 3070.00
chrM 513 . GCA G 3070.00
chrM 663 . A G 3070.00
chrM 750 . A G 3070.00
chrM 1438 . A G 3070.00
chrM 1598 . G A 3070.00
chrM 1736 . A G 3070.00
chrM 1888 . G A 3070.00
chrM 2706 . A G 3070.00
chrM 3106 . CN C 3070.00
chrM 4248 . T C 3070.00
chrM 4769 . A G 3070.00
chrM 4824 . A G 3070.00
chrM 7028 . C T 3070.00
chrM 8027 . G A 3070.00
chrM 8794 . C T 3070.00
chrM 8860 . A G 3070.00
chrM 11719 . G A 3070.00
I first ran this:
LC_ALL=C wget -qO- http://hgdownload.cse.ucsc.edu/goldenpath/hg38/database/snp147.txt.gz \
  | gunzip -c \
  | awk -v OFS="\t" '{ print $2,$3,($3+1),$5 }' \
  | sort-bed - \
and then ran it on my non-trimmed (all columns) VCF file, genome.vcf, but none of the columns contain the rs* IDs. Do you know what could be happening?
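Since Python was mentioned as an option, here is a minimal sketch of the same position-to-ID mapping done without BED tools. It assumes the UCSC snp table layout implied by the awk command above (chromosome in column 2, 0-based start in column 3, end in column 4, rs ID in column 5) and only handles single-base records; the function names and the toy rows are made up for illustration and are not real dbSNP entries. The full snp147 table is very large, so in practice you would probably want to filter it down to the regions you care about before loading it into a dictionary.

```python
def load_rsid_index(snp_table_lines):
    """Build {(chrom, 1-based position): rs ID} from UCSC-style snp table rows."""
    index = {}
    for line in snp_table_lines:
        row = line.rstrip("\n").split("\t")
        # Assumed columns: bin, chrom, chromStart (0-based), chromEnd, name, ...
        chrom, start, end, name = row[1], int(row[2]), int(row[3]), row[4]
        if end - start == 1:  # keep single-base records only
            index[(chrom, start + 1)] = name  # VCF POS is 1-based
    return index


def annotate_vcf(vcf_lines, index):
    """Yield VCF lines with an empty ID column ('.') filled in where a match exists."""
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            yield line  # pass header and blank lines through unchanged
            continue
        fields = line.split("\t")
        key = (fields[0], int(fields[1]))
        if fields[2] == "." and key in index:
            fields[2] = index[key]
        yield "\t".join(fields)


if __name__ == "__main__":
    # Toy rows for illustration only (not real dbSNP records).
    snp_rows = ["585\tchr1\t99\t100\trsTOY1"]
    vcf_rows = ["#CHROM\tPOS\tID\tREF\tALT\tQUAL", "chr1\t100\t.\tA\tG\t50"]
    for out_line in annotate_vcf(vcf_rows, load_rsid_index(snp_rows)):
        print(out_line)
```

Matching on position alone does not check that the alleles agree, so it is worth spot-checking a few hits against the database before relying on them.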
Can you show the top ten or twenty lines of your genome.vcf? I just want to verify that conversion will work. If your file isn't in VCF format, we could look at using other tools to convert it to BED so that you can map IDs to positions.

I ended up running vcf2bed and then:
bedmap --echo --echo-map-id --delim '\t' hg38.snp147.bed genome.bed > genome.vcf.hg38.snp147.bed
and it worked. Do you know of any way I can map the rs* IDs to their descriptive names?

You could grep the rs IDs against a text file containing descriptive names.
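If you would rather do that lookup in Python than with grep, something like the sketch below might work. It assumes a hypothetical tab-delimited text file with the rs ID in the first column and the descriptive name in the rest of the line; the function names and the placeholder descriptions are invented for illustration. One small advantage over a plain grep is that the match is exact, so rs74 would not accidentally pick up the line for rs7412.

```python
def load_names(lines):
    """Parse 'rsID<TAB>description' lines into a dict (hypothetical file format)."""
    names = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        rsid, _, description = line.partition("\t")
        names[rsid] = description
    return names


def describe(rsids, names):
    """Pair each rs ID with its description, or an empty string if unknown."""
    return [(rsid, names.get(rsid, "")) for rsid in rsids]


if __name__ == "__main__":
    # Placeholder descriptions; substitute lines read from your own annotation file.
    table = ["rs429358\tplaceholder description A", "rs7412\tplaceholder description B"]
    for rsid, text in describe(["rs7412", "rs429358"], load_names(table)):
        print(rsid, text, sep="\t")
```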