I have a very large *.bed file with 15862212 lines, derived from a whole-genome VCF. I annotated the VCF with SNP identifiers following the protocol in "How to get SNP identifiers from VCF file?", and a preview of the resulting file is shown below. How can I get the descriptors for these rs* IDs? My main goal is to figure out which blood type I have from this information.
-bash-4.1$ zcat genome.vcf.hg38.snp147.bed.gz | head -n 10
chr1 10019 10020 rs775809821
chr1 10055 10056 rs768019142
chr1 10107 10108 rs62651026 .
chr1 10108 10109 rs376007522 .
chr1 10128 10129 rs796688738
chr1 10138 10139 rs368469931
chr1 10144 10145 rs144773400
chr1 10146 10147 rs779258992
chr1 10149 10150 rs371194064
chr1 10165 10166 rs796884232
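As a first step, it may help to check whether particular rs* IDs occur in the file at all. The following is a minimal sketch, assuming the BED file is whitespace-delimited with the rsID in the fourth column, as in the preview above. The function name `find_rsids` is hypothetical, and the demo uses a few lines from the preview rather than the real file.

```python
import gzip


def find_rsids(bed_lines, wanted):
    """Yield (chrom, start, end, rsid) for rows whose 4th column is in `wanted`."""
    for line in bed_lines:
        fields = line.split()
        # Skip blank or malformed rows with fewer than four columns.
        if len(fields) >= 4 and fields[3] in wanted:
            yield fields[0], int(fields[1]), int(fields[2]), fields[3]


# Demo on lines copied from the preview (a trailing "." column is tolerated).
preview = [
    "chr1 10107 10108 rs62651026 .",
    "chr1 10138 10139 rs368469931",
    "chr1 10144 10145 rs144773400",
]
hits = list(find_rsids(preview, {"rs144773400"}))
print(hits)

# For the real file, stream it instead of loading it into memory, e.g.:
#   with gzip.open("genome.vcf.hg38.snp147.bed.gz", "rt") as handle:
#       for hit in find_rsids(handle, wanted):
#           print(hit)
```

For the blood-type question, `wanted` would hold the rsIDs of interest in the ABO gene (for example rs8176719, which is commonly cited for the O allele; treat that choice as an assumption to verify). Note that the BED file only records positions, so the actual genotype still has to be read from the VCF.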
What do you mean by "descriptors"?
I guess there isn't a descriptor for each SNP as such; what I want is the metadata associated with each SNP, such as: https://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=144773400
SNP annotations file: http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/snp142Common.txt.gz
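One way to use such an annotation file is to look up each rsID from the BED file by name. The sketch below assumes the table is tab-separated and follows the UCSC `snp142` layout, with the rsID in the 5th column, the observed alleles in the 10th, the variant class in the 12th and the functional category in the 16th. Those column positions are an assumption and should be checked against the table's schema. Matching on the rsID rather than on coordinates sidesteps the fact that the table is hg19 while the BED file is hg38. The function names are hypothetical.

```python
import gzip

# Assumed 0-based column positions in the UCSC snp142Common table.
NAME_COL, OBSERVED_COL, CLASS_COL, FUNC_COL = 4, 9, 11, 15


def rsids_from_bed(bed_lines):
    """Collect the rsIDs found in the 4th column of a whitespace-delimited BED."""
    rsids = set()
    for line in bed_lines:
        fields = line.split()
        if len(fields) >= 4:
            rsids.add(fields[3])
    return rsids


def load_annotations(table_lines, wanted):
    """Return {rsid: metadata} for table rows whose name is in `wanted`."""
    annotations = {}
    for line in table_lines:
        cols = line.rstrip("\n").split("\t")
        # Ignore rows too short to contain the assumed columns.
        if len(cols) > FUNC_COL and cols[NAME_COL] in wanted:
            annotations[cols[NAME_COL]] = {
                "observed": cols[OBSERVED_COL],
                "class": cols[CLASS_COL],
                "func": cols[FUNC_COL],
            }
    return annotations


# Usage with the real files (paths as in this thread, streamed line by line):
#   with gzip.open("genome.vcf.hg38.snp147.bed.gz", "rt") as bed:
#       wanted = rsids_from_bed(bed)
#   with gzip.open("snp142Common.txt.gz", "rt") as table:
#       annotations = load_annotations(table, wanted)
```

Because `snp142Common` only contains common SNPs and the BED file was annotated with snp147, some rsIDs will have no match in this table.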