For some odd reason, the bioinformatics team that provides my NGS data records only heterozygous genotypes and genotypes homozygous for the non-reference allele. Because the reference allele is sometimes the minor allele, patients who are homozygous for that minor allele are never recorded. That is a real problem when testing for variants with increased prevalence in patients.
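If the export really is variant-only, the missing genotypes can be reconstructed, but only under a strong assumption. The minimal Python sketch below treats every patient absent at a site as homozygous reference (`0/0`). That is valid only if the patient was actually covered at that position; without a gVCF or per-site coverage from the team, `0/0` cannot be told apart from "no data". The function names and `rs_demo*` IDs are hypothetical.

```python
# Sketch: rebuild a full genotype matrix from a variant-only export.
# ASSUMPTION: every listed patient was sequenced with adequate coverage at every
# listed site, so a missing call means homozygous reference ("0/0"), not "no data".


def fill_hom_ref(calls, patients, sites):
    """Return {rsID: {patient: genotype}}, filling absent calls with "0/0"."""
    observed = {(pid, rsid): gt for pid, rsid, gt in calls}
    return {
        rsid: {pid: observed.get((pid, rsid), "0/0") for pid in patients}
        for rsid in sites
    }


def allele_counts(genotypes):
    """Count REF ("0") and non-REF alleles across all patients at one site."""
    ref = alt = 0
    for gt in genotypes.values():
        for allele in gt.replace("|", "/").split("/"):
            if allele == ".":  # missing allele: count neither REF nor ALT
                continue
            if allele == "0":
                ref += 1
            else:
                alt += 1
    return ref, alt


# Hypothetical example data: P3 has no record because the export omits 0/0 calls.
calls = [("P1", "rs_demo1", "0/1"), ("P2", "rs_demo1", "1/1")]
patients = ["P1", "P2", "P3"]
matrix = fill_hom_ref(calls, patients, ["rs_demo1"])
ref_count, alt_count = allele_counts(matrix["rs_demo1"])
print(matrix["rs_demo1"])     # {'P1': '0/1', 'P2': '1/1', 'P3': '0/0'}
print(ref_count, alt_count)   # 3 3
```

With the filled matrix, the homozygous-minor patients that were previously invisible are counted again whenever the minor allele happens to be the reference.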
I am looking for a guide on how to integrate my data with the dbSNP database. Since scraping is illegal/inefficient, this could mean downloading the whole dbSNP database (something I haven't figured out yet) and finding code or a script that joins my file with the information for the specific rsIDs that correspond to my patients.
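dbSNP is distributed as VCF on NCBI's FTP site, so one option is to download that file and stream it, keeping only the rsIDs present in your own data. Below is a minimal standard-library Python sketch of that join. The helper names, the toy `rs_demo*` records and the `dbsnp.vcf.gz` filename are placeholders, not part of any dbSNP tooling. Chromosome names in the real file may be RefSeq accessions rather than `chr1`-style names, so match on rsID rather than on position.

```python
import gzip
import io


def open_vcf(path):
    """Open a plain or gzip/bgzip-compressed VCF file in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def load_dbsnp_subset(lines, wanted):
    """Stream VCF lines; keep {rsID: (chrom, pos, ref, alt)} for wanted rsIDs only."""
    found = {}
    for line in lines:
        if line.startswith("#"):  # skip meta-information and header lines
            continue
        chrom, pos, ids, ref, alt = line.rstrip("\n").split("\t")[:5]
        for rsid in ids.split(";"):  # the VCF ID column may hold several IDs
            if rsid in wanted:
                found[rsid] = (chrom, int(pos), ref, alt)
    return found


def annotate(calls, dbsnp):
    """Attach dbSNP position and alleles to each (patient, rsID, genotype) call."""
    rows = []
    for pid, rsid, gt in calls:
        chrom, pos, ref, alt = dbsnp.get(rsid, (None, None, None, None))
        rows.append((pid, rsid, gt, chrom, pos, ref, alt))
    return rows


# Toy VCF records standing in for the real download (not real dbSNP entries).
demo_vcf = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr1\t1000\trs_demo1\tA\tG\t.\t.\t.\n"
    "chr2\t2000\trs_demo2\tC\tT\t.\t.\t.\n"
)
calls = [("P1", "rs_demo1", "0/1"), ("P2", "rs_demo3", "1/1")]
wanted = {rsid for _, rsid, _ in calls}
subset = load_dbsnp_subset(demo_vcf, wanted)
rows = annotate(calls, subset)
for row in rows:
    print(row)

# With the real (hypothetically named) file:
# with open_vcf("dbsnp.vcf.gz") as fh:
#     subset = load_dbsnp_subset(fh, wanted)
```

Streaming keeps memory limited to the requested rsIDs, but a full Python scan of the multi-gigabyte file will be slow; if an index is available for the download, tools such as bcftools or tabix would likely be much faster. This sketch only shows the join logic.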