Let's say I have generated a list of 25,000 SNPs across the genome of my organism (a diploid non-model species with no publicly available variant resources). Given this list of SNPs (in an appropriate format, VCF?), is there a standard method to call genotypes from input BAM files for ONLY the 25,000 variants on the list? My ideal output would be a file with 25,000 genotype calls for each input BAM file, with the calls reported as, for example, "AG" or "GG".
Essentially, is there a way to do what a genotyping microarray does, but with Illumina reads?
I'm vaguely aware that samtools mpileup can do something like this, but I'm looking for a simpler way to call the genotypes that doesn't involve manually calculating read ratios per site (although that data could still be useful). Could HaplotypeCaller be forced to do something like this?
As for my actual application: I am Illumina-sequencing ~70 members of a mapping population, with the intent of building a linkage map to validate and scaffold a reference genome assembly.
Thanks, Mike
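I think GATK HaplotypeCaller can do this. Note that --dbsnp your_25000_SNPs.vcf on its own only annotates the ID column of the calls it makes; to restrict calling to your sites, also pass the list as intervals with -L your_25000_SNPs.vcf. If I remember correctly, recent GATK4 releases can additionally force-genotype the listed alleles with --alleles your_25000_SNPs.vcf (GATK3 used --genotyping_mode GENOTYPE_GIVEN_ALLELES for this), but check the documentation for your version.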
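Whichever caller you use, the output will be a VCF with numeric GT fields (0/1, 1/1, ...) rather than "AG" or "GG". Here is a minimal sketch of converting one VCF data line into letter genotypes. The function name genotype_letters and the "NN" code for missing calls are my own choices for illustration, not part of any tool, and the sketch assumes tab-separated VCF lines with a GT key in the FORMAT column.

```python
def genotype_letters(vcf_line):
    """Return one letter genotype (e.g. 'AG') per sample for a VCF data line.

    Assumes a standard tab-separated VCF record with a FORMAT column
    containing GT. Missing calls are reported as 'NN' (an arbitrary
    placeholder chosen for this sketch).
    """
    fields = vcf_line.rstrip("\n").split("\t")
    ref, alt = fields[3], fields[4]
    # Allele index 0 is REF; indices 1..n are the comma-separated ALT alleles.
    alleles = [ref] + alt.split(",")
    gt_position = fields[8].split(":").index("GT")

    calls = []
    for sample in fields[9:]:
        gt = sample.split(":")[gt_position]
        # Treat phased (|) and unphased (/) genotypes the same way.
        indices = gt.replace("|", "/").split("/")
        if "." in indices:
            calls.append("NN")
        else:
            calls.append("".join(alleles[int(i)] for i in indices))
    return calls


if __name__ == "__main__":
    example = "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:12\t1/1:9\t./.:0"
    print(genotype_letters(example))
```

Run over every non-header line of a per-sample or multi-sample VCF, this gives one letter genotype per site per individual, which is close to the microarray-style table described in the question.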