Is it possible to extract SNP information into a VCF file from ~20 genome assemblies of individuals of the same species, each about 300 MB in size?
While this is routinely done by mapping NGS reads to a reference, my question is specifically how to produce the VCF file given only the genome assemblies, not their reads.
I am imagining the following sequence of steps, but for some of them I am not sure which tool is available or best suited:
1. Align all genomes using something like the CACTUS multiple genome aligner - but 20 genomes of 300 MB each are almost guaranteed to make a CACTUS run hang or crash...
2. Identify and remove structural variant regions across these genomes - I'm not sure exactly how to carry this out
3. Align the remaining conserved genomic blocks to obtain SNP variants and their coordinates - snp-sites or a tool along those lines?
4. Convert SNP info into VCF - this should not be a challenge, IMO
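
To make steps 3 and 4 concrete, here is a minimal Python sketch of what I have in mind for a single conserved block that has already been aligned. It is only an illustration under several assumptions of mine: the function name `alignment_to_vcf` is hypothetical, one assembly is arbitrarily chosen as the reference, genotypes are written as haploid allele indices, and any column containing a gap or an ambiguous base is simply skipped rather than reported as an indel.

```python
def alignment_to_vcf(aligned, ref_name, chrom):
    """Turn one aligned block into minimal VCF text (hypothetical helper).

    aligned  : dict mapping assembly name -> aligned sequence ('-' = gap),
               all sequences must have the same length
    ref_name : key of the assembly used as the coordinate reference
    chrom    : value to write in the CHROM column
    """
    if len({len(seq) for seq in aligned.values()}) != 1:
        raise ValueError("all aligned sequences must have the same length")

    ref = aligned[ref_name].upper()
    samples = [name for name in aligned if name != ref_name]

    lines = [
        "##fileformat=VCFv4.2",
        '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\t" + "\t".join(samples),
    ]

    pos = 0  # 1-based position in the ungapped reference sequence
    for col, ref_base in enumerate(ref):
        if ref_base == "-":
            continue  # insertion relative to the reference: no reference coordinate
        pos += 1

        column = [aligned[name][col].upper() for name in samples]
        # Assumption: only clean A/C/G/T columns are considered for SNP calling.
        if ref_base not in "ACGT" or any(base not in "ACGT" for base in column):
            continue

        # Collect distinct non-reference alleles in order of first appearance.
        alts = []
        for base in column:
            if base != ref_base and base not in alts:
                alts.append(base)
        if not alts:
            continue  # invariant site

        alleles = [ref_base] + alts
        genotypes = [str(alleles.index(base)) for base in column]
        fields = [chrom, str(pos), ".", ref_base, ",".join(alts), ".", ".", ".", "GT"]
        lines.append("\t".join(fields + genotypes))

    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Toy block with made-up sequences, purely for illustration.
    block = {"ref": "ACGT-A", "s1": "ACCTTA", "s2": "A-GTTG"}
    print(alignment_to_vcf(block, ref_name="ref", chrom="chr1"), end="")
```

Positions here are relative to the start of the block, so in a real pipeline the block's offset within the reference assembly would still have to be added.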
If there are orthogonal solutions to my problem, I welcome any and all suggested protocols. Thanks!