Extract VCF from genome assemblies of multiple individuals

6.4 years ago

Anand Rao ▴ 640

Is it possible to extract SNP information into a VCF file from ~20 genome assemblies of individuals of the same species, each about 300 MB in size?

This is routinely done with NGS reads by mapping them to a reference. My question is specifically how to reach the same end goal, a VCF file, when I have the genome assemblies but not their reads.

I am imagining the following sequence of steps, but for some of them I am not sure which tool is available or best suited:

1. Align all genomes using a multiple genome aligner such as CACTUS. However, 20 genomes of 300 MB each seems almost guaranteed to make a CACTUS run hang or crash.

2. Identify and remove structural-variant regions across these genomes. I'm not sure exactly how to carry this out.

3. From the alignments of the remaining conserved genomic blocks, obtain SNP variants and their coordinates, using snp-sites or a similar tool.

4. Convert the SNP information into VCF. This should not be a challenge, IMO.
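Steps 2 to 4 could be sketched roughly as follows, in the spirit of what snp-sites does for a multi-FASTA alignment. This is only a minimal illustration under stated assumptions, not a tested pipeline: `alignment_to_vcf` is a hypothetical helper; it assumes one aligned block is already available as a dict of sample name to gapped sequence; one assembly (`ref_name`) is chosen to supply coordinates, with `ref_start` as its block offset; each assembly is treated as a single haploid sequence; and any column containing a gap or non-ACGT base is simply skipped as a crude stand-in for masking indel/structural-variant regions.

```python
from typing import Dict, List

VALID_BASES = set("ACGT")


def alignment_to_vcf(block: Dict[str, str], ref_name: str, chrom: str,
                     ref_start: int = 1) -> List[str]:
    """Hypothetical helper: turn one aligned block into VCF lines.

    block: sample name -> aligned sequence ('-' for gaps), all equal length.
    ref_name: assembly whose coordinates are reported (an assumption).
    ref_start: 1-based position of that assembly's first base in this block.
    """
    if len({len(seq) for seq in block.values()}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    samples = sorted(block)
    lines = [
        "##fileformat=VCFv4.2",
        '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
        "\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL",
                   "FILTER", "INFO", "FORMAT", *samples]),
    ]
    ref_seq = block[ref_name].upper()
    ref_pos = ref_start - 1  # last reference coordinate consumed so far
    for col, ref_base in enumerate(ref_seq):
        if ref_base != "-":
            ref_pos += 1  # only non-gap reference columns advance POS
        bases = {s: block[s][col].upper() for s in samples}
        # Skip columns with any gap or ambiguous base: a crude stand-in for
        # masking indel / structural-variant regions (step 2).
        if any(b not in VALID_BASES for b in bases.values()):
            continue
        alts = sorted({b for b in bases.values() if b != ref_base})
        if not alts:
            continue  # monomorphic column, no SNP here
        allele_index = {ref_base: 0}
        allele_index.update({alt: i + 1 for i, alt in enumerate(alts)})
        # Each assembly is treated as haploid, so GT is a single allele index.
        genotypes = [str(allele_index[bases[s]]) for s in samples]
        lines.append("\t".join([chrom, str(ref_pos), ".", ref_base,
                                ",".join(alts), ".", "PASS", ".", "GT",
                                *genotypes]))
    return lines


if __name__ == "__main__":
    # Toy block with made-up sequences, purely for illustration.
    demo = {"ind1": "ACGTAC-T", "ind2": "ACGAAC-T", "ind3": "ACTTACGT"}
    for line in alignment_to_vcf(demo, ref_name="ind1", chrom="chr1",
                                 ref_start=100):
        print(line)
```

Skipping every gapped column is deliberately conservative: it avoids calling SNPs inside poorly aligned or rearranged regions, at the cost of losing true SNPs that happen to share a column with a gap in any one individual. In practice the blocks and their reference offsets would have to come from the whole-genome aligner's output, and records from many blocks would be concatenated and sorted by position.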

If there are orthogonal solutions to my problem, I welcome any and all suggested protocols. Thanks!

SNP Structural Variation VCF Genomes alignment

ADD COMMENT • link 6.4 years ago by Anand Rao ▴ 640