I am trying to perform a SNP calling procedure based on plant whole genome assemblies. To this end, I aligned several high-quality assemblies to the reference genome sequence using minimap, and then generated a vcf using paftools:
minimap2 -cx asm5 --cs $ref $asm > $out_paf
sort -k6,6 -k8,8n $out_paf | paftools.js call -f $ref -L10000 -l1000 -s $sample - > $out_vcf
The results I get are highly inconsistent with SNP calling results obtained using a more standard procedure - mapping short reads and calling variants with bcftools. I am assuming something is wrong with my procedure.
1) Is SNP calling from whole genome assemblies recommended at all? I have seen it done in bacteria, but not so often in eukaryotes. My assumption was that since I already have assembled genomes, this should be faster and more accurate than read-mapping methods, but maybe I was wrong. 2) Are there any recommended tools / procedures / best practices for doing that in plants or eukaryotes?
Thanks!
You are right, but luckily in my case all plants were selfed, so they are almost completely homozygous.
I'll look into the pipelines you mentioned.