Question

SNP calling from plant whole genome assemblies

2.4 years ago

liorglic ★ 1.4k

I am trying to call SNPs from plant whole-genome assemblies. To this end, I aligned several high-quality assemblies to the reference genome sequence using minimap2 and then generated a VCF using paftools.js:

minimap2 -cx asm5 --cs $ref $asm > $out_paf
sort -k6,6 -k8,8n $out_paf | paftools.js call -f $ref -L10000 -l1000 -s $sample - > $out_vcf
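One thing worth checking with this setup: as far as I know, in `paftools.js call` the `-L` value is the minimum alignment length for calling variants and `-l` the minimum for computing coverage. With `-L10000`, shorter alignments from fragmented regions contribute no calls at all. A rough way to gauge how much of the reference that leaves out is to compare merged reference coverage at the two thresholds. This is a sketch under assumptions: it filters on PAF column 11 (alignment block length) as a stand-in for whatever length paftools actually uses, it ignores the mapping-quality filter, and `covered_bp` is a hypothetical helper, not part of paftools.

```shell
#!/usr/bin/env bash
# Sketch: merged reference (target) bases covered by PAF alignments whose
# block length (column 11, assumed filter column) is at least MIN_LEN.
# Hypothetical helper; overlapping alignments are merged, not double-counted.
set -euo pipefail
export LC_ALL=C

# Usage: covered_bp FILE.paf MIN_LEN
covered_bp() {
  # Sort by target name, then target start, as in the paftools call pipeline.
  sort -k6,6 -k8,8n "$1" | awk -F'\t' -v min="$2" '
    $11 + 0 < min + 0 { next }              # drop alignments shorter than MIN_LEN
    $6 != chr {                             # new target sequence: flush last interval
      if (chr != "") total += e - s
      chr = $6; s = $8 + 0; e = $9 + 0; next
    }
    $8 + 0 > e { total += e - s; s = $8 + 0; e = $9 + 0; next }  # gap: start new interval
    $9 + 0 > e { e = $9 + 0 }               # overlap: extend current interval
    END { if (chr != "") total += e - s; printf "%.0f\n", total }
  '
}
```

For example, compare `covered_bp "$out_paf" 10000` with `covered_bp "$out_paf" 1000`. A large difference would suggest that a sizeable part of the genome is covered only by alignments too short to be used for calling at `-L10000`.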

The results I get are highly inconsistent with SNP calls obtained using a more standard procedure: mapping short reads and calling variants with bcftools. I suspect something is wrong with my procedure.
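Before digging into causes, it may help to quantify "inconsistent". Here is a minimal sketch, assuming both VCFs are uncompressed (decompress first, e.g. with `zcat`) and were called against the same reference with identical chromosome names. It keys SNPs on CHROM, POS, REF and ALT (case-insensitive) and skips indels and multi-allelic records. `snp_keys` and `compare_snps` are hypothetical helper names. Keep in mind that a site found only in the read-based calls may simply lie in a region the assembly alignments did not cover, so the private counts mix true discordance with coverage differences.

```shell
#!/usr/bin/env bash
# Sketch: count SNP sites shared between, or private to, two plain-text VCFs.
# Hypothetical helpers; indels and multi-allelic records are skipped.
set -euo pipefail
export LC_ALL=C   # make sort and comm agree on collation order

# Print one "CHROM<TAB>POS<TAB>REF<TAB>ALT" key per biallelic SNP, sorted and unique.
snp_keys() {
  awk -F'\t' '!/^#/ && toupper($4) ~ /^[ACGT]$/ && toupper($5) ~ /^[ACGT]$/ {
    print $1 "\t" $2 "\t" toupper($4) "\t" toupper($5)
  }' "$1" | sort -u
}

# Usage: compare_snps FIRST.vcf SECOND.vcf
compare_snps() {
  local a b
  a=$(mktemp); b=$(mktemp)
  snp_keys "$1" > "$a"
  snp_keys "$2" > "$b"
  echo "shared: $(comm -12 "$a" "$b" | wc -l | tr -d ' ')"
  echo "only_in_first: $(comm -23 "$a" "$b" | wc -l | tr -d ' ')"
  echo "only_in_second: $(comm -13 "$a" "$b" | wc -l | tr -d ' ')"
  rm -f "$a" "$b"
}
```

For example, `compare_snps "$out_vcf" reads.vcf`, where `reads.vcf` stands in for the decompressed bcftools output.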

1) Is SNP calling from whole-genome assemblies recommended at all? I have seen it done in bacteria, but not often in eukaryotes. My assumption was that since I already have assembled genomes, this would be faster and more accurate than read-mapping methods, but maybe I was wrong.
2) Are there any recommended tools, procedures, or best practices for doing this in plants or other eukaryotes?

Thanks!

score 0 · Answer 1 · 2022-07-19

2.4 years ago

colindaven 7.0k

What about heterozygotes? They'll be collapsed to a single haploid base in the assemblies. Bacteria are haploid, so that's not a problem there.
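If heterozygosity is a concern, one quick check is how many heterozygous genotypes the read-based calls actually contain, since those are the sites a collapsed haploid assembly cannot represent. Here is a minimal sketch for a single-sample, uncompressed VCF (e.g. bcftools output decompressed with `zcat`). It locates GT through the FORMAT column, and `count_genotypes` is a hypothetical helper name. "hom" counts both 0/0 and 1/1 style calls; haploid or missing genotypes fall into the last bucket.

```shell
#!/usr/bin/env bash
# Sketch: tally genotype classes in the first sample column of a plain-text VCF.
# Hypothetical helper; only diploid calls with two non-missing alleles are classified.
set -euo pipefail

# Usage: count_genotypes FILE.vcf
count_genotypes() {
  awk -F'\t' '
    /^#/ { next }                                    # skip header lines
    {
      n = split($9, fmt, ":"); gi = 0
      for (i = 1; i <= n; i++) if (fmt[i] == "GT") gi = i   # locate GT in FORMAT
      if (gi == 0) next                              # record has no GT field
      split($10, val, ":")
      m = split(val[gi], al, "[/|]")                 # handles unphased and phased
      if (m != 2 || al[1] == "." || al[2] == ".") { other++; next }
      if (al[1] == al[2]) hom++; else het++
    }
    END { printf "hom: %d\nhet: %d\nmissing_or_other: %d\n", hom, het, other }
  ' "$1"
}
```

For largely selfed material, one would expect the "het" count to be small relative to "hom".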

For accuracy, I'd reuse the raw reads, as in the standard approach.

Potentially, you could create a pangenome with PGGB and then process it with ODGI etc., but that's a lot trickier.

What about heterozygotes?

You are right, but luckily in my case all plants were selfed, so they are almost completely homozygous.

Potentially you can create a pangenome with PGGB -> ODGI

I'll look into the pipelines you mentioned.

Reply by liorglic ★ 1.4k, 2.4 years ago