SNP calling from plant whole genome assemblies
2.4 years ago
liorglic ★ 1.4k

I am trying to call SNPs from plant whole-genome assemblies. To this end, I aligned several high-quality assemblies to the reference genome sequence using minimap2, and then generated a VCF using paftools:

minimap2 -cx asm5 --cs $ref $asm > $out_paf
sort -k6,6 -k8,8n $out_paf | paftools.js call -f $ref -L10000 -l1000 -s $sample - > $out_vcf

The results I get are highly inconsistent with SNP calls obtained using a more standard procedure: mapping short reads and calling variants with bcftools. I assume something is wrong with my procedure.
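One way to put a number on "inconsistent" is to count the SNP sites shared by the two call sets and those private to each. The following is only a minimal sketch: the function names `snp_keys` and `compare_snps` are hypothetical, both inputs are assumed to be uncompressed VCFs on the same reference coordinates, and only single-base, biallelic substitutions are compared (indels and multi-allelic records are ignored).

```shell
# Sketch: compare SNP sites between two uncompressed VCFs.
# Function names are hypothetical; nothing here is specific to paftools or bcftools.

# Print one sorted, unique "CHROM:POS:REF:ALT" key per single-base substitution.
snp_keys() {
    awk -F'\t' '
        /^#/ { next }                         # skip header lines
        length($4) == 1 && length($5) == 1 && $5 != "." && $5 != "*" {
            print $1 ":" $2 ":" toupper($4) ":" toupper($5)
        }' "$1" | LC_ALL=C sort -u
}

# Report how many SNP keys are shared and how many are private to each VCF.
compare_snps() {
    local a b
    a=$(mktemp)
    b=$(mktemp)
    snp_keys "$1" > "$a"
    snp_keys "$2" > "$b"
    printf 'shared\t%s\n'      "$(LC_ALL=C comm -12 "$a" "$b" | wc -l | tr -d ' ')"
    printf 'only_first\t%s\n'  "$(LC_ALL=C comm -23 "$a" "$b" | wc -l | tr -d ' ')"
    printf 'only_second\t%s\n' "$(LC_ALL=C comm -13 "$a" "$b" | wc -l | tr -d ' ')"
    rm -f "$a" "$b"
}
```

It could be run as `compare_snps assembly_calls.vcf read_calls.vcf` (placeholder file names). A large private count on one side does not by itself say which call set is wrong; it may also reflect regions that only one of the two methods could call.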

1) Is SNP calling from whole genome assemblies recommended at all? I have seen it done in bacteria, but not so often in eukaryotes. My assumption was that since I already have assembled genomes, this should be faster and more accurate than read-mapping methods, but maybe I was wrong.

2) Are there any recommended tools / procedures / best practices for doing that in plants or eukaryotes?

Thanks!

calling snp whole assembly genome • 994 views
2.4 years ago

What about heterozygotes? They'll be collapsed to a single haploid base in the assemblies. Bacteria are haploid, so that's not a problem there.

I'd reuse the raw reads for accuracy as per the standard approach.

Potentially you can create a pangenome with PGGB -> ODGI etc, but that's a lot trickier.


What about heterozygotes?

You are right, but luckily in my case all plants were selfed, so they are almost completely homozygous.
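The expected near-complete homozygosity can be checked against the read-based calls by counting heterozygous genotypes. This is a rough sketch, not a validated procedure: the function name `het_fraction` is hypothetical, the input is assumed to be an uncompressed single-sample VCF with diploid genotypes, and no quality filtering is applied.

```shell
# Sketch: tally homozygous vs heterozygous diploid genotypes in a single-sample VCF.
# The function name is hypothetical; the sample is assumed to be in column 10.
het_fraction() {
    awk -F'\t' '
        /^#/ { next }                          # skip header lines
        {
            split($10, f, ":")                 # GT is the first FORMAT field when present
            n = split(f[1], al, "[/|]")        # accept unphased (/) and phased (|) genotypes
            if (n != 2 || al[1] == "." || al[2] == ".") next   # skip missing or non-diploid calls
            if (al[1] == al[2]) hom++; else het++
        }
        END {
            total = hom + het
            printf "hom\t%d\nhet\t%d\n", hom, het
            if (total > 0) printf "het_fraction\t%.4f\n", het / total
        }' "$1"
}
```

For example, `het_fraction read_calls.vcf` (placeholder file name) prints the two counts and their ratio. Some of the heterozygous calls may be mapping artifacts in repeats or collapsed paralogs rather than true residual heterozygosity.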

Potentially you can create a pangenome with PGGB -> ODGI

I'll look into the pipelines you mentioned.
