Question

PHG - ImputePipelinePlugin fails when trying to impute SNPs on a gvcf file.

12 months ago

sjp6181 • 0

Hello everyone, I hope you're doing great.

I'm trying to impute a gvcf using a PHG database. As far as I can tell from the logs (attached here) of steps 1 and 2 in the PHG Wiki guide, I have established and populated the PHG db with haplotypes correctly (there is not a single 'ERROR' message in any log). The problem comes when I run the imputation part on an example gvcf, where I get the following error at the net.maizegenetics.pangenome.hapCalling.SNPToReadMappingPlugin - Processing record: 100_Ma100,wgsFlowcell,Ma100_.vcf.gz,wgs step:

ERROR net.maizegenetics.plugindef.AbstractPlugin - currentIndexLine must not be null

The command that I used was:

singularity exec -B ${WORKING_DIR}/:/phg/ ${WORKING_DIR}/phg_16.simg /tassel-5-standalone/run_pipeline.pl -Xmx20G -debug -configParameters imputevcfconfig.txt -ImputePipelinePlugin -imputeTarget map -localGVCFFolder /phg/inputDir/loadDB/gvcf/ -localGVCFDir /phg/inputDir/loadDB/gvcf/  -endPlugin > 08_VCF_Imputation.log

After that, the pipeline stops. Also, the pangenome folder in outputDir/ is empty, and the vcfIndex file in outputDir/ contains only the headers and no other information, which makes me wonder whether an earlier mistake might be causing these problems.

I attach the logs for each step, the config files, keys and the example gvcf at the following link:

https://drive.google.com/drive/folders/1s318N3OCQLm_okDLr5UIYjRED05jl7XK?usp=sharing

Any help or guidance would be much appreciated. If you need any other information to clarify what might be happening please let me know. Thank you!

phg • 490 views

updated 12 months ago by pjb39 ▴ 220 • written 12 months ago by sjp6181 • 0

score 1 · Answer 1 · 2023-11-16

The index file is empty because the consensus step collapsed everything to a single genome, probably the reference, which means the consensus haplotypes have no variants. This happened because mxDiv was too high for your dataset. My suggestion is to skip the consensus step. To do that for imputation, set pangenomeHaplotypeMethod and pathHaplotypeMethod to GATK_PIPELINE.
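
As a rough sketch, the two settings would look something like this in imputevcfconfig.txt. This assumes the file uses the plain key=value format of the PHG config files and that every other entry is left as it is; check the key names against your own config before rerunning.

```properties
# Sketch only: assumes imputevcfconfig.txt is a key=value config file.
# Both methods point at the haplotypes loaded from the gvcfs,
# so the collapsed consensus haplotypes are not used for imputation.
pangenomeHaplotypeMethod=GATK_PIPELINE
pathHaplotypeMethod=GATK_PIPELINE
```

After changing these, rerunning the same ImputePipelinePlugin command should show whether the pangenome folder and the vcfIndex file in outputDir/ now get populated.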