Impute haplotypes (ImputePipelinePlugin) execution error - PHG
18 months ago
jrodrigu • 0

Hi there,

I am trying to get my PHG imputation pipeline running. I use a small reference panel and a large low-coverage nested population to be imputed. The first part of the pipeline runs properly, creating the pangenome.fa and aligning the reads with minimap2, but after the graph is built, an error message appears:

> [pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - createEdges: creating edges from nodes.

> [pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - createEdges: time:1.35681E-4 secs.

> [pool-1-thread-1] DEBUG net.maizegenetics.plugindef.AbstractPlugin - bound must be positive.

The log then points to the DiploidPathPlugin and ends by reporting the same "bound must be positive" error. No output is written. I checked the key file for the proper setup, and it looks OK.

Could you please give me some advice to overcome this error? Thanks a lot!

phg • 1.1k views
18 months ago
lcj34 ▴ 420

Can you tell us (1) what version of the PHG you are running, and (2), from the log file above where the "bound must be positive" error is printed, how many nodes does it show are in the graph?

If there are 0, we need to determine why. Look for lines that contain the verbiage "addNodes: number of nodes: " and "addNodes: number of reference ranges: "
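A quick way to pull those two lines out of a saved log is a small grep wrapper. This is only a sketch: the function name `graph_size` and the file name `phg.log` are placeholders, and it assumes the log was written to a plain text file.

```shell
# Print the graph-size lines written while the haplotype graph is built.
# "graph_size" and "phg.log" are placeholder names, not part of the PHG.
graph_size() {
    grep -E 'addNodes: number of (nodes|reference ranges): ' "$1"
}

# Usage: graph_size phg.log
```

If nothing is printed, the graph-building step either did not run or logged somewhere else.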


First, thanks for the quick reply. To answer your questions: (1) I am running PHG version 1.4. (2) The log reports 36 nodes and 9 reference ranges. However, those lines appear one step before the error occurs. Please check the lines:

> [pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - CreateGraphUtils:addNodes - query=SELECT haplotypes_id, gamete_grp_id, haplotypes.ref_range_id, asm_contig, asm_start_coordinate, asm_end_coordinate, asm_strand, genome_file_id, seq_hash, seq_len FROM haplotypes WHERE method_id = 12;

> [pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - addNodes: number of nodes: 36

> [pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - addNodes: number of reference ranges: 9

> [pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - createHaplotypeNodes: time: 2.07973551 secs.

> [pool-1-thread-1] INFO net.maizegenetics.pangenome.api.HaplotypeGraph - Created graph edges: created when requested  number of nodes: 36  number of reference ranges: 9

> [pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin - Finished net.maizegenetics.pangenome.api.HaplotypeGraphBuilderPlugin: time: May 18, 2023 11:34:28

> [pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin - Starting net.maizegenetics.pangenome.hapCalling.DiploidPathPlugin: time: May 18, 2023 11:34:28

> [pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin - DiploidPathPlugin Parameters

> keyFile: /phg/key_file_pathKeyFile.txt

> readMethod: GATK

> pathMethod: GATK

> pathMethodDescription: null

> minTaxa: 20

> probCorrect: 0.99

> minTransition: 0.001

> maxHap: 11

> minReads: 1

> removeEqual: false

> maxReadsKB: 1000

> splitNodes: false

> splitProb: 0.99

> numThreads: 3

> classicAlgorithm: false

> inbreedCoef: 0.0

> maxParents: 2147483647

> minCoverage: 1.0

> parentOutputFile: null

> isTestMethod: false

> [pool-1-thread-1] INFO net.maizegenetics.pangenome.db_loading.DBLoadingUtils - first connection: dbName from config file = /phg/phg_db_name.db host: 127.0.0.1 user: sqlite type: sqlite

> [pool-1-thread-1] INFO net.maizegenetics.pangenome.db_loading.DBLoadingUtils - Database URL: jdbc:sqlite:/phg/phg_db_name.db

> [pool-1-thread-1] INFO net.maizegenetics.pangenome.db_loading.DBLoadingUtils - Connected to database:

> [pool-1-thread-1] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - PHGdbAccess - db is setup, init prepared statements, load hash table

> [pool-1-thread-1] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - beginning - isSqlite is true

> [pool-1-thread-1] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - before loading hash, size of all geneotypes in genotype table=5

> [pool-1-thread-1] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - refRangeRefRangeIDMap is null, creating new one with size : 9

> [pool-1-thread-1] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - loadAnchorHash: at end, size of refRangeRefRangeIDMap: 9, number of rs.next processed: 9

> [pool-1-thread-1] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - before loading hash, size of all methods in method table=13

> [pool-1-thread-1] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - before loading hash, size of all groups in taxa_groups table=0

> [pool-1-thread-1] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - before loading hash, size of all groups in gamete_groups table=5

> [pool-1-thread-1] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - before loading hash, size of all gametes in gametes table=5

> [pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - createEdges: creating edges from nodes.

> [pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - createEdges: time: 7.1723E-5 secs.

> [pool-1-thread-1] DEBUG net.maizegenetics.plugindef.AbstractPlugin - bound must be positive

Will you send me both your commands and your full log file? lcj34@cornell.edu


Sure. The command is:

singularity exec -B $PWD:/phg/ /ei/software/testing/phg/1.4/phg.img /tassel-5-standalone/run_pipeline.pl -debug -Xmx20G -configParameters config.txt -ImputePipelinePlugin -imputeTarget diploidPath -inputType fastq -configFile config_haplotype_imputation.txt  -pangenomeHaplotypeMethod GATK -readMethod GATK -localGVCFFolder inputDir/loadDB/gvcf/ -pangenomeDir outputDir/pangenome/ -endPlugin

And the log file is this one: https://github.com/mjrodriguezc/questions/blob/main/log_file_phg.txt


The log file shows an error processing the key file in the FastqToReadMappingPlugin step. See the lines that contain "ERROR net.maizegenetics.pangenome.hapCalling.Minimap2Utils - input directory does not contain both of .... "

Check your keyfile and run imputation again.
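To spot messages like this before the final failure, it can help to list every ERROR-level line with its line number. A minimal sketch, assuming the log is saved to a file; `find_errors` and `phg.log` are placeholder names:

```shell
# List ERROR-level lines with their line numbers.
# Note: the final "bound must be positive" message is logged at DEBUG level,
# so it will not appear here; this is meant to surface earlier failures.
find_errors() {
    grep -n ' ERROR ' "$1"
}

# Usage: find_errors phg.log
```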

Additionally, your minTaxa value should be equal to or less than the number of taxa you have. It looks like you only have 5 taxa, but your minTaxa value is 20. "minTaxa" should be set to 5 or less. This value represents the minimum number of taxa per anchor reference range. When creating the path, ranges with fewer taxa will not be included in the output node list.
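As a rough sanity check, the two numbers can be compared straight from the log. The sketch below assumes that the "genotype table=" line printed by PHGdbAccess reflects the number of taxa in the database, and that the log contains the DiploidPathPlugin parameter dump; `check_min_taxa` and `phg.log` are placeholder names.

```shell
# Compare the minTaxa value echoed in the plugin parameter dump with the
# genotype count reported when the database is opened.
# Assumption: that genotype count equals the number of taxa available.
check_min_taxa() {
    log="$1"
    min_taxa=$(grep -m1 -oE 'minTaxa: [0-9]+' "$log" | grep -oE '[0-9]+')
    n_taxa=$(grep -m1 -oE 'genotype table=[0-9]+' "$log" | grep -oE '[0-9]+')
    if [ -z "$min_taxa" ] || [ -z "$n_taxa" ]; then
        echo "could not find minTaxa or the genotype count in $log" >&2
        return 2
    fi
    if [ "$min_taxa" -gt "$n_taxa" ]; then
        echo "minTaxa ($min_taxa) exceeds taxa in DB ($n_taxa): lower it to $n_taxa or less"
        return 1
    fi
    echo "minTaxa ($min_taxa) is within the $n_taxa taxa in the DB"
}

# Usage: check_min_taxa phg.log
```

The function returns 1 when minTaxa is too large and 2 when either value is missing from the log, so it can also be used in a wrapper script before launching the run.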


Thank you so much. I made the suggested changes. However, I still have the same problem. Please check the new log file.

https://github.com/mjrodriguezc/questions/blob/main/log_phg_imputation_2.txt

Note: I am using two samples as a test; nevertheless, the error is the same when I use the whole progeny.
