Hi all, I have a problem running the PHG ImputePipelinePlugin with inputType=vcf and imputeTarget=diploidPathToVCF. I'm using PHG version 0.026.
To be exact, the pipeline runs through SNPToReadMappingPlugin (run within the ImputePipelinePlugin workflow), but then at DiploidPathPlugin I get the error "keyFile_pathKeyFile.txt doesn't exist". I believe this file should be created automatically somewhere before DiploidPathPlugin, and as I understand it I should not have to create it manually, since I don't know the readMappingIds (it is created automatically when inputType=fastq).
I also noticed that after SNPToReadMappingPlugin completes, I get files named _readMappings.txt, but they are not imported into the database, and I do not get any error at that step.
Here is the log I get from ImputePipelinePlugin (the log of the last few plugins):
[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin - Starting net.maizegenetics.pangenome.hapCalling.SNPToReadMappingPlugin: time: Feb 8, 2021 12:03:25
[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin -
SNPToReadMappingPlugin Parameters
keyFile: /phg/keyFile.txt
indexFile: /phg/outputDir/VCF_index_file.txt
vcfDir: /phg/inputDir/imputation/vcf/
outputDir: /phg/outputDir/
methodName: DIPLOID_VCF_HAP_COUNT_METHOD_mxDiv.0
methodDescription: test_description
countAlleleDepths: true
[pool-1-thread-1] INFO net.maizegenetics.pangenome.hapCalling.SNPToReadMappingPlugin - Processing record: A10,skimseqsFlowcell,A10.vcf,skimseqs
[pool-1-thread-1] INFO net.maizegenetics.pangenome.hapCalling.SNPToReadMappingPlugin - Processing record: A11,skimseqsFlowcell,A11.vcf,skimseqs
[pool-1-thread-1] INFO net.maizegenetics.pangenome.hapCalling.SNPToReadMappingPlugin - Processing record: A1,skimseqsFlowcell,A1.vcf,skimseqs
[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin - Finished net.maizegenetics.pangenome.hapCalling.SNPToReadMappingPlugin: time: Feb 8, 2021 12:03:38
[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin - Starting net.maizegenetics.pangenome.api.HaplotypeGraphBuilderPlugin: time: Feb 8, 2021 12:03:38
[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin -
HaplotypeGraphBuilderPlugin Parameters
configFile: /phg/config.txt
methods: CONSENSUS_mxDiv.0
includeSequences: false
includeVariantContexts: false
haplotypeIds: null
chromosomes: null
[pool-1-thread-1] INFO net.maizegenetics.pangenome.db_loading.DBLoadingUtils - first connection: dbName from config file = /phg/TestDBVCF.db host: localHost user: sqlite type: sqlite
[pool-1-thread-1] INFO net.maizegenetics.pangenome.db_loading.DBLoadingUtils - Database URL: jdbc:sqlite:/phg/TestDBVCF.db
[pool-1-thread-1] INFO net.maizegenetics.pangenome.db_loading.DBLoadingUtils - Connected to database:
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - referenceRangesAsMap: query statement: select reference_ranges.ref_range_id, chrom, range_start, range_end, methods.name from reference_ranges INNER JOIN ref_range_ref_range_method on ref_range_ref_range_method.ref_range_id=reference_ranges.ref_range_id INNER JOIN methods on ref_range_ref_range_method.method_id = methods.method_id AND methods.method_type = 7 ORDER BY reference_ranges.ref_range_id
methods size: 1
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - referenceRangesAsMap: number of reference ranges: 2411
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - referenceRangesAsMap: time: 0.01092466 secs.
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - taxaListMap: query statement: SELECT gamete_haplotypes.gamete_grp_id, genotypes.line_name FROM gamete_haplotypes INNER JOIN gametes ON gamete_haplotypes.gameteid = gametes.gameteid INNER JOIN genotypes on gametes.genoid = genotypes.genoid ORDER BY gamete_haplotypes.gamete_grp_id;
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - taxaListMap: number of taxa lists: 3
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - taxaListMap: time: 2.19438E-4 secs.
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - createHaplotypeNodes: haplotype method: CONSENSUS_mxDiv.0 range group method: null
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - createHaplotypeNodes: query statement: SELECT haplotypes_id, gamete_grp_id, haplotypes.ref_range_id, asm_contig, asm_start_coordinate, asm_end_coordinate, genome_file_id, seq_hash, seq_len FROM haplotypes WHERE method_id = 5;
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - addNodes: number of nodes: 4800
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - addNodes: number of reference ranges: 2400
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - createHaplotypeNodes: time: 0.480757947 secs.
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.HaplotypeGraph - Created graph edges: created when requested number of nodes: 4800 number of reference ranges: 2400
[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin - Finished net.maizegenetics.pangenome.api.HaplotypeGraphBuilderPlugin: time: Feb 8, 2021 12:03:38
[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin - Starting net.maizegenetics.pangenome.hapCalling.DiploidPathPlugin: time: Feb 8, 2021 12:03:38
[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin -
DiploidPathPlugin Parameters
keyFile: /phg/keyFile_pathKeyFile.txt
readMethod: DIPLOID_VCF_HAP_COUNT_METHOD_mxDiv.0
pathMethod: DIPLOID_VCF_PATH_METHOD_mxDiv.0
pathMethodDescription: null
minTaxa: 1
probCorrect: 0.99
minTransition: 0.001
maxHap: 4
minReads: 0
removeEqual: false
maxReadsKB: 100
splitNodes: false
splitProb: 0.99
numThreads: 8
[pool-1-thread-1] ERROR net.maizegenetics.plugindef.AbstractPlugin - -keyFile: /phg/keyFile_pathKeyFile.txt doesn't exist
[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin -
DiploidPathPlugin Description...
DiploidPathPlugin finds the best path through all ordered pair of nodes in a HaplotypeGraph given a key file with sample names and read mapping ids.
Usage:
DiploidPathPlugin <options>
-keyFile <Key File> : KeyFile file name. Must be a tab separated file using the following headers:
SampleName ReadMappingIds LikelyParents
ReadMappingIds and LikelyParents need to be comma separated for multiple values (required)
-readMethod <Read Mapping Method> : The name of the read mapping method in the PHG DB (required)
-pathMethod <Path Method> : The name of the path method used to write the results to the PHG DB (required)
-pathMethodDescription <Path Method Description> : An additional description that will be stored with the path method name, if desired.
-minTaxa <Min Taxa> : minimum number of taxa per anchor reference range. Ranges with fewer taxa will not be included in the output node list. (Default: 20)
-probCorrect <Probability Correct> : The probability that a read mapped to the correct haplotypes (Default: 0.99)
-minTransition <Min Transition Probability> : The minimum transition probability between a pair of nodes in adjacent reference ranges. (Default: 0.001)
-maxHap <Maximum Number of Haplotypes> : Any range with more than maxHap haplotypes will not be included in the path. (Default: 11)
-minReads <Minimum Read Number> : Any range with fewer than minReads will not be included in the path. (Default: 1)
-removeEqual <true | false> : Any range for which all haplotypes have the same number of read counts will not be included in the path. (Default: false)
-maxReadsKB <Maximum Reads per KB> : Any range with more than maxReadsKB reads per kilobase of sequence will not be included in the path. (Default: 1000)
-splitNodes <true | false> : If splitTaxa is true, then each taxon will be assigned its own node in the graph prior to path finding. (Default: false)
-splitProb <Split Probability> : When splitTaxa is true, the transition probability for moving between nodes of the same taxon will be set to this number. (Default: 0.99)
-numThreads <Num Threads> : Number of threads used to find paths. The path finding will subtract 2 from this number to have the number of worker threads. It leaves 1 thread for IO to the DB and 1 thread for the Operating System. (Default: 3)
The _readMappings.txt files look fine to me (here is just the head of one):
#line_name=A10
#file_group_name=skimseqsFlowcell
#method_name=DIPLOID_VCF_HAP_COUNT_METHOD_mxDiv.0
#method_description=test_description
HapIds count
11805,11806 9
11805 14
11806 8
10448 20
10447 62
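A minimal sketch of how such a file could be sanity-checked, assuming the layout shown in the head above: `#key=value` header lines, a `HapIds count` column header, then whitespace-separated rows where the first field is a comma-separated list of haplotype ids. The function name `parse_read_mappings` is hypothetical and not part of PHG.

```python
from typing import Dict, List, Tuple


def parse_read_mappings(text: str) -> Tuple[Dict[str, str], List[Tuple[List[int], int]]]:
    """Split readMappings-style text into header metadata and (hapIds, count) rows.

    Assumes the layout seen in the file head above; this is not a PHG API.
    """
    meta: Dict[str, str] = {}
    rows: List[Tuple[List[int], int]] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Header line such as "#line_name=A10"
            key, _, value = line[1:].partition("=")
            meta[key] = value
            continue
        fields = line.split()
        if fields[0] == "HapIds":
            # Column header row, carries no data
            continue
        hap_ids = [int(h) for h in fields[0].split(",")]
        rows.append((hap_ids, int(fields[1])))
    return meta, rows


# First lines of the file head shown above, used as sample input.
sample = """#line_name=A10
#file_group_name=skimseqsFlowcell
#method_name=DIPLOID_VCF_HAP_COUNT_METHOD_mxDiv.0
#method_description=test_description
HapIds count
11805,11806 9
11805 14
"""

meta, rows = parse_read_mappings(sample)
print(meta["line_name"], meta["method_name"], len(rows))
```

This only confirms that the header carries the expected line and method names and that the rows parse. It says nothing about whether the mappings were loaded into the database.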
But if I look into the database, no new genotypes have been added (as you can see in the log line "taxaListMap: number of taxa lists: 3"; those are the reference plus the two samples that were used to create the DB), nor is there a method named "DIPLOID_VCF_HAP_COUNT_METHOD_mxDiv.0". It's as if it has just created the readMappings files but nothing is imported into the database, and I don't see how to proceed to get that done.
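The database check described above can be sketched roughly as follows. The table and column names (`methods.name`, `genotypes.line_name`) are taken from the query statements in the log. Everything else is an assumption for illustration: the helper `db_summary` is hypothetical, and the in-memory database with made-up line names stands in for the real file (/phg/TestDBVCF.db in the log), whose full schema has more columns.

```python
import sqlite3
from typing import List, Tuple


def db_summary(conn: sqlite3.Connection, method_name: str) -> Tuple[bool, List[str]]:
    """Return (is the method present?, sorted line names) for a PHG-style SQLite DB."""
    cur = conn.cursor()
    cur.execute("SELECT COUNT(*) FROM methods WHERE name = ?", (method_name,))
    method_present = cur.fetchone()[0] > 0
    cur.execute("SELECT line_name FROM genotypes ORDER BY line_name")
    line_names = [row[0] for row in cur.fetchall()]
    return method_present, line_names


# Toy stand-in for the real database; only the columns used above are created.
# For a real check one would instead call sqlite3.connect("/phg/TestDBVCF.db").
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE methods (method_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE genotypes (genoid INTEGER PRIMARY KEY, line_name TEXT)")
conn.execute("INSERT INTO methods (name) VALUES ('CONSENSUS_mxDiv.0')")
conn.executemany(
    "INSERT INTO genotypes (line_name) VALUES (?)",
    [("Ref",), ("LineA",), ("LineB",)],  # hypothetical line names
)

present, line_names = db_summary(conn, "DIPLOID_VCF_HAP_COUNT_METHOD_mxDiv.0")
print(present, line_names)
```

In the situation described here, the read mapping method would be reported as absent and only the lines used to build the database would be listed.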
(Alternatively, I tried to follow the documented workflow from https://bitbucket.org/bucklerlab/practicalhaplotypegraph/wiki/UserInstructions/ImputeWithPHG_VCF, but after SNPToReadMappingPlugin it says to run FastqToMappingPlugin. However, I don't have the FASTQ files (which is the reason one would use the inputType=vcf option), so I cannot continue with that workflow.)
Thanks!
Thank you for the fast answer, the function does the trick. Looking forward to the new version!