Hi,
I'm trying to run the ImputewithPHG_main step to infer genotypes from fastq files. The underlying plugins (LiquibaseUpdatePlugin and FastqToMappingPlugin) seem to have completed without any errors.
Memory Settings: -Xms512m -Xmx215040m
Tassel Pipeline Arguments: -debug -configParameters /phg/config_Imputation_fq.txt -ImputePipelinePlugin -imputeTarget pathToVCF -endPlugin
\[pool-1-thread-1\] INFO net.maizegenetics.pangenome.pipeline.ImputePipelinePlugin - PHG DB is up to date. Proceeding with Populating the PHG DB.
\[pool-1-thread-1\] INFO net.maizegenetics.plugindef.AbstractPlugin - Starting net.maizegenetics.pangenome.api.HaplotypeGraphBuilderPlugin: time: Jul 27, 2021 4:32:50
\[pool-1-thread-1\] INFO net.maizegenetics.plugindef.AbstractPlugin -
HaplotypeGraphBuilderPlugin Parameters
configFile: /phg/config_Imputation_fq.txt
methods: GATK_PIPELINE
includeSequences: true
includeVariantContexts: false
haplotypeIds: null
chromosomes: null
taxa: null
\[pool-1-thread-1\] INFO net.maizegenetics.pangenome.db_loading.DBLoadingUtils - first connection: dbName from config file = /phg/95HRSW_attempt3.db host: localHost user: sqlite type: sqlite
\[pool-1-thread-1\] INFO net.maizegenetics.pangenome.db_loading.DBLoadingUtils - Database URL: jdbc:sqlite:/phg/95HRSW_attempt3.db
\[pool-1-thread-1\] INFO net.maizegenetics.pangenome.db_loading.DBLoadingUtils - Connected to database:
\[pool-1-thread-1\] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - referenceRangesAsMap: query statement: select reference_ranges.ref_range_id, chrom, range_start, range_end, methods.name from reference_ranges INNER JOIN ref_range_ref_range_method on ref_range_ref_range_method.ref_range_id=reference_ranges.ref_range_id INNER JOIN methods on ref_range_ref_range_method.method_id = methods.method_id AND methods.method_type = 7 ORDER BY reference_ranges.ref_range_id
methods size: 1
\[pool-1-thread-1\] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - referenceRangesAsMap: number of reference ranges: 94229
\[pool-1-thread-1\] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - referenceRangesAsMap: time: 0.674000525 secs.
\[pool-1-thread-1\] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - taxaListMap: query statement: SELECT gamete_haplotypes.gamete_grp_id, genotypes.line_name FROM gamete_haplotypes INNER JOIN gametes ON gamete_haplotypes.gameteid = gametes.gameteid INNER JOIN genotypes on gametes.genoid = genotypes.genoid ORDER BY gamete_haplotypes.gamete_grp_id;
\[pool-1-thread-1\] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - taxaListMap: number of taxa lists: 95
\[pool-1-thread-1\] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - taxaListMap: time: 0.074377629 secs.
\[pool-1-thread-1\] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - createHaplotypeNodes: haplotype method: GATK_PIPELINE range group method: null
\[pool-1-thread-1\] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - createHaplotypeNodes: query statement: SELECT haplotypes_id, gamete_grp_id, haplotypes.ref_range_id, asm_contig, asm_start_coordinate, asm_end_coordinate, genome_file_id, sequence, seq_hash, seq_len FROM haplotypes WHERE method_id = 3;
\[pool-1-thread-1\] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - addNodes: number of nodes: 8857526
\[pool-1-thread-1\] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - addNodes: number of reference ranges: 94229
\[pool-1-thread-1\] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - createHaplotypeNodes: time: 445.018472069 secs.
\[pool-1-thread-1\] INFO net.maizegenetics.pangenome.api.HaplotypeGraph - Created graph edges: created when requested number of nodes: 8857526 number of reference ranges: 94229
\[pool-1-thread-1\] INFO net.maizegenetics.plugindef.AbstractPlugin - Finished net.maizegenetics.pangenome.api.HaplotypeGraphBuilderPlugin: time: Jul 27, 2021 4:40:21
\[pool-1-thread-1\] INFO net.maizegenetics.plugindef.AbstractPlugin - Starting net.maizegenetics.pangenome.hapCalling.FastqToMappingPlugin: time: Jul 27, 2021 4:40:21
\[pool-1-thread-1\] INFO net.maizegenetics.plugindef.AbstractPlugin -
FastqToMappingPlugin Parameters
minimap2IndexFile: /phg/outputDir/pangenome/pangenome_GATK_PIPELINE_k21w11I90G.mmi
keyFile: /phg/readMapping_key_file_fq.txt
fastqDir: /phg/inputDir/imputation/fastq/
maxRefRangeErr: 0.25
lowMemMode: true
maxSecondary: 20
fParameter: f1000,5000
minimapLocation: minimap2
methodName: readMethodwithfastqfileall
methodDescription: readMethod+allfastqfilesused
debugDir:
outputSecondaryStats: false
\[pool-1-thread-1\] INFO net.maizegenetics.pangenome.db_loading.DBLoadingUtils - first connection: dbName from config file = /phg/95HRSW_attempt3.db host: localHost user: sqlite type: sqlite
\[pool-1-thread-1\] INFO net.maizegenetics.pangenome.db_loading.DBLoadingUtils - Database URL: jdbc:sqlite:/phg/95HRSW_attempt3.db
\[pool-1-thread-1\] INFO net.maizegenetics.pangenome.db_loading.DBLoadingUtils - Connected to database:
\[pool-1-thread-1\] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - PHGdbAccess - db is setup, init prepared statements, load hash table
\[pool-1-thread-1\] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess -
beginning - isSqlite is true
\[pool-1-thread-1\] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - before loading hash, size of all geneotypes in genotype table=96
\[pool-1-thread-1\] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - refRangeRefRangeIDMap is null, creating new one with size : 94229
\[pool-1-thread-1\] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - loadAnchorHash: at end, size of refRangeRefRangeIDMap: 94229, number of rs.next processed: 94229
\[pool-1-thread-1\] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - before loading hash, size of all methods in method table=9
\[pool-1-thread-1\] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - before loading hash, size of all groups in gamete_groups table=95
\[pool-1-thread-1\] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - before loading hash, size of all gametes in gametes table=95
\[pool-1-thread-1\] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - before loading readMappingHash, size of all read_mappings in read_mapping table=69
\[pool-1-thread-1\] INFO net.maizegenetics.pangenome.hapCalling.Minimap2Utils - Skipping Keyfile entry: cultivar 17C23-2, flowcell_lane wgs has already been processed and loaded into the DB.
\[pool-1-thread-1\] INFO net.maizegenetics.pangenome.hapCalling.Minimap2Utils - Skipping Keyfile entry: cultivar ADVANCE, flowcell_lane wgs has already been processed and loaded into the DB.
...
\[pool-1-thread-1\] INFO net.maizegenetics.pangenome.hapCalling.Minimap2Utils - Setting up MinimapRun for: cultivar MT1731, flowcell_lane wgs.
\[pool-1-thread-1\] INFO net.maizegenetics.pangenome.hapCalling.Minimap2Utils - Running Minimap2 Command:
minimap2 -ax sr -t 70 --secondary=yes -N20 -f1000,5000 --eqx /phg/outputDir/pangenome/pangenome_GATK_PIPELINE_k21w11I90G.mmi /phg/inputDir/imputation/fastq/MT1731.fq
\[pool-1-thread-1\] INFO net.maizegenetics.pangenome.hapCalling.Minimap2Utils - Time spent setting up run: Taxon:MT1731 : 155.75922857sec
\[pool-1-thread-1\] INFO net.maizegenetics.pangenome.hapCalling.Minimap2Utils - Running in Low memory mode. Simply counting the number of reads which hit a given set of haplotype ids
\[pool-1-thread-1\] INFO net.maizegenetics.pangenome.hapCalling.Minimap2Utils - TotalTiming For samReader.next:2.6597614309999984
\[pool-1-thread-1\] INFO net.maizegenetics.pangenome.hapCalling.Minimap2Utils - Time spent processing SAM: Taxon:MT1731 : 2.880379927sec
\[pool-1-thread-1\] INFO net.maizegenetics.pangenome.hapCalling.Minimap2Utils - Done running minimap2 for: cultivar MT1731, flowcell_lane wgs
\[pool-1-thread-1\] INFO net.maizegenetics.pangenome.hapCalling.Minimap2Utils - Compressing ReadMappings into GZipped Json.
\[pool-1-thread-1\] INFO net.maizegenetics.pangenome.hapCalling.Minimap2Utils - Done Compressing ReadMappings. Pushing it to the DB.
\[pool-1-thread-1\] INFO net.maizegenetics.pangenome.hapCalling.Minimap2Utils - Done processing cultivar MT1731, flowcell_lane wgs. Moving on to next keyfile Entry.
...
\[pool-1-thread-1\] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - Closing DB
\[pool-1-thread-1\] INFO net.maizegenetics.plugindef.AbstractPlugin - Finished net.maizegenetics.pangenome.hapCalling.FastqToMappingPlugin: time: Jul 27, 2021 6:39:17
\[pool-1-thread-1\] INFO net.maizegenetics.plugindef.AbstractPlugin - Starting net.maizegenetics.pangenome.api.HaplotypeGraphBuilderPlugin: time: Jul 27, 2021 6:39:17
\[pool-1-thread-1\] INFO net.maizegenetics.plugindef.AbstractPlugin -
HaplotypeGraphBuilderPlugin Parameters
configFile: /phg/config_Imputation_fq.txt
methods: GATK_PIPELINE
includeSequences: false
includeVariantContexts: false
haplotypeIds: null
chromosomes: null
taxa: null
\[pool-1-thread-1\] INFO net.maizegenetics.pangenome.db_loading.DBLoadingUtils - first connection: dbName from config file = /phg/95HRSW_attempt3.db host: localHost user: sqlite type: sqlite
\[pool-1-thread-1\] INFO net.maizegenetics.pangenome.db_loading.DBLoadingUtils - Database URL: jdbc:sqlite:/phg/95HRSW_attempt3.db
\[pool-1-thread-1\] INFO net.maizegenetics.pangenome.db_loading.DBLoadingUtils - Connected to database:
...
\[pool-1-thread-1\] INFO net.maizegenetics.plugindef.AbstractPlugin - Starting net.maizegenetics.pangenome.hapCalling.PathsToVCFPlugin: time: Jul 27, 2021 8:26:28
\[pool-1-thread-1\] INFO net.maizegenetics.plugindef.AbstractPlugin -
PathsToVCFPlugin Parameters
outputFile: outputVcfFile.vcf
refRangeFileVCF: null
referenceFasta: null
makeDiploid: true
positions: null
\[pool-1-thread-1\] INFO net.maizegenetics.pangenome.hapCalling.PathsToVCFPlugin - PathsToVCFPlugin: processData: number of ranges: 94229
\[pool-1-thread-1\] INFO net.maizegenetics.pangenome.hapCalling.PathsToVCFPlugin - PathsToVCFPlugin: processData: number of taxa: 94
\[pool-1-thread-1\] DEBUG net.maizegenetics.plugindef.AbstractPlugin - Allele in genotype G not in the variant context \[G, G\]
java.lang.IllegalStateException: Allele in genotype G not in the variant context \[G, G\]
at htsjdk.variant.variantcontext.VariantContext$Validation.validateGenotypes(VariantContext.java:382)
at htsjdk.variant.variantcontext.VariantContext$Validation.access$200(VariantContext.java:323)
at htsjdk.variant.variantcontext.VariantContext$Validation$2.validate(VariantContext.java:331)
at htsjdk.variant.variantcontext.VariantContext.lambda$validate$0(VariantContext.java:1384)
at java.lang.Iterable.forEach(Iterable.java:75)
at htsjdk.variant.variantcontext.VariantContext.validate(VariantContext.java:1384)
at htsjdk.variant.variantcontext.VariantContext.<init>(VariantContext.java:489)
at htsjdk.variant.variantcontext.VariantContextBuilder.make(VariantContextBuilder.java:647)
at htsjdk.variant.variantcontext.VariantContextBuilder.make(VariantContextBuilder.java:638)
at net.maizegenetics.pangenome.hapCalling.PathsToVCFPlugin.createVariantContext(PathsToVCFPlugin.kt:342)
at net.maizegenetics.pangenome.hapCalling.PathsToVCFPlugin.variantContexts(PathsToVCFPlugin.kt:475)
at net.maizegenetics.pangenome.hapCalling.PathsToVCFPlugin.access$variantContexts(PathsToVCFPlugin.kt:53)
at net.maizegenetics.pangenome.hapCalling.PathsToVCFPlugin$infosByRange$2$invokeSuspend$$inlined$forEach$lambda$1.invokeSuspend(PathsToVCFPlugin.kt:216)
at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
at kotlinx.coroutines.DispatchedTask.run(Dispatched.kt:241)
at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:594)
at kotlinx.coroutines.scheduling.CoroutineScheduler.access$runSafely(CoroutineScheduler.kt:60)
at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:740)
\[pool-1-thread-1\] INFO net.maizegenetics.plugindef.AbstractPlugin -
Usage:
PathsToVCFPlugin <options>
-outputFile <Output VCF File Name> : Output file name (required)
-refRangeFileVCF <Reference Range File> : Reference Range file used to subset the paths for only specified regions of the genome.
-referenceFasta <Reference Genome> : Reference Genome.
-makeDiploid <true | false> : Whether to report haploid paths as homozygousdiploid (Default: true)
-positions <Position List> : Positions to include in VCF. Can be specified by Genotype file (i.e. VCF, Hapmap, etc.), bed file, or json file containing the requested positions.
\[pool-1-thread-1\] ERROR net.maizegenetics.plugindef.AbstractPlugin - Allele in genotype G not in the variant context \[G, G\]
I'm not sure what to make of the PathsToVCFPlugin error I'm getting: "Allele in genotype G not in the variant context [G, G]". I tried looking at the source code to see what could have thrown the error, but couldn't find it. Any help would be greatly appreciated.
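For anyone triaging a long log like the one above, a small script can pull out each ERROR line together with the plugin that was most recently started. The sketch below is Python and assumes the log format shown in this post (`AbstractPlugin - Starting <class>: time:` and `] ERROR <logger> - <message>`). The function name `find_plugin_errors` is made up for illustration and is not part of TASSEL or PHG.

```python
import re

# Patterns are assumptions based on the log lines pasted in this thread;
# other TASSEL/PHG versions may format their log lines differently.
START_RE = re.compile(r"AbstractPlugin - Starting (\S+): time:")
ERROR_RE = re.compile(r"\] ERROR \S+ - (.*)")


def find_plugin_errors(lines):
    """Pair every ERROR message with the plugin most recently started."""
    current_plugin = None
    errors = []
    for line in lines:
        # Undo the backslash escaping the forum adds around square brackets.
        line = line.replace("\\[", "[").replace("\\]", "]")
        started = START_RE.search(line)
        if started:
            # Keep only the short class name, e.g. "PathsToVCFPlugin".
            current_plugin = started.group(1).rsplit(".", 1)[-1]
            continue
        failed = ERROR_RE.search(line)
        if failed:
            errors.append((current_plugin, failed.group(1).strip()))
    return errors
```

Run against the log above, this should attribute the error to PathsToVCFPlugin, which agrees with the stack trace: the exception is raised inside htsjdk's `VariantContextBuilder.make` while `PathsToVCFPlugin.createVariantContext` is building a record.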
Peter: Here is the log file for the ImputePipelinePlugin run using a fastq file:
ImputePipelinePlugin_fastq.log
The plugin did produce a VCF file, but it contains only around 80 SNPs from chromosome 1A for the input fastq file. Again, I can't make sense of the error based on the source code. Could you please take a look at it?
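To see how far the VCF writer got before the exception, it can help to count the records per chromosome in the partial output. This is a minimal Python sketch that only assumes a standard tab-delimited VCF with the chromosome in the first column; `records_per_chromosome` is a hypothetical helper name, not something PHG provides.

```python
from collections import Counter


def records_per_chromosome(vcf_lines):
    """Count VCF data records per chromosome, skipping headers and blank lines."""
    counts = Counter()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # header line ("##..." or "#CHROM...") or empty line
        chrom = line.split("\t", 1)[0]  # CHROM is the first tab-separated field
        counts[chrom] += 1
    return counts
```

It takes any iterable of lines, so an open file handle for `outputVcfFile.vcf` (the output name in the log above) can be passed in directly. If only 1A shows up with roughly 80 records, that would suggest the file is truncated at the range where the exception was thrown, rather than being a complete result.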
bp : Currently this paste says that it is either marked private or is awaiting moderation. If it is the latter then check back to make sure it is visible for others later in the day.
I did change the first link. On my end, the current link takes me to the pastebin post.
Sorry for the inconvenience.
Just want to make sure the developer can see it. I am still getting an error with the link above.
I guess it is pending moderation at the moment. It is shared publicly, but thanks for checking! I'll post an update once the moderation is complete.
Looks like the paste is now public.