ImputewithPHG_main:fastq
3.3 years ago
bp • 0

Hi,

I'm trying to run the ImputewithPHG_main step to infer genotypes from fastq files. The underlying plugins (LiquibaseUpdatePlugin, FastqToMappingPlugin) seem to have completed without any errors.

                    Memory Settings: -Xms512m -Xmx215040m
                    Tassel Pipeline Arguments: -debug -configParameters /phg/config_Imputation_fq.txt -ImputePipelinePlugin -imputeTarget pathToVCF -endPlugin
\[pool-1-thread-1\] INFO net.maizegenetics.pangenome.pipeline.ImputePipelinePlugin - PHG DB is up to date.  Proceeding with Populating the PHG DB.
\[pool-1-thread-1\] INFO net.maizegenetics.plugindef.AbstractPlugin - Starting net.maizegenetics.pangenome.api.HaplotypeGraphBuilderPlugin: time: Jul 27, 2021 4:32:50
\[pool-1-thread-1\] INFO net.maizegenetics.plugindef.AbstractPlugin - 
HaplotypeGraphBuilderPlugin Parameters
configFile: /phg/config_Imputation_fq.txt
methods: GATK_PIPELINE
includeSequences: true
includeVariantContexts: false
haplotypeIds: null
chromosomes: null
taxa: null

\[pool-1-thread-1\] INFO net.maizegenetics.pangenome.db_loading.DBLoadingUtils - first connection: dbName from config file = /phg/95HRSW_attempt3.db host: localHost user: sqlite type: sqlite
\[pool-1-thread-1\] INFO net.maizegenetics.pangenome.db_loading.DBLoadingUtils - Database URL: jdbc:sqlite:/phg/95HRSW_attempt3.db
\[pool-1-thread-1\] INFO net.maizegenetics.pangenome.db_loading.DBLoadingUtils - Connected to database:  

\[pool-1-thread-1\] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - referenceRangesAsMap: query statement: select reference_ranges.ref_range_id, chrom, range_start, range_end, methods.name from reference_ranges  INNER JOIN ref_range_ref_range_method on ref_range_ref_range_method.ref_range_id=reference_ranges.ref_range_id  INNER JOIN methods on ref_range_ref_range_method.method_id = methods.method_id  AND methods.method_type = 7 ORDER BY reference_ranges.ref_range_id
methods size: 1
\[pool-1-thread-1\] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - referenceRangesAsMap: number of reference ranges: 94229
\[pool-1-thread-1\] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - referenceRangesAsMap: time: 0.674000525 secs.
\[pool-1-thread-1\] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - taxaListMap: query statement: SELECT gamete_haplotypes.gamete_grp_id, genotypes.line_name FROM gamete_haplotypes INNER JOIN gametes ON gamete_haplotypes.gameteid = gametes.gameteid INNER JOIN genotypes on gametes.genoid = genotypes.genoid ORDER BY gamete_haplotypes.gamete_grp_id;
\[pool-1-thread-1\] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - taxaListMap: number of taxa lists: 95
\[pool-1-thread-1\] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - taxaListMap: time: 0.074377629 secs.
\[pool-1-thread-1\] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - createHaplotypeNodes: haplotype method: GATK_PIPELINE range group method: null
                    \[pool-1-thread-1\] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - createHaplotypeNodes: query statement: SELECT haplotypes_id, gamete_grp_id, haplotypes.ref_range_id, asm_contig, asm_start_coordinate, asm_end_coordinate, genome_file_id, sequence, seq_hash, seq_len FROM haplotypes WHERE method_id = 3;
\[pool-1-thread-1\] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - addNodes: number of nodes: 8857526
 \[pool-1-thread-1\] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - addNodes: number of reference ranges: 94229
 \[pool-1-thread-1\] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - createHaplotypeNodes: time: 445.018472069 secs.
 \[pool-1-thread-1\] INFO net.maizegenetics.pangenome.api.HaplotypeGraph - Created graph edges: created when requested  number of nodes: 8857526  number of reference ranges: 94229
 \[pool-1-thread-1\] INFO net.maizegenetics.plugindef.AbstractPlugin - Finished net.maizegenetics.pangenome.api.HaplotypeGraphBuilderPlugin: time: Jul 27, 2021 4:40:21
 \[pool-1-thread-1\] INFO net.maizegenetics.plugindef.AbstractPlugin - Starting net.maizegenetics.pangenome.hapCalling.FastqToMappingPlugin: time: Jul 27, 2021 4:40:21
 \[pool-1-thread-1\] INFO net.maizegenetics.plugindef.AbstractPlugin - 
 FastqToMappingPlugin Parameters
 minimap2IndexFile: /phg/outputDir/pangenome/pangenome_GATK_PIPELINE_k21w11I90G.mmi
 keyFile: /phg/readMapping_key_file_fq.txt
 fastqDir: /phg/inputDir/imputation/fastq/
 maxRefRangeErr: 0.25
 lowMemMode: true
 maxSecondary: 20
 fParameter: f1000,5000
 minimapLocation: minimap2
 methodName: readMethodwithfastqfileall
 methodDescription: readMethod+allfastqfilesused
 debugDir: 
 outputSecondaryStats: false

 \[pool-1-thread-1\] INFO net.maizegenetics.pangenome.db_loading.DBLoadingUtils - first connection: dbName from config file = /phg/95HRSW_attempt3.db host: localHost user: sqlite type: sqlite
 \[pool-1-thread-1\] INFO net.maizegenetics.pangenome.db_loading.DBLoadingUtils - Database URL: jdbc:sqlite:/phg/95HRSW_attempt3.db
 \[pool-1-thread-1\] INFO net.maizegenetics.pangenome.db_loading.DBLoadingUtils - Connected to database:  

 \[pool-1-thread-1\] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - PHGdbAccess - db is setup, init prepared statements, load hash table
 \[pool-1-thread-1\] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - 
 beginning - isSqlite is true
 \[pool-1-thread-1\] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - before loading hash, size of all geneotypes in genotype table=96
 \[pool-1-thread-1\] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - refRangeRefRangeIDMap is null, creating new one with size : 94229
 \[pool-1-thread-1\] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - loadAnchorHash: at end, size of refRangeRefRangeIDMap: 94229, number of rs.next processed: 94229
 \[pool-1-thread-1\] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - before loading hash, size of all methods in method table=9
 \[pool-1-thread-1\] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - before loading hash, size of all groups in gamete_groups table=95
 \[pool-1-thread-1\] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - before loading hash, size of all gametes in gametes table=95
 \[pool-1-thread-1\] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - before loading readMappingHash, size of all read_mappings in read_mapping table=69
 \[pool-1-thread-1\] INFO net.maizegenetics.pangenome.hapCalling.Minimap2Utils - Skipping Keyfile entry: cultivar 17C23-2, flowcell_lane wgs has already been processed and loaded into the DB.
 \[pool-1-thread-1\] INFO net.maizegenetics.pangenome.hapCalling.Minimap2Utils - Skipping Keyfile entry: cultivar ADVANCE, flowcell_lane wgs has already been processed and loaded into the DB.
                      ...

 \[pool-1-thread-1\] INFO net.maizegenetics.pangenome.hapCalling.Minimap2Utils - Setting up MinimapRun for: cultivar MT1731, flowcell_lane wgs.
 \[pool-1-thread-1\] INFO net.maizegenetics.pangenome.hapCalling.Minimap2Utils - Running Minimap2 Command:
 minimap2 -ax sr -t 70 --secondary=yes -N20 -f1000,5000 --eqx /phg/outputDir/pangenome/pangenome_GATK_PIPELINE_k21w11I90G.mmi /phg/inputDir/imputation/fastq/MT1731.fq
 \[pool-1-thread-1\] INFO net.maizegenetics.pangenome.hapCalling.Minimap2Utils - Time spent setting up run: Taxon:MT1731 : 155.75922857sec
 \[pool-1-thread-1\] INFO net.maizegenetics.pangenome.hapCalling.Minimap2Utils - Running in Low memory mode.  Simply counting the number of reads which hit a given set of haplotype ids
 \[pool-1-thread-1\] INFO net.maizegenetics.pangenome.hapCalling.Minimap2Utils - TotalTiming For samReader.next:2.6597614309999984
 \[pool-1-thread-1\] INFO net.maizegenetics.pangenome.hapCalling.Minimap2Utils - Time spent processing SAM: Taxon:MT1731 : 2.880379927sec
 \[pool-1-thread-1\] INFO net.maizegenetics.pangenome.hapCalling.Minimap2Utils - Done running minimap2 for: cultivar MT1731, flowcell_lane wgs
 \[pool-1-thread-1\] INFO net.maizegenetics.pangenome.hapCalling.Minimap2Utils - Compressing ReadMappings into GZipped Json.
 \[pool-1-thread-1\] INFO net.maizegenetics.pangenome.hapCalling.Minimap2Utils - Done Compressing ReadMappings.  Pushing it to the DB.
 \[pool-1-thread-1\] INFO net.maizegenetics.pangenome.hapCalling.Minimap2Utils - Done processing cultivar MT1731, flowcell_lane wgs.  Moving on to next keyfile Entry.
                           ...
 \[pool-1-thread-1\] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - Closing DB
 \[pool-1-thread-1\] INFO net.maizegenetics.plugindef.AbstractPlugin - Finished net.maizegenetics.pangenome.hapCalling.FastqToMappingPlugin: time: Jul 27, 2021 6:39:17
 \[pool-1-thread-1\] INFO net.maizegenetics.plugindef.AbstractPlugin - Starting net.maizegenetics.pangenome.api.HaplotypeGraphBuilderPlugin: time: Jul 27, 2021 6:39:17
 \[pool-1-thread-1\] INFO net.maizegenetics.plugindef.AbstractPlugin - 
 HaplotypeGraphBuilderPlugin Parameters
 configFile: /phg/config_Imputation_fq.txt
 methods: GATK_PIPELINE
 includeSequences: false
 includeVariantContexts: false
 haplotypeIds: null
 chromosomes: null
 taxa: null

 \[pool-1-thread-1\] INFO net.maizegenetics.pangenome.db_loading.DBLoadingUtils - first connection: dbName from config file = /phg/95HRSW_attempt3.db host: localHost user: sqlite type: sqlite
 \[pool-1-thread-1\] INFO net.maizegenetics.pangenome.db_loading.DBLoadingUtils - Database URL: jdbc:sqlite:/phg/95HRSW_attempt3.db
 \[pool-1-thread-1\] INFO net.maizegenetics.pangenome.db_loading.DBLoadingUtils - Connected to database:  
 ...
 \[pool-1-thread-1\] INFO net.maizegenetics.plugindef.AbstractPlugin - Starting net.maizegenetics.pangenome.hapCalling.PathsToVCFPlugin: time: Jul 27, 2021 8:26:28
 \[pool-1-thread-1\] INFO net.maizegenetics.plugindef.AbstractPlugin - 
 PathsToVCFPlugin Parameters
 outputFile: outputVcfFile.vcf
 refRangeFileVCF: null
 referenceFasta: null
 makeDiploid: true
 positions: null

 \[pool-1-thread-1\] INFO net.maizegenetics.pangenome.hapCalling.PathsToVCFPlugin - PathsToVCFPlugin: processData: number of ranges: 94229
 \[pool-1-thread-1\] INFO net.maizegenetics.pangenome.hapCalling.PathsToVCFPlugin - PathsToVCFPlugin: processData: number of taxa: 94
 \[pool-1-thread-1\] DEBUG net.maizegenetics.plugindef.AbstractPlugin - Allele in genotype G not in the variant context \[G, G\]
 java.lang.IllegalStateException: Allele in genotype G not in the variant context \[G, G\]
 at htsjdk.variant.variantcontext.VariantContext$Validation.validateGenotypes(VariantContext.java:382)
 at htsjdk.variant.variantcontext.VariantContext$Validation.access$200(VariantContext.java:323)
 at htsjdk.variant.variantcontext.VariantContext$Validation$2.validate(VariantContext.java:331)
 at htsjdk.variant.variantcontext.VariantContext.lambda$validate$0(VariantContext.java:1384)
 at java.lang.Iterable.forEach(Iterable.java:75)
 at htsjdk.variant.variantcontext.VariantContext.validate(VariantContext.java:1384)
 at htsjdk.variant.variantcontext.VariantContext.<init>(VariantContext.java:489)
 at htsjdk.variant.variantcontext.VariantContextBuilder.make(VariantContextBuilder.java:647)
 at htsjdk.variant.variantcontext.VariantContextBuilder.make(VariantContextBuilder.java:638)
 at net.maizegenetics.pangenome.hapCalling.PathsToVCFPlugin.createVariantContext(PathsToVCFPlugin.kt:342)
 at net.maizegenetics.pangenome.hapCalling.PathsToVCFPlugin.variantContexts(PathsToVCFPlugin.kt:475)
 at net.maizegenetics.pangenome.hapCalling.PathsToVCFPlugin.access$variantContexts(PathsToVCFPlugin.kt:53)
 at net.maizegenetics.pangenome.hapCalling.PathsToVCFPlugin$infosByRange$2$invokeSuspend$$inlined$forEach$lambda$1.invokeSuspend(PathsToVCFPlugin.kt:216)               
 at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
 at kotlinx.coroutines.DispatchedTask.run(Dispatched.kt:241)
 at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:594)
 at kotlinx.coroutines.scheduling.CoroutineScheduler.access$runSafely(CoroutineScheduler.kt:60)
 at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:740)
 \[pool-1-thread-1\] INFO net.maizegenetics.plugindef.AbstractPlugin - 
 Usage:
 PathsToVCFPlugin <options>
 -outputFile <Output VCF File Name> : Output file name (required)
  -refRangeFileVCF <Reference Range File> : Reference Range file used to subset the paths for only specified regions of the genome.
  -referenceFasta <Reference Genome> : Reference Genome.
  -makeDiploid <true | false> : Whether to report haploid paths as homozygousdiploid (Default: true)
  -positions <Position List> : Positions to include in VCF. Can be specified by Genotype file (i.e. VCF, Hapmap, etc.), bed file, or json file containing the requested positions.

  \[pool-1-thread-1\] ERROR net.maizegenetics.plugindef.AbstractPlugin - Allele in genotype G not in the variant context \[G, G\]

I'm not sure about the PathsToVCFPlugin error that I'm getting: "Allele in genotype G not in the variant context [G, G]". I tried looking at the source code to see what could have thrown the error, but couldn't find the cause. Any help would be greatly appreciated.

PHG • 1.7k views
3.3 years ago
pjb39 ▴ 220

This is the result of a bug that we have not tracked down yet, so do not feel bad about not being able to spot the problem in the source code. It has to do with the way the VariantContext (which is part of the htsjdk library) is being constructed. Fortunately, this means that the pipeline ran successfully up to the final step, which is exporting the imputed VCF file. You might try dropping back to Docker version 0.0.28 to write the VCF file.
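
For readers unfamiliar with the check that raises this exception, here is a simplified Python model of it. It is only a sketch, not htsjdk or PHG code: the `Allele` tuple and `validate_genotype` function are hypothetical names, and it assumes that allele equality takes the reference flag into account. The actual cause of the PHG bug is not known, so the mismatch shown is just one way such an error can arise.

```python
from typing import List, NamedTuple


class Allele(NamedTuple):
    """Hypothetical stand-in for an allele: its bases plus a reference flag."""
    bases: str
    is_ref: bool


def validate_genotype(site_alleles: List[Allele],
                      genotype_alleles: List[Allele]) -> None:
    """Raise ValueError if a genotype allele is missing from the site's alleles."""
    for allele in genotype_alleles:
        # Tuple equality compares both fields, so the reference flag matters.
        if allele not in site_alleles:
            shown = "[" + ", ".join(a.bases for a in site_alleles) + "]"
            raise ValueError(
                f"Allele in genotype {allele.bases} "
                f"not in the variant context {shown}"
            )


# Site with a reference G and an alternate T.
site = [Allele("G", True), Allele("T", False)]

# Genotype alleles that match the site exactly pass silently.
validate_genotype(site, [Allele("G", True), Allele("T", False)])

# Same base but a different reference flag is treated as a different allele.
try:
    validate_genotype(site, [Allele("G", False)])
except ValueError as err:
    print(err)
```

Because the message prints only the bases, two alleles that differ in some other property can look identical in the log, which may be why the message seems contradictory.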


Peter: Here is the log file for the ImputePipelinePlugin using a fastq file:

ImputePipelinePlugin_fastq.log

The plugin actually worked to give me a vcf file. But the VCF file contains around 80 SNPs from Chromosome 1A of the input fastq file. Again, I can't make sense of the error based on the source code. Could you please take a look at it?
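
A quick way to check how the records in the output VCF are distributed is to tally them per chromosome. The sketch below assumes a plain-text, tab-delimited VCF; the function name `count_records_per_chrom` is hypothetical, and `outputVcfFile.vcf` is the output file name shown in the log above.

```python
from collections import Counter
from typing import Iterable


def count_records_per_chrom(lines: Iterable[str]) -> Counter:
    """Count VCF data records per chromosome (first tab-separated column)."""
    counts: Counter = Counter()
    for line in lines:
        # Skip blank lines and header lines ("##..." metadata and "#CHROM").
        if not line.strip() or line.startswith("#"):
            continue
        chrom = line.split("\t", 1)[0]
        counts[chrom] += 1
    return counts


# Usage, assuming an uncompressed VCF in the working directory:
# with open("outputVcfFile.vcf") as handle:
#     for chrom, n in sorted(count_records_per_chrom(handle).items()):
#         print(chrom, n)
```

If every record falls on a single chromosome, that suggests the export stopped early rather than that the other chromosomes had no variants.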


bp: Currently this paste says that it is either marked private or is awaiting moderation. If it is the latter, then check back later in the day to make sure it is visible to others.


I did change the first link. On my end, the current link takes me to the pastebin post.

Sorry for the inconvenience.


Just want to make sure the developer can see it. I am still getting the following error with the link above: [image]


I guess it is pending moderation at the moment. It is shared publicly, but thanks for checking! I'll update once the moderation is complete.


Looks like the paste is now public.
