Hi everyone,
I am running the latest PHG version (0.0.23) so that I can use the fixed "diploidPath". To make sure my database includes all the updates from this version, I reran every step before imputation with the latest version. However, the LoadHaplotypesFromGVCFPlugin step now fails with a unique constraint error. I reran the same code (same config file, input files and database) with phg:0.0.22 instead of phg:latest, and it ran smoothly without any errors. I wonder whether the problem comes from changes in the code that uploads the haplotypes, or whether I am doing something wrong or missing a new parameter.
The code that I run is the following:
docker run --name upload_haplotypes --rm -v ${WORKING_DIR}:/phg/ -t maizegenetics/phg:latest /tassel-5-standalone/run_pipeline.pl -Xmx16G -debug -configParameters ${DOCKER_CONFIG_FILE} -LoadHaplotypesFromGVCFPlugin -bedFile /phg/genome_sorted.windows.bed -endPlugin
This code fails with the following error:
[DefaultDispatcher-worker-1] INFO net.maizegenetics.pangenome.db_loading.LoadHaplotypesFromGVCFPlugin -
Staging S1D1 chrom 11 for DB uploading.
[DefaultDispatcher-worker-1] INFO net.maizegenetics.pangenome.db_loading.LoadHaplotypesFromGVCFPlugin - Time spent creating Sequences for Chr:14 for Line: S1D1 : 0.015435339sec
[DefaultDispatcher-worker-1] INFO net.maizegenetics.pangenome.db_loading.LoadHaplotypesFromGVCFPlugin - Time spent creating GVCFSequence for Chr:14 for Line: S1D1 : 1.08508E-4sec
[DefaultDispatcher-worker-1] INFO net.maizegenetics.pangenome.db_loading.LoadHaplotypesFromGVCFPlugin - Time spent creating Chr:14 for Line: S1D1 : 11.61738611sec
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - PHGdbAccess:putHaplotypesData: time to getVariantData : 15.218336869 seconds
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - putVariantMappingData: total loaded to variant_mapping table: 3147
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - PHGdbAccess:putVariantMappingData: time to load variants : 0.490372271 seconds
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - loadVariantMappingHash: before loading hash, size of all variants in variants table=3147
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - loadVariantsHash query: select variant_id,chrom,position,ref_allele_id, alt_allele_id,anc_id from variants where chrom='10';
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - loadVariantsHash: size after loading 3147
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - PHGdbAccess:putVariantMappingData: time to loadVariantsHash at end: 0.036291337 seconds
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - PHGdbAccess:putHaplotypesData: time to process/load variants data: 0.52678073 seconds
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - putHaplotypeData calling putHaploytpesForGamete
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - Begin putHaplotypesForGamete, number anchorSequences to load: 144
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - putHaplotypes: starting to commit haplotypes
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - putHaployptes - total count loaded to haplotypes table: 144
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - PHGdbAccess:putHaplotypesData: time to load haplotypes : 8.416172673 seconds
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - putHaplotypesData: Finished batch, total processed = 144
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - PHGdbAccess:putMethod: added method GATK_PIPELINE_PATH to methods table
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - before loading hash, size of all methods in method table=4
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.LoadHaplotypesFromGVCFPlugin - Paths added to db for S1D1, pathid=1
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.LoadHaplotypesFromGVCFPlugin - Done DBProcessing Line: S1D1 Chr: 10
-------------------------------
Current Heap Size: 1,232 MB
Max Available Heap: 14564 MB
-------------------------------
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - PHGdbAccess - db is setup, init prepared statements, load hash table
beginning - isSqlite is true
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - before loading hash, size of all geneotypes in genotype table=2
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - refRangeRefRangeIDMap is null, creating new one with size : 2411
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - loadAnchorHash: at end, size of refRangeRefRangeIDMap: 2411, number of rs.next processed: 2411
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - before loading hash, size of all methods in method table=4
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - before loading hash, size of all groups in gamete_groups table=2
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - before loading hash, size of all gametes in gametes table=2
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - before loading readMappingHash, size of all read_mappings in read_mapping table=0
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.LoadHaplotypesFromGVCFPlugin - DBProcessing Line: S1D1 Chr: 11
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.LoadHaplotypesFromGVCFPlugin - Figuring out gamete Group Id for S1D1 chr:11
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.LoadHaplotypesFromGVCFPlugin - Writing S1D1 chr:11 to the DB.
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - putHaplotypesData: time to load allel and variants hash: 9.04E-7 seconds
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - PHGdbAccess:putHaplotypesData: time to addToMissingAlleleList: 2.1975E-5 seconds
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - PHGdbAccess:putHaplotypesData: calling putAlleleData with size 0
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - putAlleleData: total loaded to alleles table: 0
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - PHGdbAccess:loadAlleleHash: added string NONE to alleles table
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - PHGdbAccess:putHaplotypesData: time to process/load allele data: 0.004023821 seconds
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - PHGdbAccess:putHaplotypesData: second pass, getVariantData
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - PHGdbAccess:putHaplotypesData: time to getVariantData : 5.1769E-5 seconds
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - putVariantMappingData: total loaded to variant_mapping table: 0
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - PHGdbAccess:putVariantMappingData: time to load variants : 1.397E-4 seconds
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - loadVariantMappingHash: before loading hash, size of all variants in variants table=3147
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - loadVariantsHash query: select variant_id,chrom,position,ref_allele_id, alt_allele_id,anc_id from variants where chrom='11';
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - loadVariantsHash: size after loading 0
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - PHGdbAccess:putVariantMappingData: time to loadVariantsHash at end: 2.9818E-4 seconds
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - PHGdbAccess:putHaplotypesData: time to process/load variants data: 6.21072E-4 seconds
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - putHaplotypeData calling putHaploytpesForGamete
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - Begin putHaplotypesForGamete, number anchorSequences to load: 1
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - putHaplotypes: starting to commit haplotypes
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - putHaployptes - total count loaded to haplotypes table: 1
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - PHGdbAccess:putHaplotypesData: time to load haplotypes : 0.262941075 seconds
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - putHaplotypesData: Finished batch, total processed = 1
SQLException 1
Code: 19
SqlState: null
Error Message: [SQLITE_CONSTRAINT] Abort due to constraint violation (UNIQUE constraint failed: paths.genoid, paths.method_id)
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.LoadHaplotypesFromGVCFPlugin - Found Exception
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.LoadHaplotypesFromGVCFPlugin - S1D1 0 Chr: 11
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.LoadHaplotypesFromGVCFPlugin - java.lang.IllegalStateException: PHGdbAccess:putPathsData: SQLException: failed when adding paths for method: GATK_PIPELINE_PATH, taxon: S1D1
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - Closing DB
It seems that the first chromosome ('10') is loaded properly, but the next chromosome ('11') fails.
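To illustrate how I read the error, here is a minimal sketch in Python. It assumes a simplified, hypothetical stand-in for the paths table: only the UNIQUE constraint on (genoid, method_id) is taken from the error message above, while the other column, the ids and the helper name add_path are made up for illustration. If a path row is written once per chromosome for the same taxon and method, the second insert would fail in the same way as in my log:

```python
import sqlite3

# Hypothetical, simplified stand-in for the PHG "paths" table.
# Only the UNIQUE (genoid, method_id) constraint comes from the error message;
# the real schema has more columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE paths ("
    " path_id INTEGER PRIMARY KEY,"
    " genoid INTEGER NOT NULL,"
    " method_id INTEGER NOT NULL,"
    " UNIQUE (genoid, method_id))"
)


def add_path(genoid, method_id):
    """Insert one path row; return None on success, or the SQLite error text."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO paths (genoid, method_id) VALUES (?, ?)",
                (genoid, method_id),
            )
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)


# Illustrative ids only: the same taxon and the same method, once per chromosome.
first = add_path(1, 4)   # stands for S1D1, chromosome 10
second = add_path(1, 4)  # stands for S1D1, chromosome 11

print("first insert error:", first)
print("second insert error:", second)
```

In this sketch the first insert succeeds and the second one is rejected, which matches the pattern in the log (chromosome 10 gets pathid=1, chromosome 11 fails). I do not know whether this is what the plugin actually does internally.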
If I run the following (v0.0.22), I get no errors:
docker run --name upload_haplotypes --rm -v ${WORKING_DIR}:/phg/ -t maizegenetics/phg:0.0.22 /tassel-5-standalone/run_pipeline.pl -Xmx16G -debug -configParameters ${DOCKER_CONFIG_FILE} -LoadHaplotypesFromGVCFPlugin -bedFile /phg/genome_sorted.windows.bed -endPlugin
A second question, in case this is a bug in the new version: does the fixed code in ImputePipelinePlugin depend on the updated code in LoadHaplotypesFromGVCFPlugin? I ask because a temporary workaround could be to run everything with the latest version except LoadHaplotypesFromGVCFPlugin, for which I would use 0.0.22. However, I do not know whether there is any substantial change in LoadHaplotypesFromGVCFPlugin that might affect the ImputePipelinePlugin result.
Thank you.