Hello all!
I am trying to use the Practical Haplotype Graph to create a new PHG database and use it later on. I am using PHG version 1.2. Currently I am stuck at the second step, MakeInitialPHGDBPipelinePlugin. Looking at the -debug output, GetDBConnectionPlugin completes successfully, as do the first few steps of LoadAllIntervalsToPHGdbPlugin. However, once the GVCF file is supposed to be indexed, an error occurs.
Last successful operation:
[pool-1-thread-1] INFO net.maizegenetics.pangenome.db_loading.LoadAllIntervalsToPHGdbPlugin - writeRefRangeRefRangeMethodTable finished
[pool-1-thread-1] INFO net.maizegenetics.pangenome.db_loading.LoadAllIntervalsToPHGdbPlugin - createLoadREfRanges: calling putRefAnchorData, hapMethodId= 1 size of anchorsToLoad 422710
Followed by the error:
Dec 19, 2022 12:09:19 PM net.maizegenetics.pangenome.db_loading.VariantLoadingUtilsKt bgzipAndIndexGVCFfile
INFO: bgzipping file /xxx/projects/P003_PHG_genome_build/data/PHG/inputDir/reference/ref.gvcf
Dec 19, 2022 12:09:20 PM net.maizegenetics.pangenome.db_loading.VariantLoadingUtilsKt bgzipAndIndexGVCFfile
WARNING:
ERROR 1 creating tabix indexed version of file: /xxx/projects/P003_PHG_genome_build/data/PHG/inputDir/reference/ref.gvcf.gz
[pool-1-thread-1] DEBUG net.maizegenetics.plugindef.AbstractPlugin - LoadAllIntervalsToPHGdbPlugin : error processing/loading intervals bgzipAndIndexGVCFfile: error bgzipping and/or tabix'ing file /xxx/projects/P003_PHG_genome_build/data/PHG/inputDir/reference/ref.gvcf
java.lang.IllegalArgumentException: LoadAllIntervalsToPHGdbPlugin : error processing/loading intervals bgzipAndIndexGVCFfile: error bgzipping and/or tabix'ing file /xxx/projects/P003_PHG_genome_build/data/PHG/inputDir/reference/ref.gvcf
at net.maizegenetics.pangenome.db_loading.LoadAllIntervalsToPHGdbPlugin.createLoadRefRanges(LoadAllIntervalsToPHGdbPlugin.kt:346)
at net.maizegenetics.pangenome.db_loading.LoadAllIntervalsToPHGdbPlugin.processData(LoadAllIntervalsToPHGdbPlugin.kt:170)
at net.maizegenetics.plugindef.AbstractPlugin.performFunction(AbstractPlugin.java:111)
at net.maizegenetics.pangenome.pipeline.MakeInitialPHGDBPipelinePlugin.loadGenomeIntervals(MakeInitialPHGDBPipelinePlugin.kt:83)
at net.maizegenetics.pangenome.pipeline.MakeInitialPHGDBPipelinePlugin.processData(MakeInitialPHGDBPipelinePlugin.kt:36)
at net.maizegenetics.plugindef.AbstractPlugin.performFunction(AbstractPlugin.java:111)
at net.maizegenetics.plugindef.AbstractPlugin.dataSetReturned(AbstractPlugin.java:2017)
at net.maizegenetics.plugindef.ThreadedPluginListener.run(ThreadedPluginListener.java:29)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin -
Usage:
LoadAllIntervalsToPHGdbPlugin <options>
-ref <Reference Genome File> : Referemce Genome File for aligning against (required)
-anchors <Anchors File> : Tab-delimited file containing Chrom, StartPosition, EndPosition, Type (required)
-genomeData <Genome Data File> : Path to tab-delimited file containing genome specific data with header line:
Genotype Hapnumber Dataline Ploidy Reference GenePhased ChromPhased Confidence Method MethodDetails gvcfServerPath
The gvcfServerPath column should hold a semi-colon separated servername and path where gvcf files will be uploaded, e.g. 128.9.9.9;/path/to/gvcfs/ (required)
-outputDir <Output Directory> : Directory to write liquibase changeLogSync output (required)
-refServerPath <Reference Server Path> : String that contains a server name or ip address, followed by a semi-colon, then the file path where the reference genome will be stored for future access. This ia a more permanent location, not where the genome file lives for processing via this plugin. (required)
-isTestMethod <true | false> : Indication if the data is to be loaded against a test method. Data loaded with test methods are not cached with the PHG ktor server (Default: false)
[pool-1-thread-1] ERROR net.maizegenetics.plugindef.AbstractPlugin - LoadAllIntervalsToPHGdbPlugin : error processing/loading intervals bgzipAndIndexGVCFfile: error bgzipping and/or tabix'ing file /xxx/projects/P003_PHG_genome_build/data/PHG/inputDir/reference/ref.gvcf
[pool-1-thread-1] INFO net.maizegenetics.pangenome.pipeline.MakeInitialPHGDBPipelinePlugin - Done loading Genome Intervals step.
[pool-1-thread-1] INFO net.maizegenetics.pangenome.pipeline.MakeInitialPHGDBPipelinePlugin - Checking if Liquibase can be run.
[pool-1-thread-1] INFO net.maizegenetics.pangenome.pipeline.MakeInitialPHGDBPipelinePlugin - Liquibase can be run. Setting it up using changelogsync.
[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin - Starting net.maizegenetics.pangenome.liquibase.LiquibaseUpdatePlugin: time: Dec 19, 2022 12:09:20
[pool-1-thread-1] ERROR net.maizegenetics.plugindef.AbstractPlugin - -outputDir is required.
[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin -
Usage:
LiquibaseUpdatePlugin <options>
-outputDir <Output Directory> : Directory path to write any liquibase output files. (required)
-command <Liquibase command> : Command for liquibase to execute: must be update or changeLogSync, defaults to update. (Default: update)
I have not yet been able to find the bgzipAndIndexGVCFfile() function in the Bitbucket repository, but judging by the error message it tries to create a tabix index for the GVCF. The trouble is that my genome(s) have chromosomes longer than the roughly 512 Mbp (2^29 bp) per-sequence limit of tabix-style (.tbi) indexing, so they would have to be indexed in the CSI style instead.
Is this supported with PHG? Does anyone have any experiences with this?
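In case it helps anyone check whether they are hitting the same limit, here is a minimal Python sketch that lists the sequences in a FASTA index (.fai, as written by samtools faidx) that are longer than 2^29 bp. The function names and the example lengths are made up for illustration; this is not part of PHG.

```python
# Largest coordinate a tabix (.tbi) index can address: 2^29 bp.
TABIX_MAX_POS = 2 ** 29  # 536,870,912


def parse_fai(lines):
    """Yield (name, length) pairs from the lines of a .fai file.

    A .fai line is tab-delimited; column 1 is the sequence name and
    column 2 is its length in bases.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        fields = line.split("\t")
        yield fields[0], int(fields[1])


def contigs_over_limit(fai_lines, limit=TABIX_MAX_POS):
    """Return the (name, length) pairs whose length exceeds the limit."""
    return [(name, length) for name, length in parse_fai(fai_lines) if length > limit]


if __name__ == "__main__":
    # Hypothetical index content; replace with open("ref.fa.fai") for a real file.
    example = [
        "chr1\t600000000\t6\t60\t61",
        "chr2\t450000000\t610000012\t60\t61",
    ]
    for name, length in contigs_over_limit(example):
        print(f"{name}: {length} bp exceeds the tabix limit of {TABIX_MAX_POS} bp")
```

Any sequence this reports would need CSI indexing or would have to be split before a .tbi index can be built for it.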
Many thanks in advance!!
Thank you very much for the quick answer! I tried this by splitting the chromosomes, and it has worked for the pipeline steps I have tried so far. I will also look out for future updates.
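For anyone finding this later, here is a rough Python sketch of how the split intervals can be planned so that every piece stays under the 2^29 bp tabix limit. It only computes equal-sized pieces. The "_partN" naming is my own assumption, not a PHG convention, and in practice the cut points should probably be moved into intergenic regions, with the anchor coordinates shifted to match the new sequence names.

```python
# Largest coordinate a tabix (.tbi) index can address: 2^29 bp.
TABIX_MAX_POS = 2 ** 29


def split_contig(name, length, max_len=TABIX_MAX_POS):
    """Split one sequence into the fewest equal-sized pieces of at most max_len bp.

    Returns a list of (new_name, start, end) tuples with 0-based,
    half-open coordinates relative to the original sequence.
    """
    if length <= 0 or max_len <= 0:
        raise ValueError("length and max_len must be positive")
    n_parts = -(-length // max_len)   # ceiling division
    part_len = -(-length // n_parts)  # size of each piece, rounded up
    parts = []
    for i in range(n_parts):
        start = i * part_len
        end = min(start + part_len, length)
        parts.append((f"{name}_part{i + 1}", start, end))
    return parts


if __name__ == "__main__":
    # Hypothetical chromosome length, for illustration only.
    for new_name, start, end in split_contig("chr1", 600_000_000):
        print(new_name, start, end)
```

The resulting intervals could then be used to cut the reference FASTA, for example with samtools faidx regions, keeping in mind that those regions are 1-based and inclusive.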
Thank you again.