Here are my PHG scripts, config file, and wgs_keyfile.
1. Create valid intervals
docker run --name test_assemblies --rm -v /DATA/jysong/PHG/ver1.0_phg/:/phg/ -t maizegenetics/phg:1.0 /tassel-5-standalone/run_pipeline.pl -Xmx100G -debug -configParameters /phg/Masterconfig.txt -CreateValidIntervalsFilePlugin -intervalsFile /phg/inputDir/reference/glyma.Wm82.gnm4.ann1.T8TQ.gene_models_main.bed -referenceFasta /phg/inputDir/reference/glyma.Wm82.gnm4.4PTR.genome_main.fixed.fna.gz -mergeOverlaps true -generatedFile /phg/validBedFile.bed -endPlugin &> Log/1.Create_validinterval.txt &
2. Create initial DB
docker run --name create_initial_db --rm -v /DATA/jysong/PHG/ver1.0_phg/:/phg/ -t maizegenetics/phg:1.0 /tassel-5-standalone/run_pipeline.pl -Xmx100G -debug -configParameters /phg/Masterconfig.txt -MakeInitialPHGDBPipelinePlugin -endPlugin &> Log/2.Create_InitialDB.txt &
3. Check DB version and run Liquibase update
docker run --name create_directory --rm -v /DATA/jysong/PHG/ver1.0_phg/:/phg/ -t maizegenetics/phg:1.0 /tassel-5-standalone/run_pipeline.pl -debug -configParameters /phg/Masterconfig.txt -CheckDBVersionPlugin -outputDir /phg/outputDir -endPlugin -LiquibaseUpdatePlugin -outputDir /phg/outputDir -endPlugin &> Log/3.Check_Plugin.txt &
4. Load haplotypes (inside Docker)
./CreateHaplotypesFromFastq.groovy -config phg/Masterconfig.txt &> phg/Log/4.CreateHAplotypeFromFastq.txt &
5. Create consensus (inside Docker)
/tassel-5-standalone/run_pipeline.pl -Xmx100G -debug -configParameters Masterconfig.txt -HaplotypeGraphBuilderPlugin -configFile Masterconfig.txt -methods GATK_PIPELINE1:CONSENSUS -includeVariantContexts true -endPlugin -RunHapConsensusPipelinePlugin -referenceFasta /inputDir/reference/glyma.Wm82.gnm4.4PTR.genome_main.fixed.fna.gz -dbConfigFile Masterconfig.txt -collapseMethod CONSENSUS -collapseMethodDetails G.max_test -rankingFile rankingFile.txt -clusteringMode kmer_assembly -isTestMethod true -endPlugin &> phg/Log/5.Concensus.txt &
Config file
### config file.
### Anything marked with UNASSIGNED needs to be set for at least one of the steps
### If it is marked as OPTIONAL, it will only need to be set if you want to run specific steps.
host=localHost
user=sqlite
password=sqlite
DB=/phg/Gmax430
DBtype=sqlite
# Load genome intervals parameters
referenceFasta=/phg/inputDir/reference/glyma.Wm82.gnm4.4PTR.genome_main.fna.gz
anchors=/phg/validBedFile.bed
genomeData=/phg/inputDir/reference/load_genome_data.txt
refServerPath=localHost;/DATA/jysong/ver1.0_phg/ref
#liquibase results output directory, general output directory
outputDir=/phg/outputDir
liquibaseOutdir=/phg/outputDir
### Align WGS fastq files to reference genome parameters
# File Directories
gvcfFileDir=/phg/inputDir/loadDB/gvcf/
tempFileDir=/phg/inputDir/loadDB/temp/
filteredBamDir=/phg/inputDir/loadDB/bam/filteredBAMs/
dedupedBamDir=/phg/inputDir/loadDB/bam/DedupBAMs/
# TASSEL parameters
Xmx=100G
tasselLocation=/tassel-5-standalone/run_pipeline.pl
# PHG CreateHaplotypes Parameters
wgsKeyFile=/phg/wgs_KeyFile.txt
LoadHaplotypesFromGVCFPlugin.gvcfDir=/phg/inputDir/loadDB/gvcf/
LoadHaplotypesFromGVCFPlugin.referenceFasta=/phg/inputDir/reference/glyma.Wm82.gnm4.4PTR.genome_main.fna.gz
LoadHaplotypesFromGVCFPlugin.haplotypeMethodName=GATK_PIPELINE
LoadHaplotypesFromGVCFPlugin.haplotypeMethodDescription=GATK_PIPELINE
extendedWindowSize = 1000
mapQ = 48
# GATK and Sentieon Parameters
gatkPath = /gatk/gatk
numThreads=35
sentieon_license
sentieonPath=/sentieon/bin/sentieon
# CreateConsensi parameters
haplotypeMethod = GARK_PIPELINE
consensusMethod = CONSENSUS
mxDiv = 0.005
seqErr = 0.02
minSites = 20
minTaxa = 2
maxThreads = 60
#rankingFile = null
#clusteringMode = upgma
# Graph Building Parameters
includeVariants = true
#FilterGVCF Parameters. Adding any of these will add more filters.
#exclusionString=**UNASSIGNED**
#DP_poisson_min=0.0
#DP_poisson_max=1.0
#DP_min=100
#DP_max=**UNASSIGNED**
#GQ_min=10
#GQ_max=**UNASSIGNED**
#QUAL_min=30
#QUAL_max=**UNASSIGNED**
#filterHets=**UNASSIGNED**
# Imputation Pipeline parameters for VCF files
#--- Used by liquibase to check DB version ---
liquibaseOutdir=/phg/outputDir
#--- Used for indexing SNP positions ---
# pangenomeHaplotypeMethod is the database method or methods for the haplotypes to which SNPs will be indexed
# the index file lists the SNP allele to haplotype mapping and is used for mapping reads
pangenomeHaplotypeMethod=CONSENSUS
pangenomeDir=/phg/outputDir/pangenome
indexFile=/phg/outputDir/vcfIndexFile
vcfFile=/phg/inputDir/imputation/vcf/SoyHapMap.SNP.GT.fixed.vcf.414accession.KASP.gz
#--- Used for mapping reads
# readMethod is the method name for storing the resulting read mappings
# countAlleleDepths=true means allele depths will be used for haplotype counts, which is almost always a good choice
inputType=vcf
keyFile=/phg/readMapping_key_file.txt
readMethod=GBD_readMethod
vcfDir=/phg/inputDir/loadDB/gvcf/
countAlleleDepths=true
#--- Used for path finding
# pathHaplotypeMethod determines which haplotypes will be considered for path finding
# pathHaplotypeMethod should be the same as pangenomeHaplotypeMethod, but could be a subset
# pathMethod is the method name used for storing the paths
pathHaplotypeMethod=CONSENSUS
pathMethod=GBD_pathMethod
maxNodes=1000
maxReads=10000
minReads=1
minTaxa=20
minTransitionProb=0.001
numThreads=3
probCorrect=0.99
removeEqual=true
splitNodes=true
splitProb=0.99
usebf=false
maxParents = 1000000
minCoverage = 1.0
#parentOutputFile = **OPTIONAL**
#--- used by haploid path finding only
# usebf - if true use Forward-Backward algorithm, otherwise Viterbi
usebf=false
minP=0.8
#--- used by diploid path finding only
maxHap=11
maxReadsKB=100
algorithmType=efficient
#--- Used to output a vcf file for pathMethod
outVcfFile=/phg/outputDir/Result.vcf
#~~~ Optional Parameters ~~~
#readMethodDescription=**OPTIONAL**
#pathMethodDescription=**OPTIONAL**
#bfInfoFile=**OPTIONAL**
#~~~ providing a value for outputDir will write read mappings to file rather than the PHG db ~~~
outputDir=/phg/
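Several keys in the config above are assigned more than once with different values (outputDir, minTaxa, numThreads). The following is a minimal sketch, assuming the file is plain key=value text with # comments, that lists such keys so they can be reviewed. The function names are hypothetical and not part of PHG, and the sketch makes no assumption about which of the repeated values PHG actually uses.

```python
from collections import defaultdict


def parse_config(text):
    """Collect every value assigned to each key, in file order."""
    values = defaultdict(list)
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        values[key.strip()].append(value.strip())
    return values


def conflicting_keys(values):
    """Return keys that were assigned more than one distinct value."""
    return {k: v for k, v in values.items() if len(set(v)) > 1}


# A few lines copied from the config above, for illustration.
sample = """\
outputDir=/phg/outputDir
minTaxa = 2
numThreads=35
minTaxa=20
numThreads=3
outputDir=/phg/
usebf=false
usebf=false
"""

conflicts = conflicting_keys(parse_config(sample))
for key, vals in conflicts.items():
    print(f"{key}: {vals}")
```

To check the real file, replace `sample` with the contents of Masterconfig.txt, for example `open("Masterconfig.txt").read()`.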
wgs_keyfile
The filtered gVCF output files did not contain any sequences, and in the create consensus step a server;/path/to/file error occurred.
After changing the colon in gvcfServerPath to a semicolon, I ran CreateHaplotypesFromGVCF.groovy and repeated the create consensus step, but the same error occurred. Is there something wrong with the config file or the keyfile?
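Since the error message refers to the server;/path/to/file form, the separator in each server-path value can be checked before loading. This is only a sketch under the assumption that a valid value is a host name, a semicolon, and an absolute path, as in the refServerPath line of the config; split_server_path is a hypothetical helper, not a PHG function.

```python
def split_server_path(value):
    """Split 'server;/path' into (server, path); raise ValueError if malformed."""
    if ";" not in value:
        raise ValueError(f"expected 'server;/path', got {value!r}")
    server, path = value.split(";", 1)
    # Assumption: the server part is non-empty and the path is absolute.
    if not server or not path.startswith("/"):
        raise ValueError(f"expected 'server;/path', got {value!r}")
    return server, path


# Value taken from refServerPath in the config.
print(split_server_path("localHost;/DATA/jysong/ver1.0_phg/ref"))

# The colon form is rejected.
try:
    split_server_path("localHost:/DATA/jysong/ver1.0_phg/ref")
except ValueError as err:
    print("rejected:", err)
```

Applying the same check to the gvcfServerPath column of the wgs_keyfile would show whether any row still uses a colon.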
I thought that running CreateHaplotypesFromGVCF.groovy after CreateHaplotypesFromFastq.groovy would update the DB with the changed gvcfServerPath from the wgs_keyfile.
When I ran steps 4-5 from the command line, samtools or bwa could be seen running (ps -a), but in some cases the CPU cores stopped being used, so I ran the steps inside Docker.
Thank you for your reply.
Create a new DB and try again.
In the Create Consensus haplotype step, which values should be entered for the -methods method1:method2 parameter?
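One thing that can be checked mechanically is whether each name passed to -methods also appears as a method name in the config. The sketch below assumes the names are separated by colons, as in method1:method2, and takes the known names from LoadHaplotypesFromGVCFPlugin.haplotypeMethodName and consensusMethod in the config above. Whether those methods actually exist in the DB is not verified here, and the helper names are hypothetical.

```python
def method_names(methods_arg):
    """Split a 'method1:method2' string into its individual names."""
    return [m.strip() for m in methods_arg.split(":") if m.strip()]


def unknown_methods(methods_arg, known):
    """Return the names in methods_arg that are not in the known set."""
    return [m for m in method_names(methods_arg) if m not in known]


# Method names as written in the config (assumed to match the DB).
known = {"GATK_PIPELINE", "CONSENSUS"}

# The -methods value used in step 5.
missing = unknown_methods("GATK_PIPELINE1:CONSENSUS", known)
print(missing)
```

This only reports that a name differs from the config. It does not say which spelling is the intended one.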