I am testing the PHG with a RIL population. The paternal and maternal genomes are the reference panel, aligned with AnchorWave and used as PHG input, and five individuals with low-coverage GBS and WGS data are the imputation targets.
I have imputed a VCF file with the PHG, but the five individuals have the same imputation result at every site. When I modify parameters at the pangenome construction step and the steps after it, the final result stays the same. When I comment out the parameter HaplotypeGraphBuilderPlugin.taxa=Tt_1A_part1 with '#' and delete LoadHaplotypesFromGVCFPlugin.mergeRefBlocks=true, the paternal and maternal lines have the same variants. I think all the problems come from the HaplotypeGraphBuilderPlugin step, but the help document about the config is missing and I don't know what happens internally. Could you give me some advice?
It is also weird that running MakeInitialPHGDBPipelinePlugin produces a VCF and a tbi index file in './input/reference/', corresponding to the BED file.
mkdir liquibase_dir
cd liquibase_dir
wget -O liquibase-4.7.0.tar.gz "https://github.com/liquibase/liquibase/releases/download/v4.7.0/liquibase-4.7.0.tar.gz"
tar -xzf liquibase-4.7.0.tar.gz
rm liquibase-4.7.0.tar.gz
mkdir ./changelogs
mkdir ./changelogs/changesets
cd ..
export PATH=/media/xudong/14t1/phg8/liquibase_dir:${PATH}
#cp -r ../changelogs ./
ln -s ./liquibase_dir/liquibase ./
perl /home/xudong/download/tasseladmin-tassel-5-standalone-846381e171c8/run_pipeline.pl -debug -MakeDefaultDirectoryPlugin -workingDir /media/xudong/14t1/phg9 -endPlugin > 1.1log
# Copy the FASTA files into the reference and assemblies directories (two FASTA files: the reference and the query assembly)
# Copy the BED file into the directory
perl /home/xudong/download/tasseladmin-tassel-5-standalone-846381e171c8/run_pipeline.pl -Xmx100G -debug -configParameters config1.2.txt -MakeInitialPHGDBPipelinePlugin -endPlugin > 1.2.log
# When this step finished, it had produced a VCF and a tbi index file in './input/reference/', derived from the BED file
perl /home/xudong/download/tasseladmin-tassel-5-standalone-846381e171c8/run_pipeline.pl -Xmx20G -debug -configParameters config1.5.txt -LoadHaplotypesFromGVCFPlugin -endPlugin > 1.5.log
# In myconfig.txt, I deleted samDir and set minTaxa=1; the HaplotypeGraphBuilderPlugin settings were extracted from the initial config.txt.
perl /home/xudong/download/tasseladmin-tassel-5-standalone-846381e171c8/run_pipeline.pl -Xmx20G -debug -configParameters myconfig.txt -ImputePipelinePlugin -imputeTarget pangenome -skipLiquibaseCheck true -endPlugin > pangenome.log
minimap2 -d ./outputDir/pangenome/pangenome_assembly_by_anchorwave.mmi ./outputDir/pangenome/pangenome_assembly_by_anchorwave.fa
# https://bitbucket.org/bucklerlab/practicalhaplotypegraph/wiki/UserInstructions/ImputeWithPHG_fastq-homozygous (the help document about the config is missing)
perl /home/xudong/download/tasseladmin-tassel-5-standalone-846381e171c8/run_pipeline.pl -debug \
-configParameters ./myconfig.txt \
-HaplotypeGraphBuilderPlugin \
-configFile ./myconfig.txt \
-methods assembly_by_anchorwave \
-includeVariantContexts true \
-includeSequences false \
-endPlugin \
-FastqToMappingPlugin \
-minimap2IndexFile ./outputDir/pangenome/pangenome_assembly_by_anchorwave.mmi \
-keyFile ./readMapping_key_file.txt \
-fastqDir /media/xudong/14t1/phg/GBS \
-methodName assembly_by_anchorwave \
-methodDescription anchorwave \
-debugDir ./ \
-endPlugin > 1.71.log
perl /home/xudong/download/tasseladmin-tassel-5-standalone-846381e171c8/run_pipeline.pl -debug \
-configParameters ./myconfig.txt \
-HaplotypeGraphBuilderPlugin \
-configFile ./myconfig.txt \
-methods assembly_by_anchorwave \
-includeVariantContexts true \
-includeSequences false \
-endPlugin \
-BestHaplotypePathPlugin \
-keyFile ./readMapping_key_file_pathKeyFile.txt \
-outDir ./outputDir \
-minReads 0 \
-readMethod assembly_by_anchorwave \
-pathMethod assembly_by_anchorwave_PATH \
-endPlugin > 1.72.log
perl /home/xudong/download/tasseladmin-tassel-5-standalone-846381e171c8/run_pipeline.pl -debug \
-configParameters ./myconfig.txt \
-HaplotypeGraphBuilderPlugin \
-configFile ./myconfig.txt \
-methods assembly_by_anchorwave \
-includeVariantContexts true \
-includeSequences false \
-endPlugin \
-ImportDiploidPathPlugin -pathMethodName assembly_by_anchorwave_PATH -endPlugin \
-PathsToVCFPlugin \
-outputFile ./final.v2.vcf.gz \
-referenceFasta ./genome_data/Tt_1A_part1.fasta \
-endPlugin > 1.73.log
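Before the paths are converted to VCF, it may help to check whether the paths stored in the database already coincide across taxa. The sketch below is a diagnostic idea, not part of PHG. It relies only on the tables and columns that appear in the query ImportDiploidPathPlugin prints in its log (`paths`, `genotypes`, `methods`, `line_name`, `paths_data`) and groups taxa by a hash of the raw `paths_data` value. The helper names `group_taxa_by_path` and `build_demo_db`, and the tiny demo schema, are hypothetical.

```python
import hashlib
import sqlite3
from collections import defaultdict

# Same tables and columns as the query logged by ImportDiploidPathPlugin,
# written with explicit joins and a bound parameter for the method name.
QUERY = """
SELECT genotypes.line_name, paths.paths_data
FROM paths
JOIN genotypes ON paths.genoid = genotypes.genoid
JOIN methods ON methods.method_id = paths.method_id
WHERE methods.name = ?
"""


def group_taxa_by_path(conn, path_method):
    """Group taxon names whose stored paths_data values are byte-identical."""
    groups = defaultdict(list)
    for line_name, paths_data in conn.execute(QUERY, (path_method,)):
        if paths_data is None:
            raw = b""
        elif isinstance(paths_data, str):
            raw = paths_data.encode()
        else:
            raw = bytes(paths_data)
        groups[hashlib.sha256(raw).hexdigest()].append(line_name)
    return sorted(sorted(names) for names in groups.values())


def build_demo_db():
    """Hypothetical minimal schema, only for demonstrating the grouping."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE genotypes (genoid INTEGER PRIMARY KEY, line_name TEXT);
        CREATE TABLE methods (method_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE paths (path_id INTEGER PRIMARY KEY, genoid INTEGER,
                            method_id INTEGER, paths_data BLOB);
        INSERT INTO genotypes VALUES (1, 'RIL_1'), (2, 'RIL_2'), (3, 'RIL_3');
        INSERT INTO methods VALUES (1, 'demo_PATH');
        INSERT INTO paths VALUES (1, 1, 1, x'0102'), (2, 2, 1, x'0102'),
                                 (3, 3, 1, x'0304');
        """
    )
    return conn


if __name__ == "__main__":
    for group in group_taxa_by_path(build_demo_db(), "demo_PATH"):
        print(group)
```

Against the real database this would be called as `group_taxa_by_path(sqlite3.connect("./phg_db_name.db"), "assembly_by_anchorwave_PATH")`. If all imputed taxa land in one group, the paths are already identical before PathsToVCFPlugin runs, which would suggest looking at read mapping or path finding rather than at the VCF export.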
The last section of the log, from the PathsToVCFPlugin run:
[main] INFO net.maizegenetics.pipeline.TasselPipeline - Tassel Pipeline Arguments: [-fork1, -HaplotypeGraphBuilderPlugin, -configFile, ./myconfig.txt, -methods, assembly_by_anchorwave, -includeVariantContexts, true, -includeSequences, false, -endPlugin, -ImportDiploidPathPlugin, -pathMethodName, assembly_by_anchorwave_PATH, -endPlugin, -PathsToVCFPlugin, -outputFile, ./final.v2.vcf.gz, -referenceFasta, ./genome_data/Tt_1A_part1.fasta, -endPlugin, -runfork1]
net.maizegenetics.pangenome.api.HaplotypeGraphBuilderPlugin
net.maizegenetics.pangenome.hapCalling.ImportDiploidPathPlugin
net.maizegenetics.pangenome.hapCalling.PathsToVCFPlugin
[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin - Starting net.maizegenetics.pangenome.api.HaplotypeGraphBuilderPlugin: time: Jan 7, 2024 20:13:30
[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin -
HaplotypeGraphBuilderPlugin Parameters
configFile: ./myconfig.txt
methods: assembly_by_anchorwave
includeSequences: false
includeVariantContexts: true
haplotypeIds: null
chromosomes: [1A]
taxa: null
localGVCFFolder: ./genome_data
[pool-1-thread-1] INFO net.maizegenetics.pangenome.db_loading.DBLoadingUtils - first connection: dbName from config file = ./phg_db_name.db host: localHost user: sqlite type: sqlite
[pool-1-thread-1] INFO net.maizegenetics.pangenome.db_loading.DBLoadingUtils - Database URL: jdbc:sqlite:./phg_db_name.db
[pool-1-thread-1] INFO net.maizegenetics.pangenome.db_loading.DBLoadingUtils - Connected to database:
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - referenceRangesAsMap: query statement: select reference_ranges.ref_range_id, chrom, range_start, range_end, methods.name from reference_ranges INNER JOIN ref_range_ref_range_method on ref_range_ref_range_method.ref_range_id=reference_ranges.ref_range_id INNER JOIN methods on ref_range_ref_range_method.method_id = methods.method_id AND methods.method_type = 7 ORDER BY reference_ranges.ref_range_id
methods size: 1
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - referenceRangesAsMap: number of reference ranges: 2515
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - referenceRangesAsMap: time: 0.028792687 secs.
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - taxaListMap: query statement: SELECT gamete_haplotypes.gamete_grp_id, genotypes.line_name FROM gamete_haplotypes INNER JOIN gametes ON gamete_haplotypes.gameteid = gametes.gameteid INNER JOIN genotypes on gametes.genoid = genotypes.genoid ORDER BY gamete_haplotypes.gamete_grp_id;
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - taxaListMap: number of taxa lists: 2
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - taxaListMap: time: 0.008242209 secs.
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - createHaplotypeNodes: haplotype method: assembly_by_anchorwave range group method: null
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - createHaplotypeNodes: query statement: SELECT haplotypes_id, gamete_grp_id, haplotypes.ref_range_id, asm_contig, asm_start_coordinate, asm_end_coordinate, asm_strand, genome_file_id, seq_hash, seq_len...
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - CreateGraphUtils:addNodes - query=SELECT haplotypes_id, gamete_grp_id, haplotypes.ref_range_id, asm_contig, asm_start_coordinate, asm_end_coordinate, asm_strand, genome_file_id, seq_hash, seq_len, gvcf_file_id FROM haplotypes inner join reference_ranges on haplotypes.ref_range_id = reference_ranges.ref_range_id WHERE method_id = 5 AND chrom in ('1A');
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - addNodes: number of nodes: 4990
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - addNodes: number of reference ranges: 2495
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - createHaplotypeNodes: time: 0.078513454 secs.
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.HaplotypeGraph - Created graph edges: created when requested number of nodes: 4990 number of reference ranges: 2495
[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin - Finished net.maizegenetics.pangenome.api.HaplotypeGraphBuilderPlugin: time: Jan 7, 2024 20:13:31
[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin - Starting net.maizegenetics.pangenome.hapCalling.ImportDiploidPathPlugin: time: Jan 7, 2024 20:13:31
[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin -
ImportDiploidPathPlugin Parameters
pathMethodName: assembly_by_anchorwave_PATH
taxa: null
[pool-1-thread-1] INFO net.maizegenetics.pangenome.db_loading.DBLoadingUtils - first connection: dbName from config file = ./phg_db_name.db host: localHost user: sqlite type: sqlite
[pool-1-thread-1] INFO net.maizegenetics.pangenome.db_loading.DBLoadingUtils - Database URL: jdbc:sqlite:./phg_db_name.db
[pool-1-thread-1] INFO net.maizegenetics.pangenome.db_loading.DBLoadingUtils - Connected to database:
[pool-1-thread-1] INFO net.maizegenetics.pangenome.hapCalling.ImportDiploidPathPlugin - importPathsFromDB: query: SELECT line_name, paths_data FROM paths, genotypes, methods WHERE paths.genoid=genotypes.genoid AND methods.method_id=paths.method_id AND methods.name IN ('assembly_by_anchorwave_PATH')
[pool-1-thread-1] INFO net.maizegenetics.pangenome.hapCalling.ImportDiploidPathPlugin - importPathsFromDB: number of path list: 7
[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin - Finished net.maizegenetics.pangenome.hapCalling.ImportDiploidPathPlugin: time: Jan 7, 2024 20:13:31
[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin - Starting net.maizegenetics.pangenome.hapCalling.PathsToVCFPlugin: time: Jan 7, 2024 20:13:31
[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin -
PathsToVCFPlugin Parameters
outputFile: ./final.v2.vcf.gz.vcf
refRangeFileVCF: null
referenceFasta: ./genome_data/Tt_1A_part1.fasta
makeDiploid: true
positions: null
symbolicToN: false
symbolic: false
Genome FASTA character conversion: ACGTNacgtn to ACGTNacgtn
[pool-1-thread-1] INFO net.maizegenetics.pangenome.hapCalling.PathsToVCFPlugin - PathsToVCFPlugin: processData: number of ranges: 2495
[pool-1-thread-1] INFO net.maizegenetics.pangenome.hapCalling.PathsToVCFPlugin - PathsToVCFPlugin: processData: number of taxa: 7
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pipeline.TasselPipeline - net.maizegenetics.pangenome.hapCalling.PathsToVCFPlugin: time: Jan 7, 2024 20:13:32: progress: 0%
[DefaultDispatcher-worker-20] INFO net.maizegenetics.pipeline.TasselPipeline - net.maizegenetics.pangenome.hapCalling.PathsToVCFPlugin: time: Jan 7, 2024 20:13:34: progress: 10%
[DefaultDispatcher-worker-21] INFO net.maizegenetics.pipeline.TasselPipeline - net.maizegenetics.pangenome.hapCalling.PathsToVCFPlugin: time: Jan 7, 2024 20:13:35: progress: 20%
[DefaultDispatcher-worker-14] INFO net.maizegenetics.pipeline.TasselPipeline - net.maizegenetics.pangenome.hapCalling.PathsToVCFPlugin: time: Jan 7, 2024 20:13:37: progress: 30%
[DefaultDispatcher-worker-24] INFO net.maizegenetics.pipeline.TasselPipeline - net.maizegenetics.pangenome.hapCalling.PathsToVCFPlugin: time: Jan 7, 2024 20:13:40: progress: 40%
[DefaultDispatcher-worker-16] INFO net.maizegenetics.pipeline.TasselPipeline - net.maizegenetics.pangenome.hapCalling.PathsToVCFPlugin: time: Jan 7, 2024 20:13:42: progress: 50%
[DefaultDispatcher-worker-24] INFO net.maizegenetics.pipeline.TasselPipeline - net.maizegenetics.pangenome.hapCalling.PathsToVCFPlugin: time: Jan 7, 2024 20:13:45: progress: 60%
[DefaultDispatcher-worker-20] INFO net.maizegenetics.pipeline.TasselPipeline - net.maizegenetics.pangenome.hapCalling.PathsToVCFPlugin: time: Jan 7, 2024 20:13:48: progress: 70%
[DefaultDispatcher-worker-19] INFO net.maizegenetics.pipeline.TasselPipeline - net.maizegenetics.pangenome.hapCalling.PathsToVCFPlugin: time: Jan 7, 2024 20:13:51: progress: 80%
[DefaultDispatcher-worker-11] INFO net.maizegenetics.pipeline.TasselPipeline - net.maizegenetics.pangenome.hapCalling.PathsToVCFPlugin: time: Jan 7, 2024 20:14:1: progress: 90%
[DefaultDispatcher-worker-12] INFO net.maizegenetics.pipeline.TasselPipeline - net.maizegenetics.pangenome.hapCalling.PathsToVCFPlugin: time: Jan 7, 2024 20:14:4: progress: 100%
[DefaultDispatcher-worker-12] INFO net.maizegenetics.plugindef.AbstractPlugin - net.maizegenetics.pangenome.hapCalling.PathsToVCFPlugin Citation: Bradbury PJ, Zhang Z, Kroon DE, Casstevens TM, Ramdoss Y, Buckler ES. (2007) TASSEL: Software for association mapping of complex traits in diverse samples. Bioinformatics 23:2633-2635.
[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin - Finished net.maizegenetics.pangenome.hapCalling.PathsToVCFPlugin: time: Jan 7, 2024 20:14:4
[pool-1-thread-1] INFO net.maizegenetics.pipeline.TasselPipeline - net.maizegenetics.pangenome.hapCalling.ImportDiploidPathPlugin: time: Jan 7, 2024 20:14:4: progress: 100%
[pool-1-thread-1] INFO net.maizegenetics.pipeline.TasselPipeline - net.maizegenetics.pangenome.api.HaplotypeGraphBuilderPlugin: time: Jan 7, 2024 20:14:4: progress: 100%
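The counts in this log can be pulled out and compared mechanically. The following is a minimal sketch, assuming the log wording shown above stays the same. `summarize_log` and `SAMPLE_LOG` are hypothetical names, and the sample text is shortened from the lines above.

```python
import re

# Patterns match the message text of the CreateGraphUtils and
# ImportDiploidPathPlugin log lines quoted in this post.
PATTERNS = {
    "ranges_total": re.compile(
        r"referenceRangesAsMap: number of reference ranges: (\d+)"
    ),
    "nodes": re.compile(r"addNodes: number of nodes: (\d+)"),
    "ranges_with_nodes": re.compile(
        r"addNodes: number of reference ranges: (\d+)"
    ),
    "paths": re.compile(r"importPathsFromDB: number of path list: (\d+)"),
}

SAMPLE_LOG = """\
CreateGraphUtils - referenceRangesAsMap: number of reference ranges: 2515
CreateGraphUtils - addNodes: number of nodes: 4990
CreateGraphUtils - addNodes: number of reference ranges: 2495
ImportDiploidPathPlugin - importPathsFromDB: number of path list: 7
"""


def summarize_log(text):
    """Extract graph and path counts and derive the mean nodes per range."""
    summary = {}
    for key, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            summary[key] = int(match.group(1))
    if "nodes" in summary and summary.get("ranges_with_nodes"):
        summary["nodes_per_range"] = (
            summary["nodes"] / summary["ranges_with_nodes"]
        )
    return summary


if __name__ == "__main__":
    for key, value in summarize_log(SAMPLE_LOG).items():
        print(key, value)
```

For the log above this gives 4990 / 2495 = 2 nodes per range on average, which is consistent with two parental assemblies each contributing one haplotype per range. It also shows that 2495 of the 2515 reference ranges carry haplotype nodes.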
I also tried FASTQ files (one chromosome only) extracted from BAM files produced by bwa mapping, but the result did not change. This confuses me. The screenshot of the VCF result shows that all individuals, including the reference, have the same genotypes.
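To put a number on "all individuals have the same result", the genotype columns of the VCF can be compared site by site. This is a minimal sketch that assumes a standard VCF layout (FORMAT in column 9, samples from column 10 on). `open_text`, `count_uniform_sites` and `DEMO_VCF` are hypothetical names, and the demo records are made up for illustration.

```python
import gzip


def open_text(path):
    """Open a VCF as text, detecting gzip/bgzip by its magic bytes."""
    with open(path, "rb") as probe:
        is_gzip = probe.read(2) == b"\x1f\x8b"
    return gzip.open(path, "rt") if is_gzip else open(path, "rt")


def normalize_gt(gt):
    """Ignore phasing and allele order, so 0|1 and 1/0 compare equal."""
    return "/".join(sorted(gt.replace("|", "/").split("/")))


def count_uniform_sites(lines):
    """Return (sites where every sample has the same GT, total sites)."""
    uniform = total = 0
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # no sample columns on this record
        fmt = fields[8].split(":")
        if "GT" not in fmt:
            continue
        gt_idx = fmt.index("GT")
        genotypes = set()
        for sample in fields[9:]:
            parts = sample.split(":")
            gt = parts[gt_idx] if gt_idx < len(parts) else "."
            genotypes.add(normalize_gt(gt))
        total += 1
        if len(genotypes) == 1:
            uniform += 1
    return uniform, total


# Made-up records: the first site is identical in all samples, the second is not.
DEMO_VCF = [
    "##fileformat=VCFv4.2",
    "\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER",
               "INFO", "FORMAT", "S1", "S2", "S3"]),
    "\t".join(["1A", "100", ".", "A", "G", ".", "PASS", ".", "GT",
               "1/1", "1/1", "1|1"]),
    "\t".join(["1A", "200", ".", "C", "T", ".", "PASS", ".", "GT",
               "0/0", "1/1", "0/0"]),
]

if __name__ == "__main__":
    print(count_uniform_sites(DEMO_VCF))
```

On the real output this could be run as `count_uniform_sites(open_text(path))`. Note that the log reports the output file as `./final.v2.vcf.gz.vcf`, so the file may be plain text despite the `.gz` in the requested name, which is why `open_text` checks the file contents rather than the extension.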