Hi everyone,
I cannot get the imputed VCF file: the file is created, but it contains only the header. I ran the ImputePipelinePlugin
with the "diploidPathToVCF" method. Since the paths seemed to have been created but the output was empty, I then tried PathsToVCFPlugin
directly, as below:
tassel-5-standalone/run_pipeline.pl -Xmx4G \
-debug \
-configParameters params_config.txt \
-HaplotypeGraphBuilderPlugin -configFile params_config.txt \
-methods CONSENSUS_maxDiv0.0005 \
-includeVariantContexts true \
-includeSequences false \
-endPlugin \
-ImportDiploidPathPlugin \
-pathMethodName PATH_METHOD_maxDiv0.0005 \
-endPlugin \
-PathsToVCFPlugin \
-outputFile test.vcf \
-referenceFasta ref/my_reference.fasta \
-endPlugin
And the output (last lines) was:
[...]
[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin - Starting net.maizegenetics.pangenome.hapCalling.ImportDiploidPathPlugin: time: Oct 11, 2022 14:55:47
[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin -
ImportDiploidPathPlugin Parameters
pathMethodName: PATH_METHOD_maxDiv0.0005
taxa: null
[pool-1-thread-1] INFO net.maizegenetics.pangenome.db_loading.DBLoadingUtils - first connection: dbName from config file = TestDB.db host: localHost user: sqlite type: sqlite
[pool-1-thread-1] INFO net.maizegenetics.pangenome.db_loading.DBLoadingUtils - Database URL: jdbc:sqlite:TestDB.db
[pool-1-thread-1] INFO net.maizegenetics.pangenome.db_loading.DBLoadingUtils - Connected to database:
[pool-1-thread-1] INFO net.maizegenetics.pangenome.hapCalling.ImportDiploidPathPlugin - importPathsFromDB: query: SELECT line_name, paths_data FROM paths, genotypes, methods WHERE paths.genoid=genotypes.genoid AND methods.method_id=paths.method_id AND methods.name='PATH_METHOD_maxDiv0.0005'
[pool-1-thread-1] INFO net.maizegenetics.pangenome.hapCalling.ImportDiploidPathPlugin - importPathsFromDB: number of path list: 3
[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin - Finished net.maizegenetics.pangenome.hapCalling.ImportDiploidPathPlugin: time: Oct 11, 2022 14:55:47
[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin - Starting net.maizegenetics.pangenome.hapCalling.PathsToVCFPlugin: time: Oct 11, 2022 14:55:47
[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin -
PathsToVCFPlugin Parameters
outputFile: test.vcf
refRangeFileVCF: null
referenceFasta: ref/my_reference.fasta
makeDiploid: true
positions: null
Genome FASTA character conversion: ACGTNacgtn to ACGTNacgtn
[pool-1-thread-1] INFO net.maizegenetics.pangenome.hapCalling.PathsToVCFPlugin - PathsToVCFPlugin: processData: number of ranges: 43665
[pool-1-thread-1] INFO net.maizegenetics.pangenome.hapCalling.PathsToVCFPlugin - PathsToVCFPlugin: processData: number of taxa: 3
[pool-1-thread-1] INFO net.maizegenetics.pipeline.TasselPipeline - net.maizegenetics.pangenome.hapCalling.PathsToVCFPlugin: time: Oct 11, 2022 14:55:56: progress: 0%
[pool-1-thread-1] INFO net.maizegenetics.pipeline.TasselPipeline - net.maizegenetics.pangenome.hapCalling.PathsToVCFPlugin: time: Oct 11, 2022 14:56:51: progress: 10%
[pool-1-thread-1] INFO net.maizegenetics.pipeline.TasselPipeline - net.maizegenetics.pangenome.hapCalling.PathsToVCFPlugin: time: Oct 11, 2022 14:57:42: progress: 20%
[pool-1-thread-1] INFO net.maizegenetics.pipeline.TasselPipeline - net.maizegenetics.pangenome.hapCalling.PathsToVCFPlugin: time: Oct 11, 2022 14:58:32: progress: 30%
[pool-1-thread-1] INFO net.maizegenetics.pipeline.TasselPipeline - net.maizegenetics.pangenome.hapCalling.PathsToVCFPlugin: time: Oct 11, 2022 14:59:23: progress: 40%
[pool-1-thread-1] INFO net.maizegenetics.pipeline.TasselPipeline - net.maizegenetics.pangenome.hapCalling.PathsToVCFPlugin: time: Oct 11, 2022 15:00:14: progress: 50%
[pool-1-thread-1] INFO net.maizegenetics.pipeline.TasselPipeline - net.maizegenetics.pangenome.hapCalling.PathsToVCFPlugin: time: Oct 11, 2022 15:01:4: progress: 60%
[pool-1-thread-1] INFO net.maizegenetics.pipeline.TasselPipeline - net.maizegenetics.pangenome.hapCalling.PathsToVCFPlugin: time: Oct 11, 2022 15:01:54: progress: 70%
[pool-1-thread-1] INFO net.maizegenetics.pipeline.TasselPipeline - net.maizegenetics.pangenome.hapCalling.PathsToVCFPlugin: time: Oct 11, 2022 15:02:45: progress: 80%
[pool-1-thread-1] INFO net.maizegenetics.pipeline.TasselPipeline - net.maizegenetics.pangenome.hapCalling.PathsToVCFPlugin: time: Oct 11, 2022 15:03:35: progress: 90%
[pool-1-thread-1] INFO net.maizegenetics.pipeline.TasselPipeline - net.maizegenetics.pangenome.hapCalling.PathsToVCFPlugin: time: Oct 11, 2022 15:04:26: progress: 100%
[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin - Finished net.maizegenetics.pangenome.hapCalling.PathsToVCFPlugin: time: Oct 11, 2022 15:04:26
[pool-1-thread-1] INFO net.maizegenetics.pipeline.TasselPipeline - net.maizegenetics.pangenome.hapCalling.ImportDiploidPathPlugin: time: Oct 11, 2022 15:04:26: progress: 100%
[pool-1-thread-1] INFO net.maizegenetics.pipeline.TasselPipeline - net.maizegenetics.pangenome.api.HaplotypeGraphBuilderPlugin: time: Oct 11, 2022 15:04:26: progress: 100%
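The "number of path list: 3" line in the log could be cross-checked directly against the database. This is only a rough sketch: it assumes the `sqlite3` command-line tool is installed and that the table and column names match those in the query printed in the log (`paths`, `methods`, `method_id`, `name`). The function name `count_paths_per_method` is made up for illustration.

```shell
# Count stored paths per path method in a PHG SQLite database.
# Table/column names are taken from the query shown in the log above;
# the function name is hypothetical.
count_paths_per_method() {
    # $1: path to the SQLite database file (e.g. TestDB.db)
    sqlite3 -noheader -list "$1" \
        "SELECT methods.name, COUNT(*)
           FROM paths
           JOIN methods ON methods.method_id = paths.method_id
          GROUP BY methods.name
          ORDER BY methods.name;"
}
```

Running `count_paths_per_method TestDB.db` should print one `name|count` line per path method. If it agrees with the log, PATH_METHOD_maxDiv0.0005 should show 3.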
The progress messages suggest that PathsToVCFPlugin is gathering and processing the path information,
but the output file contains only the header:
##fileformat=VCFv4.2
##FORMAT=<ID=AD,Number=3,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth (only filtered reads used for calling)">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">
##INFO=<ID=AF,Number=3,Type=Integer,Description="Allele Frequency">
##INFO=<ID=ASM_Chr,Number=1,Type=String,Description="Assembly chromosome">
##INFO=<ID=ASM_End,Number=1,Type=Integer,Description="Assembly end position">
##INFO=<ID=ASM_Start,Number=1,Type=Integer,Description="Assembly start position">
##INFO=<ID=ASM_Strand,Number=1,Type=String,Description="Assembly strand">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##INFO=<ID=END,Number=1,Type=Integer,Description="Stop position of the interval">
##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of Samples With Data">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Sample1 Sample2 Sample3
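To confirm that the file really has no variant records, rather than relying on how it looks in a viewer, the non-header lines can be counted. This is a minimal sketch that assumes a plain-text (not gzipped) VCF; the function name `count_vcf_records` is made up for illustration.

```shell
# Count VCF data records: non-empty lines that do not start with '#'.
# Assumes an uncompressed VCF; the function name is hypothetical.
count_vcf_records() {
    # $1: path to the VCF file
    # "n + 0" forces a numeric 0 when no record lines were seen
    awk '!/^#/ && NF > 0 { n++ } END { print n + 0 }' "$1"
}
```

For a header-only file like the one above, `count_vcf_records test.vcf` should print 0.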
Does anyone have an idea why the VCF file would be empty? I have already tried three different datasets, and none of them produced a VCF with records.
I used rPHG to check the paths, and they look OK to me. So is the problem in writing them out to VCF?
> pathMet <- rPHG::pathsForMethod(
+ configFile = configPath,
+ pathMethod = "PATH_METHOD_maxDiv0.0005"
+ )
>
> dim(pathMet)
[1] 6 37754
>
> pathMet[, 1:10]
5242 5243 5244 5246 5249 5250 5251 5252 5253 5255
Sample1 192612 188553 191169 188039 189055 188490 189111 191889 189220 189425
Sample1 192613 188553 191169 188039 189055 188490 189112 191890 189221 189426
Sample2 192612 188553 191169 188039 -1 188490 189111 191889 189220 189425
Sample2 192612 188553 191169 188039 -1 188490 189111 191889 189221 189426
Sample3 192612 -1 191169 188039 -1 -1 189111 -1 189220 189425
Sample3 192613 -1 191169 188038 -1 -1 189111 -1 189220 189425
Thank you