Hi, I am trying to use an existing PHG database to impute variants.
Input: 2 FASTQ files, each from a different sample.
I have 3 questions:
1) In STEP 3, the manual provides examples of executing workflows. Which steps should I run to get a gVCF or VCF for each of my low-coverage samples?
2) I already ran some steps and got a VCF file whose name comes from the outVcfFile variable in the config file, but I see only a single column even though the input key file has 2 independent samples in 2 FASTQ files. How can I get a VCF file for each sample, or a genotype column for each sample?
3) Is it required to run step 3B and/or 3C between steps 3A and 3E?
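To make question 2 concrete, this is a minimal sketch of how the sample columns of the output VCF can be listed. It assumes an uncompressed, tab-delimited VCF with the standard `#CHROM` header line. The function name `vcf_samples` and the file name `output.vcf` are placeholders of mine, not anything provided by PHG.

```shell
# List the sample names in a VCF: the columns after FORMAT (column 9)
# on the "#CHROM" header line. Assumes an uncompressed, tab-delimited VCF.
vcf_samples() {
    awk -F'\t' '/^#CHROM/ { for (i = 10; i <= NF; i++) print $i; exit }' "$1"
}

# Hypothetical usage (output.vcf stands in for the file named by outVcfFile):
# vcf_samples output.vcf
```

With 2 samples in the key file I would expect this to print 2 names, one per line.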
I am following the information described here: https://bitbucket.org/bucklerlab/practicalhaplotypegraph/wiki/Home.md. It suggests running the steps in this order:
STEP0 https://bitbucket.org/bucklerlab/practicalhaplotypegraph/wiki/UserInstructions/CreatePHG_step0_main.md
STEP2.5 https://bitbucket.org/bucklerlab/practicalhaplotypegraph/wiki/UserInstructions/UpdatePHGSchema.md
STEP 3 https://bitbucket.org/bucklerlab/practicalhaplotypegraph/wiki/UserInstructions/ImputeWithPHG_main.md
What I have used so far to get an error-free run is:
#STEP 1A makeDefaultDirectory
singularity exec -B $PATH:/phg/ phg_latest.sif /tassel-5-standalone/run_pipeline.pl -debug -Xmx1G -MakeDefaultDirectoryPlugin -workingDir /phg/ -endPlugin > 1_A.log
#STEP 0.A required but not in the manual
singularity exec -B $PATH:/phg/ phg_latest.sif /tassel-5-standalone/run_pipeline.pl -debug -Xmx1G -configParameters /phg/config0_A.txt -CheckDBVersionPlugin -outputDir /phg/ -endPlugin > 0_0.log
#STEP 2.5 Update PHG database schema
singularity exec -B $PATH:/phg/ phg_latest.sif /tassel-5-standalone/run_pipeline.pl -debug -Xmx1G -configParameters /phg/config2_5.txt -LiquibaseUpdatePlugin -outputDir /phg/outputDir -endPlugin > 2_5.log
#STEP 3A Create a pangenome FASTA file, then stop
singularity exec -B $PATH:/phg/ phg_latest.sif /tassel-5-standalone/run_pipeline.pl -Xmx80G -debug -configParameters /phg/config_3.txt -ImputePipelinePlugin -imputeTarget pangenome -endPlugin > 3_A.log
#STEP 3E Export imputed VCF from FASTQ files - homozygous
singularity exec -B $PATH:/phg/ phg_latest.sif /tassel-5-standalone/run_pipeline.pl -Xmx80G -debug -configParameters /phg/config_3.txt -ImputePipelinePlugin -imputeTarget pathToVCF -endPlugin > 3_E.log
Thanks, Miguel.
I did notice the section "Writing a config file".
Here is the config file I was using for step 3, mentioned in my original question:
Am I missing something in the config file, or in my understanding of what to expect as output? Since this is an imputation on independent samples, shouldn't I get a list of imputed SNPs for each individual instead of a single list of SNPs?
I ran step 3E, but the output VCF does not have any called genotypes for the sample: the relevant sample columns contain ".", and all the rows in the file follow this pattern. Am I missing something in the config file or in the execution of step 3E?
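To quantify the missing calls, a rough sketch like the one below counts, per sample column, how many records have a missing genotype. It assumes an uncompressed, tab-delimited VCF in which the genotype (GT) is the first colon-separated field of each sample column. The function name `vcf_missing_per_sample` and the file name `output.vcf` are placeholders of mine, not part of PHG.

```shell
# For each sample column, count records whose genotype (the first
# colon-separated field, assumed to be GT) is missing: ".", "./." or ".|.".
# Prints: sample name, missing count, total records (tab-separated).
vcf_missing_per_sample() {
    awk -F'\t' '
        /^#CHROM/ { for (i = 10; i <= NF; i++) name[i] = $i; n = NF; next }
        /^#/      { next }
        {
            total++
            for (i = 10; i <= n; i++) {
                split($i, f, ":")
                if (f[1] == "." || f[1] == "./." || f[1] == ".|.") missing[i]++
            }
        }
        END { for (i = 10; i <= n; i++) printf "%s\t%d\t%d\n", name[i], missing[i], total }
    ' "$1"
}

# Hypothetical usage (output.vcf stands in for the file named by outVcfFile):
# vcf_missing_per_sample output.vcf
```

In my case the missing count would equal the total for every sample, which is the pattern described above.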