Question

Impute variants with the PHG. Pipeline and VCF final output for each sample

3.6 years ago

Miguel ▴ 10

Hi, I am trying to use an existing PHG database to impute variants.

Input: 2 fastq files, each from a different sample.

I have 3 questions:

1) In STEP 3, the manual provides examples of executing workflows. Which steps should I run to get a gVCF or VCF for each of my low-coverage samples?

2) I already ran some steps and got a VCF file named after the outVcfFile variable in the config file, but it has a single column even though the input key file lists 2 independent samples in 2 fastq files. How can I get a VCF file for each sample, or a genotype column for each sample?

3) Is it required to run step 3B and/or 3C between steps 3A and 3E?

I am following the information described here: https://bitbucket.org/bucklerlab/practicalhaplotypegraph/wiki/Home.md. It suggests running the steps in this order:

STEP0 https://bitbucket.org/bucklerlab/practicalhaplotypegraph/wiki/UserInstructions/CreatePHG_step0_main.md
STEP2.5 https://bitbucket.org/bucklerlab/practicalhaplotypegraph/wiki/UserInstructions/UpdatePHGSchema.md
STEP 3 https://bitbucket.org/bucklerlab/practicalhaplotypegraph/wiki/UserInstructions/ImputeWithPHG_main.md

What I have used so far to get an error-free run is:

#STEP 1A  makeDefaultDirectory

singularity exec -B $PATH:/phg/ phg_latest.sif /tassel-5-standalone/run_pipeline.pl -debug -Xmx1G -MakeDefaultDirectoryPlugin -workingDir /phg/ -endPlugin > 1_A.log

#STEP 0.A  required, but not in the manual
singularity exec -B $PATH:/phg/ phg_latest.sif /tassel-5-standalone/run_pipeline.pl -debug -Xmx1G -configParameters /phg/config0_A.txt -CheckDBVersionPlugin -outputDir /phg/ -endPlugin > 0_0.log

#STEP 2.5 Update PHG database schema
singularity exec -B $PATH:/phg/ phg_latest.sif /tassel-5-standalone/run_pipeline.pl -debug -Xmx1G -configParameters /phg/config2_5.txt -LiquibaseUpdatePlugin -outputDir /phg/outputDir -endPlugin > 2_5.log

#STEP 3A  Create a pangenome fasta file, then stop
singularity exec -B $PATH:/phg/ phg_latest.sif /tassel-5-standalone/run_pipeline.pl -Xmx80G -debug -configParameters /phg/config_3.txt -ImputePipelinePlugin -imputeTarget pangenome -endPlugin > 3_A.log

#STEP 3E  Export imputed VCF from fastq files - homozygous
singularity exec -B $PATH:/phg/ phg_latest.sif /tassel-5-standalone/run_pipeline.pl -Xmx80G -debug -configParameters /phg/config_3.txt -ImputePipelinePlugin -imputeTarget pathToVCF -endPlugin > 3_E.log
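
The five commands above differ only in the memory setting, the config file, and the plugin arguments, so they can be wrapped in a small helper. The following is a minimal sketch, not part of the PHG documentation. `run_phg`, `PHG_DIR`, `PHG_IMAGE`, and `DRY_RUN` are hypothetical names introduced here. The post binds `$PATH` to `/phg/`, presumably as a placeholder for the working directory; the sketch uses a differently named variable so that the shell's executable search path is left alone.

```shell
# Hypothetical names: PHG_DIR is the host directory bound to /phg/,
# PHG_IMAGE is the Singularity image used in the commands above.
PHG_DIR="${PHG_DIR:-$PWD}"
PHG_IMAGE="${PHG_IMAGE:-phg_latest.sif}"

# run_phg MEMORY [run_pipeline.pl arguments...]
# Builds the same singularity command line as the steps above.
# With DRY_RUN=1 the command is only printed, not executed.
run_phg() {
    mem="$1"; shift
    set -- singularity exec -B "${PHG_DIR}:/phg/" "$PHG_IMAGE" \
        /tassel-5-standalone/run_pipeline.pl -debug "-Xmx${mem}" "$@"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

For example, step 3E would become `run_phg 80G -configParameters /phg/config_3.txt -ImputePipelinePlugin -imputeTarget pathToVCF -endPlugin > 3_E.log`. Setting `DRY_RUN=1` first shows the exact command without starting the container.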

Thanks, Miguel

PHG • 1.2k views

ADD COMMENT • link updated 3.6 years ago by Ram 44k • written 3.6 years ago by Miguel ▴ 10

Ram · Answer 1 · 2021-04-23

3.6 years ago

pjb39 ▴ 220

You only need to run 3E to generate a VCF file. That will also run all necessary intermediate steps including 3A as long as you have a configuration file with all of the required parameters filled out. If you have already run some of the other steps, those will be skipped and not run again. In case you missed it, there is a link to a sample config file in the section "Writing a config file" near the top of the web page.
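
Since this approach depends on the config file having every required parameter filled in, a quick pre-flight check can save a long failed run. The sketch below is an illustration only: `missing_keys` is a hypothetical helper, and which keys are actually required is not stated here, so the caller supplies the list (taken, for instance, from the sample config file linked in the documentation). It assumes plain `key=value` lines with no leading whitespace.

```shell
# missing_keys CONFIG KEY...
# Prints every KEY that has no non-empty "KEY=value" line in CONFIG.
# The list of keys to check is the caller's assumption, not a PHG rule.
missing_keys() {
    config="$1"; shift
    for key in "$@"; do
        # Require at least one character after "key=".
        grep -Eq "^${key}=.+" "$config" || echo "$key"
    done
}
```

A call such as `missing_keys config_3.txt keyFile fastqDir outVcfFile` prints nothing when all three keys are present and filled in, and otherwise prints one missing key per line.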

I did notice the section "Writing a config file".

Here is the config file I used for step 3, as mentioned in my original question:

host=localHost
user=sqlite
password=sqlite
DB=/phg/phg_v5Assemblies_20200608.db
DBtype=sqlite
liquibaseOutdir=/phg/outputDir
pangenomeHaplotypeMethod=mummer4
pangenomeDir=/phg/outputDir/pangenome
indexKmerLength=21
indexWindowSize=11
indexNumberBases=90G
inputType=fastq
readMethod=Test1
keyFile=/phg/key.inputfromfq.txt
fastqDir=/phg/inputDir/imputation/fastq/
samDir=/phg/inputDir/imputation/sam/
lowMemMode=true
maxRefRangeErr=0.25
outputSecondaryStats=false
maxSecondary=50
fParameter=f1000,5000
minimapLocation=minimap2
pathHaplotypeMethod=mummer4
pathMethod=TEST1
maxNodes=1000
maxReads=10000
minReads=1
minTaxa=1
minTransitionProb=0.001
numThreads=10
probCorrect=0.99
removeEqual=true
splitNodes=true
splitProb=0.99
usebf=false
usebf=false
minP=0.8
maxHap=11
maxReadsKB=100
algorithmType=efficient
outVcfFile=Test1_out

Am I missing something in the config file, or in my understanding of the expected output? Since this is imputation on independent samples, shouldn't I get a list of imputed SNPs for each individual instead of a single list of SNPs?
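
One thing that can be checked mechanically in a config like the one above is whether any key appears more than once; `usebf=false` is listed twice there. Both lines carry the same value, so this is probably harmless, but a repeated key with conflicting values would be harder to spot by eye. A minimal sketch, assuming plain `key=value` lines and `#` comments; `duplicate_keys` is a hypothetical helper name:

```shell
# duplicate_keys CONFIG
# Prints each key that occurs on more than one "key=value" line.
# Blank lines and lines starting with "#" are ignored.
duplicate_keys() {
    grep -Ev '^[[:space:]]*(#|$)' "$1" | cut -d= -f1 | sort | uniq -d
}
```

Run against the config shown above, this would report `usebf`.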

ADD REPLY • link updated 3.6 years ago by Ram 44k • written 3.6 years ago by Miguel ▴ 10

I ran step 3E, but the output VCF does not contain any called genotypes: the relevant sample columns show "." and every row in the file follows this pattern. Is something missing from the config file or from the way I ran step 3E?

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
1       15      .       A       G       .       .       .
1       16      .       T       A,C,G   .       .       .
1       17      .       A       C,G     .       .       .
1       18      .       A       C       .       .       .
1       24      .       G       T       .       .       .
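
As pasted, the header line stops at INFO. In the VCF format, genotypes are carried in a FORMAT column followed by one column per sample, so sample names start at the tenth tab-separated field of the `#CHROM` line. A quick way to see which samples a file actually contains is sketched below; `vcf_samples` is a hypothetical helper name, and the sketch assumes an uncompressed, tab-delimited VCF.

```shell
# vcf_samples FILE
# Prints one sample name per line, read from the #CHROM header line.
# Prints nothing if the header has no sample columns (fields 10 and up).
vcf_samples() {
    awk -F'\t' '/^#CHROM/ { for (i = 10; i <= NF; i++) print $i; exit }' "$1"
}
```

An empty result means the file holds sites only, with no per-sample genotype columns. If bcftools happens to be installed (it is a separate tool, not part of PHG), `bcftools query -l file.vcf` reports the same list, and `bcftools view -s NAME file.vcf` extracts a single sample from a multi-sample VCF.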

ADD REPLY • link updated 3.6 years ago by Ram 44k • written 3.6 years ago by Miguel ▴ 10