3.0 years ago
michael.flower.14
I'm trying to get a workflow up and running for sequencing a small 500 bp region on chromosome 5. I've got the initial alignment and QC steps working, but according to the GATK Best Practices I need to combine my GVCF files before genotyping, normalising, indexing and recalibrating variants. The step I'm really stuck on is combining the GVCF files into one. The output I get contains only one column of genotypes, but I'm expecting a column for each sample.
Here's the combining gvcf step:
# List paths to GVCFs to combine
# https://www.biostars.org/p/9501554/#9501607
find "$DIRECTORY"/temp_gvcf_2 -type f -name "*.vcf.gz" > "$DIRECTORY"/temp_gvcf_2/input.list
# Combine GVCFs
gatk CombineGVCFs \
--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true' \
-R "$GKREF"/Homo_sapiens_assembly38.fasta \
--variant "$DIRECTORY"/temp_gvcf_2/input.list \
-O "$DIRECTORY"/temp_gvcf_2/cohort.g.vcf
And here are the columns I get in the resulting cohort.g.vcf file:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT M00819_262_000000000-BP6CG_1_
chr1 12046 . G <NON_REF> . . END=12410 GT:DP:GQ:MIN_DP:PL ./.:0:0:0:0,0,0
chr1 12496 . G <NON_REF> . . END=12860 GT:DP:GQ:MIN_DP:PL ./.:0:0:0:0,0,0
chr1 13316 . A <NON_REF> . . END=13767 GT:DP:GQ:MIN_DP:PL ./.:0:0:0:0,0,0
Anyone know where I'm going wrong??
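One check that may help narrow this down is listing the sample column names in each input GVCF, since the combined header above shows only one. The sketch below is not part of the original workflow: `vcf_samples` is a hypothetical helper that reads a tab-separated VCF on stdin and prints the names from column 10 onwards of the `#CHROM` line. Whether the inputs here actually share a sample name is an assumption that this check would confirm or rule out.

```shell
# Hypothetical helper: print the sample names (columns 10 and up)
# from the #CHROM header line of a tab-separated VCF/GVCF on stdin.
vcf_samples() {
    awk -F'\t' '/^#CHROM/ { for (i = 10; i <= NF; i++) print $i; exit }'
}

# Small self-contained demonstration with a made-up two-sample header.
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsampleA\tsampleB\n' \
    | vcf_samples

# Possible use on the files from the post (paths assumed from the commands above):
# for f in "$DIRECTORY"/temp_gvcf_2/*.vcf.gz; do
#     printf '%s\t%s\n' "$f" "$(gzip -dc "$f" | vcf_samples | paste -sd, -)"
# done
```

If every input reports the same name, that would be worth following up on before looking at the combining step itself.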
What is the output of
wc -l "$DIRECTORY"/temp_gvcf_2/input.list
? (BTW, that quoting looks odd to me; I would write
wc -l "${DIRECTORY}/temp_gvcf_2/input.list"
)
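Beyond the line count, a slightly fuller look at the list file can be sketched as below. `check_list` is a hypothetical helper, not a GATK or bcftools command: it reports the number of non-empty entries, how many paths appear more than once, and any listed path that does not exist on disk.

```shell
# Hypothetical helper: summarise a list file with one path per line.
# Returns non-zero if any listed path is missing.
check_list() {
    cl_list="$1"
    cl_missing=0
    # Count non-empty lines (each should be one GVCF path).
    printf 'entries: %s\n' "$(grep -c . "$cl_list")"
    # Count paths that occur more than once.
    printf 'duplicates: %s\n' "$(grep . "$cl_list" | sort | uniq -d | wc -l | tr -d ' ')"
    # Report every listed path that is not a regular file.
    while IFS= read -r cl_path || [ -n "$cl_path" ]; do
        [ -n "$cl_path" ] || continue
        if [ ! -f "$cl_path" ]; then
            printf 'missing: %s\n' "$cl_path"
            cl_missing=$((cl_missing + 1))
        fi
    done < "$cl_list"
    return "$(( cl_missing > 0 ))"
}

# Possible use with the list from the post (path assumed from the commands above):
# check_list "${DIRECTORY}/temp_gvcf_2/input.list"
```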
The result of your command is:
And the contents of the file are:
and what is the output of
?
The result of this command is this:
I take it this could be the problem? Not sure what the issue is though?
Your bcftools is not correctly installed, but that is a separate problem. You can try: