No output from GATK CombineGVCFs
Asked by liyong, 2.1 years ago

Hello All,

I am using GATK to call SNPs from 140 RNA-seq samples. After calling variants in each sample with HaplotypeCaller, I have 140 g.vcf.gz files. Before the final joint genotyping with GenotypeGVCFs, I need to combine these 140 g.vcf.gz files into one. Beforehand, I prepared a list of the GVCF files with:

ls hard_filtered/*_filtered.vcf.gz > gvcfs.list

The command I used to combine them is:

gatk CombineGVCFs -R genome.fa --variant gvcfs.list -O cohort.vcf.gz

After running for a very long time (~8 hours), the job finished (judging from top), and the log file showed no errors. However, no output file was written. Is there anything I can do to fix this?

Thanks a lot.
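Because the run takes hours, it can be worth checking the list file before starting. GATK4 reads a `--variant` argument ending in `.list` as a file of paths, one per line, so a missing or unindexed entry is easier to catch up front. This is only an illustrative sketch: `check_gvcf_list` is a hypothetical helper, and it assumes each input is bgzipped with a tabix index named `<file>.tbi` next to it. Since CombineGVCFs is meant for GVCFs (HaplotypeCaller run with `-ERC GVCF`), it may also be worth confirming that the `hard_filtered/*_filtered.vcf.gz` files in the list really are GVCFs rather than hard-filtered VCFs.

```shell
# Sketch: sanity-check a GVCF list (one path per line) before a long CombineGVCFs run.
# Reports entries that are missing, empty, or lack a tabix index (assumed: <file>.tbi).
check_gvcf_list() {
  local list=$1 n=0 bad=0 f
  while IFS= read -r f || [ -n "$f" ]; do
    [ -z "$f" ] && continue                  # skip blank lines
    n=$((n + 1))
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f" >&2
      bad=$((bad + 1))
    elif [ ! -e "$f.tbi" ]; then
      echo "no .tbi index: $f" >&2
      bad=$((bad + 1))
    fi
  done < "$list"
  echo "$n entries, $bad problems"
  [ "$bad" -eq 0 ]                           # exit status 0 only if every entry looks fine
}
```

Usage with the list from the question: `check_gvcf_list gvcfs.list && gatk CombineGVCFs -R genome.fa --variant gvcfs.list -O cohort.g.vcf.gz`, so the long run only starts if every entry passes.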

"and the log file didn't show any error."

Which log file?

Retry with:

gatk   --java-options "-Xmx5g -Djava.io.tmpdir=." CombineGVCFs  -R genome.fa --variant gvcfs.list -O cohort.g.vcf.gz 2> log.stderr
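The command above sends GATK's stderr, where its progress and error messages go, to its own file. A minimal wrapper along these lines, assuming the GATK4 `gatk` script is on the PATH, also checks the exit status and whether the output exists, so a failed run and a run that ends cleanly but writes nothing can be told apart. `combine_gvcfs` is a hypothetical name; the GATK options are copied from the command above.

```shell
# Sketch: run CombineGVCFs with the options above, keep stderr in its own log,
# and report explicitly whether GATK failed or finished without writing output.
# Assumes `gatk` (GATK4 wrapper script) is on PATH; combine_gvcfs is a hypothetical name.
combine_gvcfs() {
  local ref=$1 list=$2 out=$3 log=${4:-log.stderr} status=0
  gatk --java-options "-Xmx5g -Djava.io.tmpdir=." CombineGVCFs \
    -R "$ref" --variant "$list" -O "$out" 2> "$log" || status=$?
  if [ "$status" -ne 0 ]; then
    echo "CombineGVCFs exited with status $status; see $log" >&2
    return "$status"
  fi
  if [ ! -s "$out" ]; then
    echo "CombineGVCFs exited 0 but $out is missing or empty; see $log" >&2
    return 1
  fi
  echo "wrote $out"
}
```

For example: `combine_gvcfs genome.fa gvcfs.list cohort.g.vcf.gz log.stderr`. The `.g.vcf.gz` suffix reflects that CombineGVCFs writes a GVCF, which is then the input to GenotypeGVCFs.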

Thanks for your comments.

The CombineGVCFs command is inside a shell script, which I run with:

nohup ./combine.sh >& combine.log &

Don't use nohup; it's a poor choice here. Among other things, this setup mixes stdout and stderr, and you don't know how or why the job exited. You'd be better off using screen or tmux.
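To address the two problems mentioned here (mixed output streams and not knowing how the job ended), one option is a small helper that writes stdout and stderr to separate files and records the exit status. `run_logged` is a hypothetical name, not a standard tool; the idea is to run it inside a screen or tmux session so the job survives logging out.

```shell
# Sketch: run a command with stdout and stderr in separate files and record its
# exit status, so you can later tell how a long job ended.
# run_logged is a hypothetical helper, not a standard tool.
run_logged() {
  local name=$1 status=0
  shift
  "$@" > "$name.out" 2> "$name.err" || status=$?
  echo "exit status: $status ($(date))" > "$name.status"
  return "$status"
}
```

For example, after defining the function in the shell: start a session with `tmux new -s combine`, run `run_logged combine ./combine.sh`, detach with Ctrl-b d, and reattach later with `tmux attach -t combine` (with screen: `screen -S combine`, Ctrl-a d, `screen -r combine`). Afterwards `combine.err` holds GATK's log and `combine.status` the exit code. In bash, a status above 128 usually means the process was killed by a signal (128 + signal number), and a missing `combine.status` means the shell itself never got to record one.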

Thanks, Pierre, for the tips. I will give screen a try later.

Answered by Vitis, 17 months ago

The number of GVCFs in the list has a hard limit (interestingly, I don't know the exact number, as it changes between GATK versions). If the number exceeds the limit, CombineGVCFs runs without an error message (surprisingly!) but produces no output (your cohort.vcf.gz). I think GATK keeps CombineGVCFs only as a legacy tool. We really should not use it; instead, use the recommended GenomicsDB approach (GenomicsDBImport) for combining GVCFs before joint genotyping. There are also other options, such as GLnexus: https://github.com/dnanexus-rnd/GLnexus.
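A sketch of the GenomicsDB route mentioned in this answer, assuming GATK4: GenomicsDBImport loads the GVCFs into a workspace for a given interval, and GenotypeGVCFs then reads that workspace through a `gendb://` path. The helper names, the workspace path, and the choice to derive sample names from file names are assumptions for illustration. With `--sample-name-map`, GATK takes the sample names from the map file, so they should be the names you want in the final VCF.

```shell
# Sketch of the GenomicsDB route, assuming GATK4 and `gatk` on PATH.
# Helper names are hypothetical; sample names are derived from file names (an assumption).

# Build a tab-separated sample map ("sample<TAB>path") from a list of GVCFs,
# stripping the given suffix from each file's basename to get the sample name.
make_sample_map() {
  local list=$1 suffix=${2:-.g.vcf.gz} f
  while IFS= read -r f || [ -n "$f" ]; do
    [ -z "$f" ] && continue
    printf '%s\t%s\n' "$(basename "$f" "$suffix")" "$f"
  done < "$list"
}

# Import the GVCFs for one interval into a new workspace, then joint-genotype it.
joint_call_genomicsdb() {
  local ref=$1 map=$2 interval=$3 workspace=$4 out=$5
  gatk GenomicsDBImport --sample-name-map "$map" \
    --genomicsdb-workspace-path "$workspace" -L "$interval" || return
  gatk GenotypeGVCFs -R "$ref" -V "gendb://$workspace" -O "$out"
}
```

For the question's files this might look like `make_sample_map gvcfs.list _filtered.vcf.gz > cohort.sample_map`, then `joint_call_genomicsdb genome.fa cohort.sample_map chr1 cohort_db_chr1 chr1.vcf.gz` (use your reference's contig names). GenomicsDBImport needs at least one interval (`-L`), so a whole genome is usually processed per chromosome; the workspace directory must be empty or not yet exist, and `--batch-size` can limit how many GVCFs are read at once.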

"The number of gVCFs in the list has a hard limit"

No, I don't think so. But if you have 10,000 VCFs, you should combine them in 100 batches of 100 VCFs, then combine the 100 resulting VCFs.
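The batch-then-merge suggestion could be sketched like this, assuming CombineGVCFs is simply run again on the per-batch outputs. File names such as `batch_000.list` and both function names are hypothetical. Run it in an otherwise empty directory, since the loop picks up every `batch_NNN.list` file it finds, and note that the three-digit naming assumes fewer than 1000 batches.

```shell
# Sketch of the batch-then-merge idea; function and file names are hypothetical.

# Write batch lists prefix000.list, prefix001.list, ... with at most $size paths each.
make_batches() {
  local list=$1 size=$2 prefix=$3
  awk -v n="$size" -v p="$prefix" '
    NF == 0 { next }                                  # skip blank lines
    {
      f = sprintf("%s%03d.list", p, int(k / n)); k++
      if (f != prev) { if (prev != "") close(prev); prev = f }
      print > f
    }' "$list"
}

# Combine each batch into an intermediate GVCF, then merge the intermediates.
combine_in_batches() {
  local ref=$1 list=$2 size=$3 out=$4 b
  make_batches "$list" "$size" batch_ || return
  : > batches.list
  for b in batch_[0-9][0-9][0-9].list; do
    gatk CombineGVCFs -R "$ref" --variant "$b" -O "${b%.list}.g.vcf.gz" || return
    echo "${b%.list}.g.vcf.gz" >> batches.list
  done
  gatk CombineGVCFs -R "$ref" --variant batches.list -O "$out"
}
```

With the question's 140 files, `combine_in_batches genome.fa gvcfs.list 50 cohort.g.vcf.gz` would run three batches (50, 50 and 40 files) followed by one final merge.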
