Combining VCF files using GATK
4.7 years ago
tothepoint ▴ 940

I am trying to combine the VCF files that HaplotypeCaller generated for samples from several different breeds. There are ~100 samples in total. I ran the command:

gatk CombineGVCFs -R genome.fna --variant 1.vcf.gz --variant 2.vcf.gz ... --variant 97.vcf.gz -O combine.g.vcf.gz

But when I check the output file, it contains only 41 samples instead of 97. If anyone knows what I am doing wrong, or has run into this before, please share how to fix it.

WGS VCF GWAS • 4.0k views

I would check which samples are missing; does that give you a clue? Perhaps the index is missing for the VCF files that were left out. To list the samples that were included:

zgrep -m 1 '^#CHROM' combine.g.vcf.gz | cut -f 10- | tr '\t' '\n'
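
The header check above can be extended to name the missing samples directly. The following is a minimal sketch, assuming every input is a gzip/bgzip-compressed VCF with a standard `#CHROM` header line; the function names `vcf_samples` and `missing_samples` are hypothetical helpers, not part of GATK.

```shell
# Print the sample names (columns 10 onward) from the #CHROM header line.
vcf_samples() {
    gzip -cd -- "$1" |
        awk -F '\t' '/^#CHROM/ { for (i = 10; i <= NF; i++) print $i; exit }'
}

# Print samples found in the input files but absent from the combined file.
# Usage: missing_samples combine.g.vcf.gz 1.vcf.gz 2.vcf.gz ...
# The body runs in a subshell so its variables do not leak.
missing_samples() (
    combined=$1
    shift
    have=$(mktemp)
    vcf_samples "$combined" | sort -u > "$have"
    # comm -13 keeps only lines unique to the second input (stdin here).
    for f in "$@"; do vcf_samples "$f"; done | sort -u | comm -13 "$have" -
    rm -f "$have"
)
```

Each name it prints belongs to an input that did not make it into the combined file, which narrows down which files to inspect.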

Which version of GATK are you using?


I am using GATK 4.1.4.0.


Have you checked the log file to make sure the command completed? I would also recommend GenomicsDBImport instead of CombineGVCFs.
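
For anyone trying the GenomicsDBImport route, here is a rough sketch of how the inputs could be prepared. It assumes single-sample, gzip/bgzip-compressed gVCFs; `first_sample` and `make_sample_map` are hypothetical helper names, and the workspace name `cohort_db`, the interval `chr1`, and the output name are placeholders. GenomicsDBImport expects at least one interval via `-L`.

```shell
# Print the first sample name from the #CHROM header line of a VCF.
first_sample() {
    gzip -cd -- "$1" | awk -F '\t' '/^#CHROM/ { print $10; exit }'
}

# Write one "sample<TAB>path" line per input, the layout that
# GenomicsDBImport reads through --sample-name-map.
# Usage: make_sample_map 1.vcf.gz 2.vcf.gz ... > cohort.sample_map
make_sample_map() (
    for f in "$@"; do
        printf '%s\t%s\n' "$(first_sample "$f")" "$f"
    done
)

# Hypothetical invocation (all names below are placeholders):
# make_sample_map *.vcf.gz > cohort.sample_map
# gatk GenomicsDBImport \
#     --genomicsdb-workspace-path cohort_db \
#     --sample-name-map cohort.sample_map \
#     -L chr1
# gatk GenotypeGVCFs -R genome.fna -V gendb://cohort_db -O cohort.vcf.gz
```

Counting the lines of the sample map before importing is also a quick way to confirm that all expected samples are present.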


I checked the log file and there was no such issue. I have already given CombineGVCFs one more try, but I am curious to test GenomicsDBImport as well. Thanks.


Were you able to figure out what went wrong? I have a similar issue with CombineGVCFs: all my samples are merged into one. I have 243 individuals, but the final VCF shows all the variants as belonging to a single individual. I have no idea what is going on. I checked the log file and it looks fine; it reads in all the individual samples with no errors. Here is the command I used:

gatk --java-options "-Xmx30G" CombineGVCFs -R reference --variant file1.g.vcf --variant file2.g.vcf ... -O cohort.g.vcf

Any tips on how to fix this? I can't use the final VCF when it has no per-individual information. Thanks in advance.
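
One thing that may be worth ruling out here is that the input gVCFs all carry the same sample name in their headers. This is an assumption on my part; nothing in the thread confirms it as the cause. A minimal sketch of such a check follows, with `sample_names` and `duplicate_samples` as hypothetical helper names; it accepts both plain and gzip-compressed files.

```shell
# Print every sample name found in the #CHROM header of each input file.
sample_names() (
    for f in "$@"; do
        # Decompress only when the file name ends in .gz.
        case "$f" in
            *.gz) gzip -cd -- "$f" ;;
            *)    cat -- "$f" ;;
        esac |
            awk -F '\t' '/^#CHROM/ { for (i = 10; i <= NF; i++) print $i; exit }'
    done
)

# Print each sample name that occurs more than once across the inputs.
# Usage: duplicate_samples file1.g.vcf file2.g.vcf ...
duplicate_samples() {
    sample_names "$@" | sort | uniq -d
}
```

If this prints any names, the inputs are not distinguishable by sample name, and the read-group sample tags used upstream would be the place to look.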


Only add an answer if you're answering the top level question. If you have a follow up or "I have this problem too" statement, use Comments instead. I've moved your post to a comment this time.


There was an indexing issue with the files that were missing from the combined output. I cross-checked them, then compressed and indexed those files with:

bgzip -c missing_vcf.vcf > missing_vcf.vcf.gz
tabix -p vcf missing_vcf.vcf.gz

Then I re-ran the command with all the files, and they combined successfully.
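
To catch this before running the combine step, a quick pre-flight check along these lines might help. It is only a sketch, assuming the inputs are bgzip-compressed and that each tabix index sits next to its file as `<file>.tbi`; `unindexed` is a hypothetical helper name.

```shell
# Print every input file that has no non-empty .tbi index beside it.
# Usage: unindexed 1.vcf.gz 2.vcf.gz ...
unindexed() (
    for f in "$@"; do
        [ -s "$f.tbi" ] || printf '%s\n' "$f"
    done
)

# Hypothetical follow-up, indexing whatever the check reports
# (requires tabix from htslib):
# unindexed *.vcf.gz | while IFS= read -r f; do tabix -p vcf "$f"; done
```

An empty result means every input has an index file, although it does not verify that the index is up to date with its VCF.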

