4.7 years ago
tothepoint
I am trying to combine the VCF files that HaplotypeCaller generated for samples from different breeds. The total number of samples is ~100. I ran the command
gatk CombineGVCFs -R genome.fna --variant 1.vcf.gz --variant 2.vcf.gz.....variant97.vcf.gz -O combine.g.vcf.gz
But when I check the output file, it contains only 41 samples instead of 97. If anyone knows what I am doing wrong, or has run into this before, please share how to fix it.
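With ~97 inputs, typing every `--variant` flag by hand is easy to get wrong, and a skipped or mistyped file would simply be absent from the output. Below is a minimal Python sketch that builds the same CombineGVCFs argument list from a list of paths. It only emits the `-R`, `--variant` and `-O` options already used in the command above; the function names are hypothetical, and the sketch assumes `gatk` is on the `PATH`.

```python
import shlex
import subprocess


def build_combine_command(reference, gvcfs, output):
    """Build the CombineGVCFs argument list with one --variant per input gVCF."""
    cmd = ["gatk", "CombineGVCFs", "-R", str(reference)]
    for path in gvcfs:
        cmd += ["--variant", str(path)]
    cmd += ["-O", str(output)]
    return cmd


def run_combine(reference, gvcfs, output):
    """Print the exact command for the record, then run it (raises if gatk exits non-zero)."""
    cmd = build_combine_command(reference, gvcfs, output)
    print(shlex.join(cmd))
    subprocess.run(cmd, check=True)
```

For example, with the hypothetical naming scheme `inputs = [f"{i}.vcf.gz" for i in range(1, 98)]`, calling `run_combine("genome.fna", inputs, "combine.g.vcf.gz")` passes all 97 files, and `len(inputs)` gives an explicit count to compare against the number of samples in the output.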
I would check which samples are missing; does that give you a clue? Perhaps the index files are missing for the VCFs that were dropped. Check which samples have been included:
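If no VCF tooling is at hand, here is a rough Python sketch of that check. It reads the sample names from the `#CHROM` header line of each input and of the combined file, then lists the input files whose samples did not make it into the output, so those are the files whose indexes to inspect first. It assumes each file has a standard VCF header (nine fixed columns CHROM..FORMAT followed by sample columns). The function names are illustrative, not part of any tool.

```python
import gzip


def read_samples(vcf_path):
    """Return the sample names from the #CHROM header line of a VCF or bgzipped VCF."""
    # BGZF (.vcf.gz) is a series of gzip members, which the gzip module reads transparently.
    opener = gzip.open if str(vcf_path).endswith(".gz") else open
    with opener(vcf_path, "rt") as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                # Columns 1-9 are CHROM..FORMAT; every column after that is a sample.
                return line.rstrip("\n").split("\t")[9:]
            if not line.startswith("#"):
                break  # reached data lines without seeing a #CHROM header
    raise ValueError(f"no #CHROM header line in {vcf_path}")


def inputs_not_combined(input_paths, combined_path):
    """Return the input files with at least one sample absent from the combined file."""
    combined = set(read_samples(combined_path))
    return [p for p in input_paths if not set(read_samples(p)) <= combined]
```

For example, `inputs_not_combined(["1.vcf.gz", "2.vcf.gz"], "combine.g.vcf.gz")` returns the subset of inputs to re-check. With 97 inputs and 41 samples in the output, it should list 56 files if each input holds one sample.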
Which version of GATK are you using?
I am using GATK 4.1.4.0.
Have you checked the log file to make sure the command completed? Also, I'd recommend GenomicsDBImport instead of CombineGVCFs.
I checked the log file and there was no such issue. I already gave CombineGVCFs one more try, but I'm curious to test GenomicsDBImport as well. Thanks!
Were you able to figure out what went wrong? I have a similar issue with CombineGVCFs: all my samples are merged into ONE. I have 243 individuals, but the final VCF shows all the variants as if they came from a single individual. I have no idea what is going on. I checked the log file and it looks fine; all the individual samples are read in with no errors. Here is the command I used:
Any tips on how to fix this? I can't use the final VCF without per-individual information. Thanks in advance.
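One possible cause, not confirmed anywhere in this thread, is that the 243 gVCFs all carry the same sample name in their `#CHROM` header line (for instance, if the reads were given identical read-group sample tags). If that is the case, the combined file may end up with a single sample column. A quick Python sketch to test that assumption: it maps every sample name that appears in more than one input to the files that contain it. The function names are illustrative, and the header layout is assumed to be standard VCF.

```python
import gzip
from collections import defaultdict


def read_samples(vcf_path):
    """Return the sample names from the #CHROM header line of a VCF or bgzipped VCF."""
    opener = gzip.open if str(vcf_path).endswith(".gz") else open
    with opener(vcf_path, "rt") as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                # Columns 1-9 are CHROM..FORMAT; the remaining columns are samples.
                return line.rstrip("\n").split("\t")[9:]
            if not line.startswith("#"):
                break
    raise ValueError(f"no #CHROM header line in {vcf_path}")


def shared_sample_names(input_paths):
    """Map each sample name found in more than one input file to the list of those files."""
    files_by_name = defaultdict(list)
    for path in input_paths:
        for name in read_samples(path):
            files_by_name[name].append(str(path))
    return {name: files for name, files in files_by_name.items() if len(files) > 1}
```

An empty result means every input has a distinct sample name, which would rule this explanation out. A single key listing all 243 files would point strongly at identical sample names.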
Only add an answer if you're answering the top level question. If you have a follow up or "I have this problem too" statement, use Comments instead. I've moved your post to a comment this time.
It was an indexing issue: the files missing from the combined output had not been indexed properly. I cross-checked and indexed those files with
and re-ran CombineGVCFs; all files were then combined successfully.
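The exact indexing command was not preserved above. Tools that create these indexes include htslib's `tabix -p vcf` and GATK's `IndexFeatureFile`. To find which inputs need indexing before re-running, here is a small Python sketch that lists files with no index beside them. It assumes the conventional naming, `<file>.vcf.gz.tbi` for bgzipped VCFs and `<file>.vcf.idx` for plain VCFs. It only checks that an index exists, not that the index is valid or up to date.

```python
from pathlib import Path


def missing_indexes(paths):
    """Return the VCF/gVCF paths that have no index file next to them.

    Assumed naming: <file>.vcf.gz -> <file>.vcf.gz.tbi, <file>.vcf -> <file>.vcf.idx.
    """
    missing = []
    for path in map(Path, paths):
        suffix = ".tbi" if path.name.endswith(".gz") else ".idx"
        if not path.with_name(path.name + suffix).exists():
            missing.append(str(path))
    return missing
```

For example, `missing_indexes(["1.vcf.gz", "2.vcf.gz"])` returns only the inputs that still need an index, which can then be indexed and passed to CombineGVCFs again.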