Hi,
I want to calculate Fst with VCFtools and GATK. I did these steps:
1. Create the gVCF: gatk HaplotypeCaller -R ref.scf.fasta -I input.bam -ERC GVCF -O out.g.vcf
2. Combine: gatk CombineGVCFs -R ref.scf.fasta --variant 1.g.vcf --variant 2.g.vcf --variant 3.g.vcf --variant 4.g.vcf --variant 5.g.vcf --variant 6.g.vcf --variant 7.g.vcf --variant 8.g.vcf -O outt.vcf
3. Genotype: gatk GenotypeGVCFs -R ref.scf.fasta --variant outt -O outt.vcflist
4. Fst: vcftools --vcf outt.vcflist --weir-fst-pop 1.txt --weir-fst-pop 2.txt --out 1_vs_2
There was no error up to step 3, and those steps completed successfully. But I got an error in step 4: Error: Require Genotypes in VCF file in order to output Fst statistics.
I used the latest version of GATK (GATK/4.1.3.0-Java-1.8) and the latest version of VCFtools (VCFtools/0.1.16), but I still get the same error!
Could you please advise me?
Best, Razi
what is the output of:
?
Does vcftools use the file extension? Try changing the output name.
I ran it again, this time with the output name out2.vcf, then ran step 4 again, and I still get the same error!
Is there only one individual in the output of CombineGVCFs?
No, there are 8 individuals.
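The number of individuals can be read from the VCF header rather than assumed. A minimal sketch, assuming an uncompressed, tab-delimited VCF such as the outt.vcf written by CombineGVCFs above; count_samples is a hypothetical helper name, not part of GATK or VCFtools:

```shell
# Count the sample columns of a VCF.
# The '#CHROM' header line has 9 fixed columns (CHROM .. FORMAT);
# every column after that is one sample.
count_samples() {
    grep -m1 '^#CHROM' "$1" | awk -F'\t' '{ print (NF > 9 ? NF - 9 : 0) }'
}
```

For example, `count_samples outt.vcf` prints the number of samples that actually ended up in the combined file. A result of 0 means the file has no genotype columns at all.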
Do they all have a different sample name?
this is exactly my script:
That is not my question: do they all have a different sample name in the '#CHROM' line of each g.vcf file?
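One way to answer this question from the command line is sketched below. It assumes uncompressed, tab-delimited g.vcf files like the ones passed to CombineGVCFs; sample_names and duplicate_samples are hypothetical helper names introduced for illustration.

```shell
# Print the sample names of a VCF: columns 10 onward of its '#CHROM' line.
sample_names() {
    grep -m1 '^#CHROM' "$1" | cut -f10-
}

# Print every sample name that occurs more than once across the given files.
# No output means all sample names are distinct.
duplicate_samples() {
    for f in "$@"; do
        sample_names "$f" | tr '\t' '\n'
    done | sort | uniq -d
}
```

Running `duplicate_samples 1.g.vcf 2.g.vcf 3.g.vcf 4.g.vcf 5.g.vcf 6.g.vcf 7.g.vcf 8.g.vcf` lists any name shared between files. If every file carries the same sample name, that name is printed once.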
Sorry, maybe I don't understand what you mean. I posted some information on this page (Razi): https://gatkforums.broadinstitute.org/gatk/discussion/10005/require-genotypes-in-vcf-file-in-order-to-output-fst-statistics?
I copied the "#CHROM" line of some gVCF files:
This is where you're wrong: all your VCFs have one and only one sample, named ind1. There is something wrong in the way your BAMs were generated. See: https://gatkforums.broadinstitute.org/gatk/discussion/6472/read-groups
Oh, yes. You're right. I set ind1 (RGSM=ind1) for all samples when I ran AddOrReplaceReadGroups! I'll run it again. I hope that solves it. Thanks a lot for your advice :)
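As a rough sketch of that re-run, the helper below builds one AddOrReplaceReadGroups command per BAM so that each file gets its own RGSM. Everything specific here is an assumption, not taken from the thread: the helper name rg_command, the convention that 3.bam becomes sample ind3, the output name 3.rg.bam, and the placeholder values lib1, ILLUMINA and unit1. The argument style follows the GATK4 wrapper syntax for Picard tools; check it against the installed version before running.

```shell
# Build (but do not run) an AddOrReplaceReadGroups command for one BAM.
# Assumed convention: the sample name is derived from the file name,
# so 3.bam -> RGSM ind3. RGLB/RGPL/RGPU are placeholder values.
rg_command() {
    bam=$1
    base=$(basename "$bam" .bam)
    printf '%s %s\n' \
        "gatk AddOrReplaceReadGroups -I $bam -O $base.rg.bam" \
        "--RGID $base --RGLB lib1 --RGPL ILLUMINA --RGPU unit1 --RGSM ind$base"
}
```

A loop such as `for b in *.bam; do rg_command "$b"; done` prints the commands for review; they can be run once they look right. Because the sample name is written into the gVCF header by HaplotypeCaller, steps 1 to 3 would need to be repeated on the corrected BAMs.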