vcf tools Error: Require Genotypes in VCF file in order to output Fst statistics
0
0
Entering edit mode
5.1 years ago
Razi • 0

Hi,

I want to calculate Fst by vcf tools and GATK. I did this steps:

  1. for creating gvcf: gatk HaplotypeCaller -R ref.scf.fasta -I input.bam -ERC GVCF -O out.g.vcf

  2. for combine: gatk CombineGVCFs -R ref.scf.fasta --variant 1.g.vcf --variant 2.g.vcf --variant 3.g.vcf --variant 4.g.vcf --variant 5.g.vcf --variant 6.g.vcf --variant 7.g.vcf --variant 8.g.vcf -O outt.vcf

  3. for genotype: gatk GenotypeGVCFs -R ref.scf.fasta --variant outt -O outt.vcflist

  4. for Fst: vcftools --vcf outt.vcflist --weir-fst-pop 1.txt --weir-fst-pop 2.txt --out 1_vs_2

There was no error till step 3 and successfully done. But, I got error in step 4: Error: Require Genotypes in VCF file in order to output Fst statistics.

I used the last version (GATK/4.1.3.0-Java-1.8) and the last version of vcf tools (VCFtools/0.1.16), but I still have the same error!

Would you please advise me?

Best, Razi

software error genome next-gen • 4.6k views
ADD COMMENT
0
Entering edit mode

what is the output of:

$ file outt.vcflist
$ grep -m 1 "#CHROM" outt.vcflist

?

does vcftools use the file extension ? try to change the name

mv outt.vcflist outt.vcf`
ADD REPLY
0
Entering edit mode

I run it again this time with this output name: out2.vcf and I run step 4 again and i still have the same error!

$file out2.vcf

out2.vcf: ASCII English text, with very long lines


$grep -m 1 "#CHROM" out2.vcf

CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  ind1
ADD REPLY
0
Entering edit mode

only one individual in the output of CombineGVCFs ??

ADD REPLY
0
Entering edit mode
gatk CombineGVCFs -R ref.scf.fasta --variant 1.g.vcf --variant 2.g.vcf --variant 3.g.vcf --variant 4.g.vcf --variant 5.g.vcf --variant 6.g.vcf --variant 7.g.vcf --variant 8.g.vcf -O outt.vcf

no, there is 8 ind.

ADD REPLY
0
Entering edit mode

do they all have a different sample name ?

ADD REPLY
0
Entering edit mode

this is exactly my script:

gatk CombineGVCFs -R sylvia.scf.fasta --variant P11456_101.g.vcf --variant P11456_102.g.vcf --variant P11456_105.g.vcf --variant P11456_106.g.vcf --variant P10261_104.g.vcf --variant P11456_103.g.vcf --variant P11456_104.g.vcf --variant P4253_106.g.vcf -O cur-alt.vcf
ADD REPLY
0
Entering edit mode

that is not my question: do they all have a different name in each '#CHROM' line of each g.vcf.file ?

ADD REPLY
0
Entering edit mode

sorry, maybe I don't understand your mean. I have some information in this page (Razi): https://gatkforums.broadinstitute.org/gatk/discussion/10005/require-genotypes-in-vcf-file-in-order-to-output-fst-statistics?

I copy "CHROM" line of some gvcf files:

**file1:**

##contig=<ID=scaffold_6451,length=882>
##source=HaplotypeCaller
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  ind1
scaffold_0      1       .       A       <NON_REF>       .       .       END=12473       GT:DP:GQ:MIN_DP:PL      0/0:0:0:0:0,0,0
scaffold_0      12474   .       T       <NON_REF>       .       .       END=12482       GT:DP:GQ:MIN_DP:PL      0/0:1:3:1:0,3,16
scaffold_0      12483   .       C       <NON_REF>       .       .       END=12493       GT:DP:GQ:MIN_DP:PL      0/0:2:6:2:0,6,79
scaffold_0      12494   .       A       <NON_REF>       .       .       END=12495       GT:DP:GQ:MIN_DP:PL      0/0:3:9:3:0,9,124
scaffold_0      12496   .       T       <NON_REF>       .       .       END=12496       GT:DP:GQ:MIN_DP:PL      0/0:3:0:3:0,0,71


**file 2:**

##contig=<ID=scaffold_6451,length=882>
##source=HaplotypeCaller
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  ind1
scaffold_0      1       .       A       <NON_REF>       .       .       END=12469       GT:DP:GQ:MIN_DP:PL      0/0:0:0:0:0,0,0
scaffold_0      12470   .       A       <NON_REF>       .       .       END=12489       GT:DP:GQ:MIN_DP:PL      0/0:1:3:1:0,3,16
scaffold_0      12490   .       T       <NON_REF>       .       .       END=12502       GT:DP:GQ:MIN_DP:PL      0/0:2:6:2:0,6,62
scaffold_0      12503   .       C       G,<NON_REF>     38.61   .       BaseQRankSum=0.967;DP=3;ExcessHet=3.0103;MLEAC=1,0;MLEAF=0.500,0.00;MQRan$
scaffold_0      12504   .       A       <NON_REF>       .       .       END=12509       GT:DP:GQ:MIN_DP:PL      0/0:3:9:3:0,9,110
scaffold_0      12510   .       A       <NON_REF>       .       .       END=12520       GT:DP:GQ:MIN_DP:PL      0/0:4:12:4:0,12,126
scaffold_0      12521   .       C       <NON_REF>       .       .       END=12528       GT:DP:GQ:MIN_DP:PL      0/0:5:15:5:0,15,147


**file 3:**

##contig=<ID=scaffold_6451,length=882>
##source=HaplotypeCaller
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  ind1
scaffold_0      1       .       A       <NON_REF>       .       .       END=12429       GT:DP:GQ:MIN_DP:PL      0/0:0:0:0:0,0,0
scaffold_0      12430   .       C       <NON_REF>       .       .       END=12442       GT:DP:GQ:MIN_DP:PL      0/0:1:3:1:0,3,16
scaffold_0      12443   .       A       <NON_REF>       .       .       END=12443       GT:DP:GQ:MIN_DP:PL      0/0:2:6:2:0,6,74
scaffold_0      12444   .       T       <NON_REF>       .       .       END=12448       GT:DP:GQ:MIN_DP:PL      0/0:3:9:3:0,9,75
scaffold_0      12449   .       A       <NON_REF>       .       .       END=12455       GT:DP:GQ:MIN_DP:PL      0/0:4:12:4:0,12,105
scaffold_0      12456   .       C       A,<NON_REF>     35.60   .       BaseQRankSum=-0.431;DP=4;ExcessHet=3.0103;MLEAC=1,0;MLEAF=0.500,0.00
ADD REPLY
0
Entering edit mode

this is where you're wrong: all your vcf have one and only one sample named ind1 . There is something wrong in the way your bams where generated. See : https://gatkforums.broadinstitute.org/gatk/discussion/6472/read-groups

ADD REPLY
0
Entering edit mode

Oh, yes. You're right. I put ind1 (RGSM=ind1) for all samples when I was running AddOrReplaceReadGroups!!!! I'll run it again. I hope it solves. Really thanks for your advice:)

ADD REPLY

Login before adding your answer.

Traffic: 2649 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6