I am working on GWAS data. I ran a BWA + GATK pipeline and generated a VCF file for each sample, then merged the per-sample VCF files with Picard:
```bash
java -jar /680_info4/project/arora/program/picard.jar MergeVcfs \
    I=/680_info4/project/arora1/raw_variants.vcf \
    I=/680_info4/project/arora2/raw_variants.vcf \
    I=/680_info4/project/arora3/raw_variants.vcf \
    I=/680_info4/project/arora4/raw_variants.vcf \
    I=/680_info4/project/arora5/raw_variants.vcf \
    I=/680_info4/project/arora6/raw_variants.vcf \
    I=/680_info4/project/arora7/raw_variants.vcf \
    O=all_raw_variant.vcf
```
When I ran a PCA script in PLINK, it failed with an error saying there was only one sample. Checking the merged VCF confirmed this: the header has only one sample column (`20`):
```text
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 20
1 112 . C T 61.60 PASS AC=1;AF=0.500;AN=2;BaseQRankSum=2.243;DP=117;ExcessHet=3.0103;FS=4.760;MLEAC=1;MLEAF=0.500;MQ=47.51;MQRankSum=-0.497;QD=0.53;ReadPosRankSum=-0.030;SOR=1.493 GT:AD:DP:GQ:PL 0/1:106,10:116:69:69,0,3944
1 131 . G T 1457.60 PASS AC=1;AF=0.500;AN=2;BaseQRankSum=-0.046;DP=89;ExcessHet=3.0103;FS=2.787;MLEAC=1;MLEAF=0.500;MQ=49.38;MQRankSum=-0.647;QD=16.38;ReadPosRankSum=-0.950;SOR=0.799 GT:AD:DP:GQ:PL 0/1:47,42:89:99:1465,0,1643
```
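To check how many samples each file actually contains, count the columns after `FORMAT` on the `#CHROM` header line; in the excerpt above there is exactly one, named `20`. The sketch below uses only `gzip` and `awk` (the function name `count_vcf_samples` is hypothetical). If bcftools is installed, `bcftools query -l file.vcf` lists the sample names directly.

```bash
# Print the number of sample columns in a VCF (plain or gzip/bgzip-compressed).
# Columns 1-9 are fixed (CHROM..FORMAT); every column after FORMAT is one sample.
count_vcf_samples() {
  local vcf="$1"
  {
    case "$vcf" in
      *.gz) gzip -dc "$vcf" ;;
      *)    cat "$vcf" ;;
    esac
  } | awk -F'\t' '/^#CHROM/ { print (NF > 9 ? NF - 9 : 0); exit }'
}

# Example (paths from the question): compare the inputs with the merged file.
# for f in /680_info4/project/arora{1..7}/raw_variants.vcf all_raw_variant.vcf; do
#   echo "$f: $(count_vcf_samples "$f")"
# done
```

If every input reports 1 and the merged file also reports 1, the merge step did not add sample columns.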
What am I doing wrong? Is there another way to merge the VCF files? Any suggestions are welcome.
I also tried `bcftools merge`, but in its output many genotype fields are missing (`./.:.:.:.:.`), and I don't understand why.
The missing data means that the variant was not called in that particular sample's VCF. In your example, 4 of your 7 VCFs have no variant called at that position, hence the missing genotype data.
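As I understand the Picard documentation, MergeVcfs is meant for combining VCFs that share the same sample set (for example, per-region pieces of one callset), so it does not add sample columns; `bcftools merge` does, which is why the missing genotypes only show up there. A minimal sketch of that route, assuming `bgzip`, `tabix` and `bcftools` are on the PATH; the function name `merge_vcf_samples` and the output file name are hypothetical.

```bash
# Combine single-sample VCFs into one multi-sample VCF with bcftools merge.
# Usage: merge_vcf_samples OUT.vcf.gz IN1.vcf IN2.vcf ...
merge_vcf_samples() {
  local out="$1"; shift
  local gz=()
  local vcf
  for vcf in "$@"; do
    # bcftools merge expects bgzip-compressed, indexed input files
    bgzip -c "$vcf" > "${vcf}.gz"
    tabix -p vcf "${vcf}.gz"
    gz+=("${vcf}.gz")
  done
  # Each input's samples become new columns; a site absent from one input
  # gets ./. for that input's samples in the merged output
  bcftools merge -O z -o "$out" "${gz[@]}"
  tabix -p vcf "$out"
}

# Example (paths from the question; output name is hypothetical):
# merge_vcf_samples all_samples.vcf.gz /680_info4/project/arora{1..7}/raw_variants.vcf
```

`bcftools merge` also has `-0`/`--missing-to-ref`, which writes 0/0 instead of ./. at missing sites. That treats "not called" as "homozygous reference", which is wrong wherever a sample simply had no coverage, so use it with care.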
If you expect reference (0/0) calls to be included as well, note that these are not written to a VCF by default; look into calling gVCFs, or into GATK options such as `--emit-all-sites`.
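A rough sketch of the gVCF route with GATK4 follows: call each sample with `-ERC GVCF`, combine the per-sample gVCFs, then genotype all samples jointly, so that samples with coverage at a site get a real 0/0 instead of ./. . The function names, `ref.fasta` and the BAM names are hypothetical placeholders; adjust them to your project layout.

```bash
# Per-sample calling in gVCF mode: reference-confidence blocks are recorded
# alongside variant sites, so later joint genotyping can emit 0/0 calls.
call_gvcf() {
  local ref="$1" bam="$2" out="$3"
  gatk HaplotypeCaller -R "$ref" -I "$bam" -O "$out" -ERC GVCF
}

# Combine per-sample gVCFs into one cohort gVCF, then genotype jointly.
# Usage: joint_genotype REF OUT.vcf.gz S1.g.vcf.gz S2.g.vcf.gz ...
joint_genotype() {
  local ref="$1" out="$2"; shift 2
  local combined="${out%.vcf.gz}.g.vcf.gz"
  local args=()
  local g
  for g in "$@"; do
    args+=(--variant "$g")
  done
  gatk CombineGVCFs -R "$ref" "${args[@]}" -O "$combined"
  gatk GenotypeGVCFs -R "$ref" -V "$combined" -O "$out"
}

# Example (hypothetical file names):
# for n in 1 2 3 4 5 6 7; do
#   call_gvcf ref.fasta /680_info4/project/arora$n/sample.bam /680_info4/project/arora$n/sample.g.vcf.gz
# done
# joint_genotype ref.fasta all_samples.vcf.gz /680_info4/project/arora{1..7}/sample.g.vcf.gz
```

Samples with no reads at a site will still be ./. after joint genotyping. For larger cohorts, GATK recommends `GenomicsDBImport` instead of `CombineGVCFs`.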