I’m converting gVCFs to VCF, but the reference alleles are missing. An example below:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 180525_FD02929177
1 97547947 . T . . . DP=31 GT:DP:RGQ 0/0:31:81
1 97915614 . C . . . DP=40 GT:DP:RGQ 0/0:40:99
1 97981343 . A . . . DP=43 GT:DP:RGQ 0/0:43:99
2 234668570 . C T 539.64 . AC=1;AF=0.500;AN=2;ClippingRankSum=0.340;DP=32;ExcessHet=3.0103;FS=5.748;MLEAC=1;MLEAF=0.500;MQ=60.00;QD=16.86;RAW_MQ=115200.00;SOR=0.150 GT:AD:DP:GQ:PL 0/1:17,15:32:99:547,0,586
2 234669144 . G . . . DP=36 GT:DP:RGQ 0/0:36:99
which was made by break_blocks:
break_blocks --region-file /illumina/runs/con/concordance/fluidigm/fluidigm_positions.tab.bed --ref human_g1k_v37.fasta --exclude-off-target
I’m using GATK thus:
gatk --java-options "-Xmx4g" GenotypeGVCFs \
-R /illumina/runs/con/g1k_v37/human_g1k_v37.fasta \
-V fluidigm.gvcf.202009/HG00099.fluidigm.202009.g.vcf \
-O fluidigm.vcf.202009/HG00099.fluidigm.202009.vcf \
--allow-old-rms-mapping-quality-annotation-data \
--include-non-variant-sites
But none of the options in GATK seem to allow adding reference alleles to the REF column; everything is just ".". When I try this manually with a Perl script, there are missing data, so programming it myself can’t work.
Do you know how I can add the reference alleles to VCF/gVCF?
Please do not delete a question after it has been addressed in some way. Eyeballing columns wrong is a common problem and someone else could benefit from your experience.
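One way to avoid misreading the columns is to pair each header name with the matching field of a record. Here is a minimal Python sketch, assuming whitespace-separated fields as pasted in the question (real VCF files are tab-delimited, which `split()` also handles). The function name `label_vcf_record` is only illustrative and not part of GATK or any library.

```python
def label_vcf_record(header_line, record_line):
    """Pair each VCF column name with the value from one record."""
    # Drop the leading '#' from '#CHROM' and split on any whitespace.
    names = header_line.lstrip("#").split()
    values = record_line.split()
    return dict(zip(names, values))


# Header and first record copied from the example in the question.
header = "#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 180525_FD02929177"
record = "1 97547947 . T . . . DP=31 GT:DP:RGQ 0/0:31:81"

labelled = label_vcf_record(header, record)
for name, value in labelled.items():
    print(f"{name:>8}: {value}")
```

For the first record in the question this reports REF as T and ALT as ".", so it is the ALT column, not REF, that holds the dot at these sites.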
Please accept my answer below using the green check mark on the left.
Hey! I'm having the exact same problem! Did you solve it? I would really appreciate any help!