Hi, I have a VCF file from GATK and want to get the consensus sequence from it. I know the samtools/bcftools/vcfutils pipeline for getting a consensus sequence, but I am trying to use GATK and vcfutils instead. However, when I feed the GATK VCF file to vcfutils, the program reports
Use of uninitialized value in addition (+) at /my/bin/vcfutils.pl line 518, <> line 94433.
Use of uninitialized value in numeric lt (<) at my/bin/vcfutils.pl line 508, <> line 94434.
and I don't get a consensus sequence (the output contains only N).
The command I used to produce the VCF file with GATK:
java -Xmx4g -jar /my/GenomeAnalysisTK-1.5-30/GenomeAnalysisTK.jar -R /my/Drosophila3R.fa -T UnifiedGenotyper -I IN.3R.sam.GATK1.bam -o snps.raw.EMIT_ALL_CONFIDENT_SITES.vcf -out_mode EMIT_ALL_CONFIDENT_SITES
The command I used to produce the VCF file with samtools/bcftools:
samtools mpileup -uD -f Drosophila3R.fa sorted.3R.bam | bcftools view -cg - > test.vcf
snps.raw.EMIT_ALL_CONFIDENT_SITES.vcf
CHROM POS ID REF ALT QUAL FILTER INFO FORMAT whatever
3R 4 . T . 33.01 . AC=0;AF=0.00;AN=2;DP=1;MQ=37.00;MQ0=0 GT:DP 0/0:1
3R 5 . T . 39.01 . AC=0;AF=0.00;AN=2;DP=3;MQ=37.00;MQ0=0 GT:DP 0/0:3
3R 6 . C . 42.03 . AC=0;AF=0.00;AN=2;DP=4;MQ=37.00;MQ0=0 GT:DP 0/0:4
test.vcf
CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sorted.3R.bam
3R 4 . T . 33 . DP=1;AF1=0;AC1=0;DP4=0,1,0,0;MQ=37;FQ=-30 PL:DP 0:1
3R 5 . T . 39 . DP=3;AF1=0;AC1=0;DP4=0,3,0,0;MQ=37;FQ=-36 PL:DP 0:3
3R 6 . C . 42 . DP=4;AF1=0;AC1=0;DP4=1,3,0,0;MQ=37;FQ=-39 PL:DP 0:4
I don't know why vcfutils.pl can't produce a consensus sequence from the GATK VCF file. Can anybody help me fix this?
thanks a lot :)
By the way, is there a way to calculate the FQ INFO field from the data GATK provides?
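One likely cause, offered as an assumption rather than a confirmed diagnosis: the vcf2fq step of vcfutils.pl takes each site's quality from the FQ tag in the INFO column, and the GATK records above have no FQ (their INFO holds only AC, AF, AN, DP, MQ and MQ0). A missing FQ would leave that quality undefined, which fits the "uninitialized value" warnings in a numeric comparison and an addition. This Python sketch (the function name fq_from_info is hypothetical) pulls FQ out of an INFO string in the same spirit and returns None when the tag is absent:

```python
import re
from typing import Optional

# Matches FQ=<number> as a whole key inside a semicolon-separated INFO field,
# e.g. "DP=1;AF1=0;AC1=0;DP4=0,1,0,0;MQ=37;FQ=-30".
_FQ_RE = re.compile(r"(?:^|;)FQ=(-?[\d.]+)")


def fq_from_info(info: str) -> Optional[float]:
    """Return the FQ value from a VCF INFO string, or None if the tag is missing."""
    match = _FQ_RE.search(info)
    return float(match.group(1)) if match else None


if __name__ == "__main__":
    bcftools_info = "DP=1;AF1=0;AC1=0;DP4=0,1,0,0;MQ=37;FQ=-30"
    gatk_info = "AC=0;AF=0.00;AN=2;DP=1;MQ=37.00;MQ0=0"
    print(fq_from_info(bcftools_info))
    print(fq_from_info(gatk_info))  # no FQ tag in the GATK INFO column
```

Run on the INFO columns of the two files above, it finds FQ=-30 in test.vcf and nothing in the GATK file.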
Eureka!
I found that FQ == QUAL - 3 in magnitude (FQ itself is negative at the sites above, e.g. QUAL 33 gives FQ -30),
so I can calculate it from QUAL and use GT to distinguish heterozygotes from homozygotes.
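A minimal sketch of that idea in Python. It rests on two assumptions: that |FQ| = QUAL - 3, which is only observed on the three bcftools rows above, and that FQ is negative for homozygous calls and positive for heterozygous ones, the convention vcf2fq appears to rely on. The helper add_fq is hypothetical; it appends an FQ tag to one tab-separated, single-sample GATK data line and leaves the line unchanged when QUAL or GT is missing:

```python
def add_fq(vcf_line: str) -> str:
    """Append an FQ tag to the INFO column of a single-sample VCF data line.

    Assumptions (hypothetical, not taken from bcftools source code):
      * |FQ| = QUAL - 3, as observed on the bcftools rows in this thread;
      * FQ < 0 for homozygous genotypes, FQ > 0 for heterozygous ones.
    """
    if vcf_line.startswith("#"):
        return vcf_line  # header lines pass through untouched
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 10:
        return vcf_line  # not a single-sample data line with FORMAT/sample
    qual, info, fmt, sample = fields[5], fields[7], fields[8], fields[9]
    keys = fmt.split(":")
    values = sample.split(":")
    if qual == "." or "GT" not in keys or keys.index("GT") >= len(values):
        return vcf_line  # nothing to derive FQ from
    alleles = values[keys.index("GT")].replace("|", "/").split("/")
    if "." in alleles:
        return vcf_line  # missing genotype call
    magnitude = float(qual) - 3.0
    # Homozygous: all alleles identical -> negative FQ; heterozygous -> positive.
    fq = -magnitude if len(set(alleles)) == 1 else magnitude
    tag = f"FQ={round(fq, 2):g}"
    fields[7] = tag if info == "." else f"{info};{tag}"
    newline = "\n" if vcf_line.endswith("\n") else ""
    return "\t".join(fields) + newline
```

Each line of snps.raw.EMIT_ALL_CONFIDENT_SITES.vcf could be passed through add_fq and written to a new file before running vcf2fq. vcf2fq may also read other bcftools-only INFO tags (AF1, for example), so it is worth comparing the result with the consensus from the samtools/bcftools pipeline.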
Hi
Actually, I am also trying something similar. Can you tell me where you got the information "FQ==QUAL-3"? Is it in some samtools manual?
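The relation does not seem to come from documentation; it can be read off the three test.vcf rows shown earlier. A short Python check of those (QUAL, FQ) pairs:

```python
# (QUAL, FQ) pairs copied from the three test.vcf rows in this thread.
rows = [(33, -30), (39, -36), (42, -39)]

for qual, fq in rows:
    # First flag: does |FQ| equal QUAL - 3?  Second flag: is FQ exactly 3 - QUAL?
    print(qual, fq, abs(fq) == qual - 3, fq == 3 - qual)
```

At all three sites FQ = 3 - QUAL, so the magnitude of FQ is QUAL - 3 and the sign is negative. Three homozygous-reference sites are not enough to show that the relation holds in general, for example at heterozygous sites.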
Thanks