Hi all, I have used GATK UnifiedGenotyper to generate a raw.vcf file. Now I want to use GATK VQSR to get more accurate results, and I am following this protocol:
- snp.model <- BuildErrorModelWithVQSR(raw.vcf, SNP);
- indel.model <- BuildErrorModelWithVQSR(raw.vcf, INDEL);
- recalibratedSNPs.rawIndels.vcf <- ApplyRecalibration(raw.vcf, snp.model, SNP);
- analysisReady.vcf <- ApplyRecalibration(recalibratedSNPs.rawIndels.vcf, indel.model, INDEL);
I would like to know whether it would be better to separate the SNPs and INDELs before performing VQSR, like this:
- SNP.raw.vcf, INDEL.raw.vcf <- Separate_SNP_INDEL(raw.vcf);
- snp.model <- BuildErrorModelWithVQSR(SNP.raw.vcf, SNP);
- indel.model <- BuildErrorModelWithVQSR(INDEL.raw.vcf, INDEL);
- SNP_analysisReady.vcf <- ApplyRecalibration(SNP.raw.vcf, snp.model, SNP);
- INDEL_analysisReady.vcf <- ApplyRecalibration(INDEL.raw.vcf, indel.model, INDEL);
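
To make the splitting step in the second protocol concrete, here is a rough sketch in plain Python. It is only an illustration under stated assumptions: the names `is_snp` and `split_vcf` are made up for this example, a record counts as a SNP only when REF and every ALT allele is a single base, and everything else (including mixed sites) is sent to the INDEL set. In practice a dedicated tool such as GATK's SelectVariants would probably be the better choice.

```python
BASES = {"A", "C", "G", "T"}

def is_snp(ref, alts):
    """Assumption: a SNP record has a single-base REF and only single-base ALTs."""
    return ref.upper() in BASES and all(a.upper() in BASES for a in alts)

def split_vcf(lines):
    """Split VCF text lines into (snp_lines, indel_lines).

    Header lines (starting with '#') are copied to both outputs so that
    each result is still a complete VCF. Any record that is not a pure
    SNP is placed in the INDEL output, which is a simplification.
    """
    snp_lines, indel_lines = [], []
    for line in lines:
        if line.startswith("#"):
            snp_lines.append(line)
            indel_lines.append(line)
            continue
        fields = line.rstrip("\n").split("\t")
        ref = fields[3]              # column 4 of a VCF record is REF
        alts = fields[4].split(",")  # column 5 is ALT, comma-separated
        if is_snp(ref, alts):
            snp_lines.append(line)
        else:
            indel_lines.append(line)
    return snp_lines, indel_lines

if __name__ == "__main__":
    example = [
        "##fileformat=VCFv4.1\n",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
        "1\t100\t.\tA\tG\t50\t.\t.\n",    # SNP
        "1\t200\t.\tAT\tA\t50\t.\t.\n",   # deletion
    ]
    snps, indels = split_vcf(example)
    print(len(snps), len(indels))
```

Sending mixed sites to the INDEL set is an arbitrary choice made to keep the sketch short; a real workflow would need to decide how to handle them.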
Thanks a lot!
Removed my answer, since Brad's is much more obviously correct. It's interesting that separating SNPs and INDELs is not even mentioned in the documentation I've been following: http://gatkforums.broadinstitute.org/discussion/39/variant-quality-score-recalibration
Looks like they ought to be separated at this stage after all.
Alex, definitely agreed. The VQSR documentation page is SNP-specific, but that's not explicitly stated.