Hi @all! I have a question about GATK VariantRecalibrator.
My data consists of 50 exome sequencing samples at 15X average coverage. Everything seemed fine at the beginning, but an error appeared at the end:
<h5>ERROR MESSAGE: NaN LOD value assigned. Clustering with this few variants and these annotations is unsafe. Please consider raising the number of variants used to train the negative model (via --percentBadVariants 0.05, for example) or lowering the maximum number of Gaussians to use in the model (via --maxGaussians 4, for example)</h5>
The command I used:
java -Xmx1555m -jar /home/chris/install/GenomeAnalysisTK-1.6-9-g47df7bb/GenomeAnalysisTK.jar -R /home/chris/data/hg/ucsc.hg19.fasta -T VariantRecalibrator -input /home/chris/data/train/SRR_50bam.raw.l.new.vcf -resource:hapmap,known=false,training=true,truth=true,prior=15.0 /home/chris/data/train/hapmap_3.3.hg19.sites.vcf -resource:omni,known=false,training=true,true=false,prior=12.0 /home/chris/data/train/1000G_omni2.5.hg19.sites.vcf -resource:dbsnp,known=true,training=false,truth=false,prior=8.0 /home/chris/data/hg/dbsnp_135.hg19.vcf -an QD -an HaplotypeScore -an MQRankSum -an ReadPosRankSum -an InbreedingCoeff -an FS -an DP -an MQ -an InbreedingCoeff -recalFile /home/chris/SRR_50bam.recal -tranchesFile /home/chris/SRR_50bam.tranches -rscriptFile /home/chris/plots.R -nt 2 -mG 4 -percentBad 0.05 -L /home/chris/data/train/exome.bed
Note that I have already used the -mG 4 and -percentBad 0.05 parameters. An INFO line said: INFO 11:49:45,070 VariantDataManager - Additionally training with worst 5.000% of passing data --> 3942 variants with LOD <= 0.0000.
I read somewhere that this seems to happen when the number of variants used to train the negative model (the worst X percent) is too low. But when I changed -percentBad to 0.15, the error still appeared. The INFO line: INFO 13:10:13,059 VariantDataManager - Additionally training with worst 15.000% of passing data --> 11825 variants with LOD <= 0.0000. The LOD is still 0. I don't know why the process can't complete. And what is LOD? My raw VCF contains almost 80000 called SNPs. Are 3942 or 11825 variants really not enough?
Here are two example records from my raw VCF:
chr1 881627 rs2272757 G A 1249.50 . AC=52;AF=0.650;AN=80;BaseQRankSum=4.484;DB;DP=280;Dels=0.00;FS=0.000;HRun=1;HaplotypeScore=0.4991;InbreedingCoeff=0.1468;MQ=34.16;MQ0=5;MQRankSum=-1.772;QD=5.98;ReadPosRankSum=0.671;SB=-629.98 GT:AD:DP:GQ:PL 0/1:3,5:8:64.89:68,0,65 ./. ........... 0/0:2,0:2:3:0,3,25
chr1 881784 . C T 124.05 . AC=2;AF=0.021;AN=96;BaseQRankSum=0.481;DP=430;Dels=0.00;FS=17.640;HRun=1;HaplotypeScore=0.8485;InbreedingCoeff=-0.0583;MQ=39.06;MQ0=3;MQRankSum=-0.937;QD=4.00;ReadPosRankSum=-0.505;SB=-2.11 GT:AD:DP:GQ:PL 0/0:8,0:8:21.03:0,21,201 ./. ..............0/0:4,0:4:6.01:0,6,61
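In case it helps with diagnosing this, here is a minimal Python sketch of how one could check whether every annotation passed with -an is actually present in a record's INFO column. The helper names parse_info and missing_annotations are made up for illustration and are not part of GATK, and I have not established that missing annotations are the cause of the error; this only shows the check itself on the first record above.

```python
# Annotations requested with -an in the command (duplicate InbreedingCoeff removed).
REQUESTED = ["QD", "HaplotypeScore", "MQRankSum", "ReadPosRankSum",
             "InbreedingCoeff", "FS", "DP", "MQ"]


def parse_info(info_field):
    """Turn a VCF INFO string into a dict; flag entries such as DB map to True."""
    info = {}
    for entry in info_field.split(";"):
        if not entry:
            continue
        key, sep, value = entry.partition("=")
        info[key] = value if sep else True
    return info


def missing_annotations(info_field, requested=REQUESTED):
    """Return the requested annotation names that are absent from the INFO string."""
    info = parse_info(info_field)
    return [name for name in requested if name not in info]


# INFO column of the first record (chr1 881627) from the raw VCF.
example_info = ("AC=52;AF=0.650;AN=80;BaseQRankSum=4.484;DB;DP=280;Dels=0.00;"
                "FS=0.000;HRun=1;HaplotypeScore=0.4991;InbreedingCoeff=0.1468;"
                "MQ=34.16;MQ0=5;MQRankSum=-1.772;QD=5.98;ReadPosRankSum=0.671;"
                "SB=-629.98")

print(missing_annotations(example_info))  # prints [] for this record
```

Running the same check over every record in the file (splitting each non-header line on tabs and taking the eighth column) would show how many variants lack one or more of the requested annotations.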
I am so sad! I look forward to your reply!