Hello, an answer to any of the following questions would be much appreciated!
I'm playing around with GATK's VariantRecalibrator tool. I have a few questions for which I can't find information that I understand.
1) My tranches plot shows no false positives (see provided image). Did something go wrong, or did I overtrain? Or can I assume that the FASTQ files I received at the start were of high quality?
2) Should I rearrange my settings so that I get FPs in there, or just keep the recommended known, training and truth resources? I'm currently using the following (which I chose based on another post I saw, and because I read somewhere that a high-confidence dataset should be set to truth=true):
--resource:hapmap,known=false,training=true,truth=false,prior=10 resources_broad_hg38_v0_hapmap_3.3.hg38.vcf.gz \
--resource:omni,known=true,training=false,truth=false,prior=12 resources_broad_hg38_v0_1000G_omni2.5.hg38.vcf.gz \
--resource:1000G,known=false,training=true,truth=true,prior=12 resources_broad_hg38_v0_1000G_phase1.snps.high_confidence.hg38.vcf.gz \
--resource:dbsnp,known=true,training=false,truth=false,prior=2 resources_broad_hg38_v0_Homo_sapiens_assembly38.dbsnp138.vcf \
3) What exactly is the number of novel variants in the tranches plot? I thought it was my total number of variants, but my raw VCF consists of 500k variants. ChatGPT says it's everything not present in the known/training/truth datasets, but is this number picked randomly? I don't understand what's happening there.
4) Why is no tranches plot made for INDELs?