Question

Help wanted for a struggling bioinformatician! GATK VariantRecalibrator

22 months ago

chrisgr ▴ 20

Hello, an answer to any of the following questions would be much appreciated!

I'm playing around with GATK's VariantRecalibrator tool. I have a few questions that I can't find information on, or at least none that I understand.

1) My tranches plot shows no false positives (see the attached image). Did something go wrong, or did I overtrain the model? Or can I assume that the FASTQ files I received at the start were of high quality?

2) Should I rearrange my settings so that false positives show up, or just keep the recommended known, training and truth resources? I'm currently using the following (chosen based on another post I saw, and because I read somewhere that a high-confidence dataset should be set to truth=true):

    --resource:hapmap,known=false,training=true,truth=false,prior=10 resources_broad_hg38_v0_hapmap_3.3.hg38.vcf.gz \
    --resource:omni,known=true,training=false,truth=false,prior=12 resources_broad_hg38_v0_1000G_omni2.5.hg38.vcf.gz \
    --resource:1000G,known=false,training=true,truth=true,prior=12 resources_broad_hg38_v0_1000G_phase1.snps.high_confidence.hg38.vcf.gz \
    --resource:dbsnp,known=true,training=false,truth=false,prior=2 resources_broad_hg38_v0_Homo_sapiens_assembly38.dbsnp138.vcf \
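For context, here is a minimal sketch of how resource flags like the ones above sit inside a complete VariantRecalibrator call in SNP mode. The reference, input and output file names and the `-an` annotation list are placeholders and are not taken from the post; only the four `--resource` entries are copied from it. The script prints the assembled command for review rather than running it.

```shell
#!/bin/sh
# Sketch only. The next five values are hypothetical placeholders,
# not the poster's actual files.
ref="reference.fasta"
input_vcf="raw_variants.vcf.gz"
recal_out="snps.recal"
tranches_out="snps.tranches"
rscript_out="snps.plots.R"

# Resource flags copied from the post. Per the GATK documentation,
# known=true resources are used for reporting only (labelling variants
# as known or novel); training=true and truth=true resources feed the model.
cmd="gatk VariantRecalibrator \
 -R $ref \
 -V $input_vcf \
 --resource:hapmap,known=false,training=true,truth=false,prior=10 resources_broad_hg38_v0_hapmap_3.3.hg38.vcf.gz \
 --resource:omni,known=true,training=false,truth=false,prior=12 resources_broad_hg38_v0_1000G_omni2.5.hg38.vcf.gz \
 --resource:1000G,known=false,training=true,truth=true,prior=12 resources_broad_hg38_v0_1000G_phase1.snps.high_confidence.hg38.vcf.gz \
 --resource:dbsnp,known=true,training=false,truth=false,prior=2 resources_broad_hg38_v0_Homo_sapiens_assembly38.dbsnp138.vcf \
 -an QD -an MQ -an MQRankSum -an ReadPosRankSum -an FS -an SOR \
 -mode SNP \
 -O $recal_out \
 --tranches-file $tranches_out \
 --rscript-file $rscript_out"

# Print the command instead of executing it, so it can be checked first.
printf '%s\n' "$cmd"
```

The annotation list shown is an assumption for illustration; which annotations are appropriate depends on the data and on what is present in the input VCF.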

3) What exactly is the number of novel variants in the tranches plot? I thought it was my total number of variants, but my raw VCF contains 500k variants. ChatGPT says it's everything not present in the known/training/truth datasets, but is this number picked randomly? I don't understand what's happening there.

4) Why is there no tranches plot made for INDELS?

[Image: tranches plot (FP)]

VSQR applyVSQR gatk variantrecalibration
