Help wanted for a struggling bioinformatician! GATK VariantRecalibrator
19 months ago
chrisgr ▴ 20

Hello, an answer to any of the following questions would be much appreciated!

I'm playing around with GATK's VariantRecalibrator tool. I have a few questions for which I can't find information that I can understand.

1) My tranches plot shows no false positives (see the provided image). Did something go wrong, or did I overtrain the model? Or can I assume that the FASTQ files I received in the beginning were of high quality?

2) Should I rearrange my settings so that false positives show up, or just keep the recommended known, training and truth resources? I'm currently using the following (chosen based on another post I saw, and because I read somewhere that a high-confidence dataset should be set to truth=true):

    --resource:hapmap,known=false,training=true,truth=false,prior=10 resources_broad_hg38_v0_hapmap_3.3.hg38.vcf.gz \
    --resource:omni,known=true,training=false,truth=false,prior=12 resources_broad_hg38_v0_1000G_omni2.5.hg38.vcf.gz \
    --resource:1000G,known=false,training=true,truth=true,prior=12 resources_broad_hg38_v0_1000G_phase1.snps.high_confidence.hg38.vcf.gz \
    --resource:dbsnp,known=true,training=false,truth=false,prior=2 resources_broad_hg38_v0_Homo_sapiens_assembly38.dbsnp138.vcf \

3) What exactly is the number of novel variants in the tranches plot? I thought it was my total number of variants, but my raw VCF consists of 500k variants. ChatGPT says it's everything not present in the known/training/truth datasets, but is this number picked randomly? I don't understand what's happening there.

4) Why is no tranches plot made for indels?
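
For reference, here is a minimal Python sketch that tabulates which of the resources listed in question 2 are flagged as known, training and truth, so the roles can be checked at a glance. It is not part of GATK. The helper names `parse_resources` and `names_with_role` are hypothetical, and it assumes each line holds one `--resource:` flag followed by its file name.

```python
# Resource arguments as listed in question 2: one "--resource:..." flag plus its file per line.
RESOURCE_ARGS = """\
--resource:hapmap,known=false,training=true,truth=false,prior=10 resources_broad_hg38_v0_hapmap_3.3.hg38.vcf.gz
--resource:omni,known=true,training=false,truth=false,prior=12 resources_broad_hg38_v0_1000G_omni2.5.hg38.vcf.gz
--resource:1000G,known=false,training=true,truth=true,prior=12 resources_broad_hg38_v0_1000G_phase1.snps.high_confidence.hg38.vcf.gz
--resource:dbsnp,known=true,training=false,truth=false,prior=2 resources_broad_hg38_v0_Homo_sapiens_assembly38.dbsnp138.vcf
"""


def parse_resources(text):
    """Hypothetical helper: turn each resource line into a dict of its settings."""
    rows = []
    for line in text.strip().splitlines():
        flag, path = line.split()
        # Drop the "--resource:" prefix, then split "name,key=value,..." on commas.
        name, *pairs = flag[len("--resource:"):].split(",")
        opts = dict(pair.split("=") for pair in pairs)
        rows.append({
            "name": name,
            "known": opts["known"] == "true",
            "training": opts["training"] == "true",
            "truth": opts["truth"] == "true",
            "prior": float(opts["prior"]),
            "path": path,
        })
    return rows


def names_with_role(rows, role):
    """Hypothetical helper: names of the resources that have the given flag set to true."""
    return [row["name"] for row in rows if row[role]]


rows = parse_resources(RESOURCE_ARGS)
for role in ("known", "training", "truth"):
    print(role, "->", ", ".join(names_with_role(rows, role)))
```

With the settings above, only the 1000G set is marked truth=true, and omni and dbsnp are the only sets marked known=true.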

[Image: tranches plot, labeled "FP"]

VSQR applyVSQR gatk variantrecalibration • 370 views