Tranche Plot after VSQR
1
0
Entering edit mode
5.3 years ago
BAGeno ▴ 190

Hi

I am doing exome sequencing of 15 patients. After variant calling I run VariantRecalibrator using this command

gatk VariantRecalibrator -R new_hg38.fa \
  -V allsample.vcf \
 --resource hapmap,known=false,training=true,truth=true,prior=15.0:hapmap_3.3.hg38.vcf.gz \
--resource omni,known=false,training=true,truth=false,prior=12.0:omni2.5.hg38.vcf.gz \
--resource 1000G,known=false,training=true,truth=false,prior=10.0:thouzndG_phase1.snps.high_confidence.hg38.vcf.gz  \
--resource dbsnp,known=true,training=false,truth=false,prior=2.0:All_20180418.chr.hg38.vcf.gz \
-an QD \
-an MQ \
-an MQRankSum \
-an ReadPosRankSum \
-an FS \
-an SOR \
-mode SNP \
-O output.snp.recal  \
--tranches-file output.snp.tranches \
--rscript-file output.snp..plots.R

Can you please tell me whether this tranche graph is accurate or not?

gatk4 VariantReclibration • 2.9k views
ADD COMMENT
0
Entering edit mode

Could you add GATK version you used. And also the previous steps you did. Did you used an interval file ? Tranche plot is not very good IMO.

ADD REPLY
0
Entering edit mode

I have used GATK 4.0.8. I run HaplotypeCaller, CombineGVCFs, and GenotypeGVCFs. I did not use interval file. So what should I do for this?

ADD REPLY
4
Entering edit mode
5.3 years ago

For exome sequencing datasets you should always use an interval file. As explained on GATK website (https://software.broadinstitute.org/gatk/documentation/article?id=11009-

For exomes and similarly targeted data types, the interval list should correspond to the capture targets used for the library prep, and is typically provided by the prep kit manufacturer (with versions for each ref genome build of course).

Thus you should at each step where an -L option is available use the corresponding interval file. The exome kit vendor should provide you with the corresponding interval file. If not you can use gencode :

wget ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_31/gencode.v31.annotation.gtf.gz
zcat gencode.v31.annotation.gtf.gz | awk '$3 == "exon" { print $1, $4, $5, $7, $18}' | tr ' ' '\t' | tr -d '";' >> interval_list.bed

For information if you use GATK4 best practices these are the tools where the interval file needs to be specified

  • BaseRecalibrator
  • ApplyBQSR
  • HaplotypeCaller
  • GenomicsDBimport (as you have only 15 samples you can use directly GenotypeGVCFs)
  • GenotypeGVCFs
  • VariantRecalibrator
  • ApplyVQSR

Also as you have a limited number of samples you should add 1000G samples (from here : https://software.broadinstitute.org/gatk/documentation/article.php?id=1259)

In our testing we've found that in order to achieve the best exome results one needs to use an exome SNP and/or indel callset with at least 30 samples. For users with experiments containing fewer exome samples there are several options to explore:

Add additional samples for variant calling, either by sequencing additional samples or using publicly available exome bams from the 1000 Genomes Project (this option is used by the Broad exome production pipeline). Be aware that you cannot simply add VCFs from the 1000 Genomes Project. You must either call variants from the original BAMs jointly with your own samples, or (better) use the reference model workflow to generate GVCFs from the original BAMs, and perform joint genotyping on those GVCFs along with your own samples' GVCFs with GenotypeGVCFs.

You can also try using the VQSR with the smaller variant callset, but experiment with argument settings (try adding --maxGaussians 4 to your command line, for example). You should only do this if you are working with a non-model organism for which there are no available genomes or exomes that you can use to supplement your own cohort.

ADD COMMENT
0
Entering edit mode

Is it necessary to include 1000 genomes bam? As I have machine of low RAM

ADD REPLY
1
Entering edit mode

you can try by changing maxGaussians to 4 in VariantRecalibrator.

ADD REPLY
0
Entering edit mode

@Nicolas I have made GVCF with interval file having size of about 400MB. Previously their size of 3 GB. Is this right or I am making mistake somewhere?

ADD REPLY
0
Entering edit mode

Should be ok. Try vqsr to generate a tranche plot

ADD REPLY
0
Entering edit mode

@Nicolas Is this OK? Should I make --truth-sensitivity-filter-level 99.9 in ApplyVQSR.

ADD REPLY

Login before adding your answer.

Traffic: 1972 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6