Variant analysis on 1 or 2 samples: are the final steps of the GATK pipeline unnecessary?
6.5 years ago
BioinfGuru ★ 2.1k

Hello all,

I'm a bit confused about which steps are necessary and which will not add much benefit. I have 2 jobs to complete for 2 different research groups we support: 1) germline short variant discovery on whole exome sequencing (WES) data from 1 mouse (1 sample in total), and 2) germline short variant discovery on whole genome sequencing (WGS) data from 2 macaques (2 samples in total).

I have written a wrapper that follows the GATK Best Practices from FASTQ preprocessing through HaplotypeCaller, with appropriate conditional logic and the required files specific to each species and to WGS/WES.

According to the GATK workflow, my next steps after running HaplotypeCaller (with --emit-ref-confidence GVCF) are 1) consolidating the GVCFs, 2) joint-calling the cohort, and 3) VQSR ("probably the hardest part of the Best Practices to get right").
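For concreteness, here is a minimal sketch of how the first two of those steps might be assembled in a Python wrapper. It only builds GATK4 command lines and does not run them. The function names, file names, and output-naming scheme are hypothetical, and it assumes CombineGVCFs is used for consolidation (GenomicsDBImport is the other option). VQSR is left out entirely, since it depends on truth/training resources.

```python
from typing import List


def combine_gvcfs_cmd(reference: str, gvcfs: List[str], out_gvcf: str) -> List[str]:
    """Build a CombineGVCFs command that merges per-sample GVCFs into one."""
    cmd = ["gatk", "CombineGVCFs", "-R", reference]
    for gvcf in gvcfs:
        cmd += ["--variant", gvcf]
    return cmd + ["-O", out_gvcf]


def genotype_gvcfs_cmd(reference: str, gvcf: str, out_vcf: str) -> List[str]:
    """Build a GenotypeGVCFs command that turns a GVCF into a final VCF."""
    return ["gatk", "GenotypeGVCFs", "-R", reference, "-V", gvcf, "-O", out_vcf]


def post_haplotypecaller_cmds(
    reference: str, gvcfs: List[str], prefix: str
) -> List[List[str]]:
    """Return the commands to run after HaplotypeCaller in GVCF mode.

    With one sample there is nothing to consolidate, so the single GVCF is
    genotyped directly. With two or more, the GVCFs are combined first.
    """
    if not gvcfs:
        raise ValueError("need at least one GVCF")
    out_vcf = f"{prefix}.vcf.gz"  # hypothetical naming scheme
    if len(gvcfs) == 1:
        return [genotype_gvcfs_cmd(reference, gvcfs[0], out_vcf)]
    combined = f"{prefix}.combined.g.vcf.gz"  # hypothetical naming scheme
    return [
        combine_gvcfs_cmd(reference, gvcfs, combined),
        genotype_gvcfs_cmd(reference, combined, out_vcf),
    ]


if __name__ == "__main__":
    # Hypothetical file names for the two-macaque WGS job.
    for cmd in post_haplotypecaller_cmds(
        "macaque.fa", ["m1.g.vcf.gz", "m2.g.vcf.gz"], "macaques"
    ):
        print(" ".join(cmd))
```

Written this way, the 1-sample mouse job skips consolidation and the 2-sample macaque job goes through both steps, which is the distinction I am asking about.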

Considering that I have only 1 or 2 samples, in species where truth data sets may not be available, is it pointless to do some or all of these steps? Should I just stick with the variants called in each sample by HaplotypeCaller? Should I remove "--emit-ref-confidence GVCF" and just create a regular VCF?
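To make the last question concrete, this is roughly how the choice could look in a wrapper: a single switch decides whether HaplotypeCaller is run in GVCF mode or emits a regular VCF. It is only a sketch that builds the command line. The function name, parameter names, and file names are hypothetical, and the optional `-L` intervals argument is my assumption for restricting WES calling to the capture targets.

```python
from typing import List, Optional


def haplotypecaller_cmd(
    reference: str,
    bam: str,
    out_path: str,
    emit_gvcf: bool = True,
    intervals: Optional[str] = None,
) -> List[str]:
    """Build a GATK4 HaplotypeCaller command line.

    emit_gvcf=True adds --emit-ref-confidence GVCF (output feeds joint calling).
    emit_gvcf=False omits it, so HaplotypeCaller writes a regular VCF.
    """
    cmd = ["gatk", "HaplotypeCaller", "-R", reference, "-I", bam, "-O", out_path]
    if emit_gvcf:
        cmd += ["--emit-ref-confidence", "GVCF"]
    if intervals is not None:
        # Assumption: WES runs restrict calling to the capture targets.
        cmd += ["-L", intervals]
    return cmd


if __name__ == "__main__":
    # Hypothetical file names for the single-mouse WES job, regular VCF mode.
    print(" ".join(
        haplotypecaller_cmd(
            "mouse.fa", "mouse.bam", "mouse.vcf.gz",
            emit_gvcf=False, intervals="targets.bed",
        )
    ))
```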

Thank you, Kenneth

SNP GATK pipeline variant analysis • 1.7k views