Question

Scatter / Gather for BaseRecalibrator on a single human WES dataset?

4.8 years ago

asg • 0

Hello everyone! Please excuse me if this question is a bit naïve: I'm new to bioinformatics in general and GATK in particular.

I am using the GATK4 suite to ultimately call germline variants on whole exome sequencing data obtained from an Illumina NextSeq 550 sequencer. (For a variety of reasons I cannot use the WDL/Cromwell setup recommended by the Best Practices, so I am trying to replicate the recommended workflow as a series of Bash scripts.)

I would like to speed up the BQSR step by employing the Scatter / Gather strategy. However, studying this article (https://gatk.broadinstitute.org/hc/en-us/articles/360035890531-Base-Quality-Score-Recalibration-BQSR-), I've realized that BaseRecalibrator requires a lot of data to build a proper statistical model.

My question: is it okay to scatter the BaseRecalibrator job by chromosome if I analyze just one WES sample at a time? (I know that downstream I will need to perform joint genotyping with 30+ samples, but at the moment I'm preparing single-sample BAM files one-by-one.)
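For concreteness, here is a minimal dry-run sketch of what scattering by chromosome could look like in plain Bash. It only prints the commands. The file names (`reference.fasta`, `sample.dedup.bam`, `known_sites.vcf.gz`) and the `chr`-prefixed contig names are assumptions for illustration, not taken from the post. The sketch assumes the per-chromosome tables are merged with GatherBQSRReports before ApplyBQSR is run, so that the final recalibration table draws on all of the sample's data rather than on one chromosome.

```bash
#!/usr/bin/env bash
# Dry run: prints the GATK commands rather than executing them.
set -eu

# Hypothetical file names (not from the post); substitute your own.
ref="reference.fasta"
bam="sample.dedup.bam"
known="known_sites.vcf.gz"

# Scatter units: chr1..chr22 plus chrX, i.e. 23 chunks.
# Contig naming is an assumption; check your reference dictionary.
chroms=""
i=1
while [ "$i" -le 22 ]; do
  chroms="$chroms chr$i"
  i=$((i + 1))
done
chroms="$chroms chrX"

# One BaseRecalibrator command per chromosome, each writing its own table.
scatter_cmds() {
  for c in $chroms; do
    echo "gatk BaseRecalibrator -R $ref -I $bam --known-sites $known -L $c -O recal.$c.table"
  done
}

# A single GatherBQSRReports command that merges all per-chromosome tables.
gather_cmd() {
  inputs=""
  for c in $chroms; do
    inputs="$inputs -I recal.$c.table"
  done
  echo "gatk GatherBQSRReports$inputs -O recal.merged.table"
}

# ApplyBQSR then uses the merged table, not a per-chromosome one.
apply_cmd() {
  echo "gatk ApplyBQSR -R $ref -I $bam --bqsr-recal-file recal.merged.table -O sample.recal.bam"
}

scatter_cmds
gather_cmd
apply_cmd
```

In a real run the printed BaseRecalibrator commands would be executed in parallel (for example as background jobs or cluster tasks), and the gather step would start only after all of them have finished.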

The article above says specifically that BaseRecalibrator expects each read group to have at least 100M bases. Calculated naively, PF_HQ_ALIGNED_BASES / 23 = 215+ megabases (the metric is taken from the CollectAlignmentSummaryMetrics output).
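The naive estimate above can be written out as a small Bash check. The value of `pf_hq_aligned_bases` below is hypothetical, picked only so that the division reproduces the "215+ megabases" figure; the real number comes from the CollectAlignmentSummaryMetrics output. The check also assumes a single read group and an even spread of bases across the 23 chunks, which real chromosomes do not have, so it is a rough sanity check at best.

```bash
#!/usr/bin/env bash
set -eu

# Hypothetical value, chosen only to match the "215+ megabases" result above;
# read the real number from your CollectAlignmentSummaryMetrics output.
pf_hq_aligned_bases=4950000000
chunks=23            # chr1..chr22 plus chrX
min_bases=100000000  # the 100M bases per read group quoted from the article

# Integer division: average number of high-quality aligned bases per chunk.
per_chunk=$((pf_hq_aligned_bases / chunks))
echo "bases per chunk: $per_chunk"

if [ "$per_chunk" -ge "$min_bases" ]; then
  verdict="above threshold"
else
  verdict="below threshold"
fi
echo "$verdict"
```

Because chromosomes differ a lot in exonic content, the smallest chunks will sit well below this average, so a per-chromosome count would be more informative than the mean.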

Thank you!

— Alex.

P.S. This is a repost of my question from the GATK forum. I apologize if this is generally frowned upon, but since this is not a technical issue with the tool itself, the team could not offer any guidance as of yet.

next-gen gatk bqsr exome human • 1.3k views

Could you post your BQSR command, please? How many samples do you have? In my experience, BQSR should not take too much time.

4.8 years ago by Nicolas Rosewick • 11k