Question

GATK best practices for Broad-produced NGS data

0

6.0 years ago

Mehulsharma.253 ▴ 30

This is a more general question. I want to call variants in raw WGS data produced by studies carried out at the Broad Institute. What modifications to the standard GATK Best Practices workflow would you recommend to better suit Broad-produced data?

For example, while GATK/Broad strongly recommends recalibrating base qualities, this workshop says at the end that:

All recent Broad‐produced data is already recalibrated

NGS GATK4 BroadInstitute Variant-Calling • 1.5k views

updated 8 months ago by Ram 44k • written 6.0 years ago by Mehulsharma.253 ▴ 30

2

I think the GATK forum would be a more appropriate place for this question, but I have asked someone from GATK via Twitter to take a look here.

6.0 years ago by WouterDeCoster 47k

0

Thank you! The GATK forum seems to be having some issues with posting and commenting because of recent spam reports, but I will retry posting there.

6.0 years ago by Mehulsharma.253 ▴ 30

score 2 · Answer 1 · 2018-12-10

It depends on what form of the data you’re starting from. If you’re starting from true raw WGS data, i.e. unmapped reads (in FASTQ or uBAM), then you should follow the Best Practices as laid out in the GATK documentation. However, if you’re starting from an aligned BAM or CRAM file you received from the Broad’s Genomic Services, then you don’t need to do the pre-processing part and you can go straight to the variant calling part.

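In general, you can check what processing has been applied to the data in a BAM file by looking at the @PG lines in the header. Each @PG line records one program that was run on the file, usually with its name, version and command line.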
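As a rough illustration of that header check, here is a minimal Python sketch that parses @PG lines out of header text (for instance, the output of `samtools view -H`). The function names `parse_pg_lines` and `programs_matching` are hypothetical helpers, and the sample header is made up for the example; it is not taken from real Broad data, and the program names you see in your own files may differ.

```python
def parse_pg_lines(header_text):
    """Return one dict per @PG line, mapping each tag (ID, PN, VN, CL, ...) to its value."""
    records = []
    for line in header_text.splitlines():
        if not line.startswith("@PG"):
            continue
        fields = {}
        # Header fields are tab-separated TAG:VALUE pairs; split on the first colon only,
        # because command lines (CL) may themselves contain colons.
        for field in line.split("\t")[1:]:
            tag, _, value = field.partition(":")
            fields[tag] = value
        records.append(fields)
    return records


def programs_matching(records, keywords):
    """Return program names whose ID, PN or CL mention any keyword (case-insensitive)."""
    hits = []
    for rec in records:
        haystack = " ".join(rec.get(tag, "") for tag in ("ID", "PN", "CL")).lower()
        if any(keyword.lower() in haystack for keyword in keywords):
            hits.append(rec.get("PN", rec.get("ID", "")))
    return hits


# Made-up header for illustration only (assumption: real headers will look different).
example_header = "\n".join([
    "@HD\tVN:1.6\tSO:coordinate",
    "@SQ\tSN:chr1\tLN:1000",
    "@PG\tID:bwamem\tPN:bwamem\tCL:bwa mem (arguments omitted)",
    "@PG\tID:MarkDuplicates\tPN:MarkDuplicates\tCL:MarkDuplicates (arguments omitted)",
    "@PG\tID:GATK ApplyBQSR\tPN:GATK ApplyBQSR\tCL:ApplyBQSR (arguments omitted)",
])

records = parse_pg_lines(example_header)
for rec in records:
    print(rec.get("ID"), "|", rec.get("CL"))

# Look for any sign of base quality recalibration among the recorded programs.
print(programs_matching(records, ["bqsr"]))
```

If the search for a recalibration-related keyword comes back empty, that only means no such step was recorded in the header, so it is worth reading the @PG lines by eye as well.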

If you need any additional info, please ask on the GATK forum and thank you for your patience while we deal with the spam issues.