This is a fairly general question. I wish to discover variants in my raw WGS data, which was produced by studies carried out at the Broad Institute. What modifications would you recommend to the standard GATK best practices workflow to better suit Broad-produced data?
For example, while GATK/Broad strongly recommends recalibrating base qualities, this workshop says at the end that:
"All recent Broad-produced data is already recalibrated."
It depends on what form of the data you're starting from. If you're starting from true raw WGS data, i.e. unmapped reads (in FASTQ or uBAM), then you should follow the best practices as laid out in the GATK documentation. However, if you're starting from an aligned BAM or CRAM file you received from the Broad's Genomic Services, then you don't need to do the pre-processing part and can go straight to the variant calling part.
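For illustration only (file names and the reference here are placeholders, not from this thread), starting from an analysis-ready BAM the per-sample variant calling step with GATK4 looks roughly like this:

    # Call variants for one sample in GVCF mode (sketch; adjust paths and intervals to your data)
    gatk HaplotypeCaller \
        -R reference.fasta \
        -I sample.analysis_ready.bam \
        -O sample.g.vcf.gz \
        -ERC GVCF

Joint genotyping of the resulting GVCFs then proceeds as in the standard best practices (GenomicsDBImport followed by GenotypeGVCFs).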
In general, you can check what processing has been applied to the data in a BAM file by looking at the @PG lines in the header.
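As a quick sketch (sample.bam is a placeholder), you can print those header records with samtools; if base recalibration has already been applied, you would expect to see a @PG entry from the recalibration tool (e.g. ApplyBQSR in GATK4):

    # Show only the @PG (program) records from the BAM header
    samtools view -H sample.bam | grep '^@PG'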
If you need any additional info, please ask on the GATK forum and thank you for your patience while we deal with the spam issues.
I think the GATK forum would be a more appropriate place for this question, but I have asked someone from GATK to take a look here via Twitter.
Thank you! The GATK forum seems to be having some issues with posting and commenting because of recent spam reports, but I will retry posting there.