I recently read several papers that identify disease-causing mutations from exome sequencing data using GATK. Most of their pipelines look something like this:
Call SNP/indel using GATK UnifiedGenotyper. All calls with a read coverage ≤2× and a Phred-scaled SNP quality of ≤20 were filtered out.
That made me curious: which GATK filtering step corresponds to read depth (RD) ≤ 2 and quality ≤ 20? (GATK simply has too many steps!)
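To make the question concrete, here is a minimal Python sketch of what I understand the papers' filter to do, written outside of GATK. It assumes depth is taken from the `DP` key of the VCF INFO column and quality from the QUAL column; the function names `parse_info` and `passes_paper_filter` are hypothetical, and the papers may well have applied the cutoffs per sample rather than per site.

```python
def parse_info(info_field):
    """Turn a VCF INFO string such as 'DP=14;MQ=60.0;DB' into a dict."""
    info = {}
    for entry in info_field.split(";"):
        key, _, value = entry.partition("=")
        info[key] = value  # flag entries such as 'DB' map to an empty string
    return info


def passes_paper_filter(vcf_line, min_depth=2, min_qual=20.0):
    """Return True if the call survives the filter described in the papers.

    Calls with coverage <= min_depth or QUAL <= min_qual are dropped.
    Assumption: coverage is the site-level DP annotation in the INFO column.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    qual = float(fields[5])                   # column 6 of a VCF record is QUAL
    depth = int(parse_info(fields[7])["DP"])  # column 8 of a VCF record is INFO
    return depth > min_depth and qual > min_qual


# Made-up records for illustration only
kept = "1\t12345\t.\tA\tG\t55.3\t.\tDP=14;MQ=60.0"
dropped = "1\t22222\t.\tC\tT\t18.0\t.\tDP=2;MQ=37.5"
print(passes_paper_filter(kept), passes_paper_filter(dropped))
```

If this reading is right, the papers' filter uses only depth and QUAL, which is why I am unsure how it maps onto the annotation-based filters below.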
I compared the filters for GATK2 and GATK3:
GATK2:
For exomes with deep coverage per sample
DATA_TYPE_SPECIFIC_FILTERS should be "QUAL < 30.0 || QD < 5.0 || HRun > 5 || SB > -0.10"
GATK3:
For SNPs
DATA_TYPE_SPECIFIC_FILTERS should be "QD < 2.0", "MQ < 40.0", "FS > 60.0", "HaplotypeScore > 13.0", "MQRankSum < -12.5", "ReadPosRankSum < -8.0".
For Indels
DATA_TYPE_SPECIFIC_FILTERS should be "QD < 2.0", "ReadPosRankSum < -20.0", "InbreedingCoeff < -0.8", "FS > 200.0".
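As a sanity check on how I read these expressions, below is a small Python sketch that applies the GATK3 SNP thresholds listed above to a dictionary of annotation values. This is not how GATK evaluates them internally; the names `SNP_FILTERS` and `failed_snp_filters` are hypothetical, and skipping missing annotations is my own assumption.

```python
import operator

# GATK3 SNP thresholds from the list above: (annotation, comparison, cutoff)
SNP_FILTERS = [
    ("QD", operator.lt, 2.0),
    ("MQ", operator.lt, 40.0),
    ("FS", operator.gt, 60.0),
    ("HaplotypeScore", operator.gt, 13.0),
    ("MQRankSum", operator.lt, -12.5),
    ("ReadPosRankSum", operator.lt, -8.0),
]


def failed_snp_filters(annotations):
    """Return the names of annotations whose value trips its threshold."""
    failed = []
    for name, compare, cutoff in SNP_FILTERS:
        value = annotations.get(name)
        if value is None:
            continue  # assumption: a missing annotation is not counted as a failure
        if compare(value, cutoff):
            failed.append(name)
    return failed


# Made-up annotation values for illustration only
site = {"QD": 1.4, "MQ": 58.2, "FS": 71.0, "HaplotypeScore": 3.1}
print(failed_snp_filters(site))
```

As I understand it, a site that trips any one of these thresholds would be flagged, so an empty list here means the site passes.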
I would guess that most published papers used GATK2, so those of us who read papers based on GATK2 but use GATK3 for our own research may get confused. Let me check my understanding:
MQ < 40.0 in GATK3 is equivalent to QUAL < 30.0 in GATK2, and this is the mapping quality filter, right?
QD < 2.0 in GATK3 is equivalent to QD < 5.0 in GATK2, and this is the read-depth filter, right?
FS > 60.0 in GATK3 is equivalent to SB > -0.10 in GATK2, and this is the strand bias filter, right?
Thanks.