Hi, I have raw indel calls made with GATK, and I am wondering where to start with filtering them. Should I plot the quality, depth, etc. and then decide on thresholds to filter out the bad calls? Any suggestions or scripts would be useful.
chr10 264423 . G GC 29875 . AC=26;AF=1.00;AN=26;BaseQRankSum=-0.054;DP=717;FS=8.633;HaplotypeScore=344.1280;InbreedingCoeff=-0.0046;MLEAC=26;MLEAF=1.00;MQ=38.57;MQ0=0;MQRankSum=-0.832;QD=41.67;RPA=1,2;RU=C;ReadPosRankSum=0.979;SB=-4.035e+03;STR GT:AD:DP:GQ:PL 1/1:6,242:250:99:10448,743,0 1/1:10,193:203:99:8479,571,0 1/1:3,142:148:99:6167,439,0 1/1:0,13:13:39:551,39,0 1/1:1,9:10:30:424,30,0 1/1:1,13:15:39:550,39,0 1/1:0,9:9:27:382,27,0 1/1:1,15:16:48:667,48,0 1/1:1,11:12:33:467,33,0 1/1:1,8:9:27:382,27,0 1/1:0,18:18:54:764,54,0 1/1:0,11:11:33:467,33,0 1/1:0,3:3:9:127,9,0
Looking forward to your suggestions and feedback. /Bari
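For the "plot first, then pick thresholds" idea, one possible starting point is to pull the site-level annotations out of each record so they can be tabulated or plotted. The sketch below is only an illustration, not a GATK tool: the function names `parse_info` and `site_annotations` are made up here, the choice of QD, FS, MQ and DP is an assumption about which annotations are worth looking at, and it assumes a tab-delimited VCF line with the INFO field in column 8.

```python
def parse_info(info):
    """Turn a VCF INFO string into a dict; flag-only keys (e.g. STR) map to True."""
    out = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        out[key] = value if sep else True
    return out


def site_annotations(vcf_line, keys=("QD", "FS", "MQ", "DP")):
    """Extract position, QUAL and selected INFO annotations from one VCF data line.

    The default keys are an assumption; any numeric INFO key can be requested.
    Annotations missing from a record are returned as None.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    info = parse_info(fields[7])
    row = {"chrom": fields[0], "pos": int(fields[1]), "qual": float(fields[5])}
    for key in keys:
        row[key] = float(info[key]) if key in info else None
    return row


# Site-level columns of the record shown in the question (sample columns omitted).
example_line = "\t".join([
    "chr10", "264423", ".", "G", "GC", "29875", ".",
    "AC=26;AF=1.00;AN=26;DP=717;FS=8.633;MQ=38.57;QD=41.67;STR",
])
row = site_annotations(example_line)
```

Collecting one such row per record gives a table whose columns can be plotted as histograms before any cutoff is chosen.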
What are you looking for in these indels? What experimental model are you using? What's your hypothesis? It's hard to really answer your question meaningfully without more information.
These indels are from resequencing data from cows. We have two groups of cows, and we want to find indels specific to one group or the other, but not both. (Does this answer your question?)
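To make "specific to one group" concrete, here is a rough sketch of how a single site could be classified from the per-sample GT values. It is an illustration under stated assumptions only: the function names, the sample names and the group assignment are hypothetical, "carries the indel" is taken to mean at least one non-reference allele, and missing genotypes (`./.`) are counted as not carrying it, which may be too permissive for low-coverage samples.

```python
def carries_alt(gt):
    """True if a GT string such as '0/1' or '1|1' has any non-reference allele."""
    alleles = gt.replace("|", "/").split("/")
    return any(a not in ("0", ".") for a in alleles)


def group_specific(genotypes, group_a, group_b):
    """Classify one site as 'A', 'B', or None.

    genotypes: dict mapping sample name -> GT string
    group_a, group_b: lists of sample names (hypothetical group assignment)
    Returns 'A' if only group A samples carry the alternate allele,
    'B' if only group B samples do, and None otherwise.
    """
    in_a = any(carries_alt(genotypes[s]) for s in group_a)
    in_b = any(carries_alt(genotypes[s]) for s in group_b)
    if in_a and not in_b:
        return "A"
    if in_b and not in_a:
        return "B"
    return None


# Hypothetical four-sample example.
gts = {"cow1": "1/1", "cow2": "0/1", "cow3": "0/0", "cow4": "./."}
result = group_specific(gts, ["cow1", "cow2"], ["cow3", "cow4"])
```

In the record pasted in the question all 13 samples are 1/1, so under this definition the site would come out as shared (None) however the samples are split into groups.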
Then your task is going to focus on three things in your subject genotype calls (the part of your VCF above that starts with 1/1, etc.). See my answer below.