I've just read this post on Biostars, but I'm still unsure which Agilent intervals I should use for human exome variant calling with GATK. The exome enrichment kit our lab uses is the Agilent SureSelectXT Target Enrichment System for Illumina Paired-End Sequencing Library, version XTV4.
I found a list of different designs on Agilent's eArray website (https://earray.chem.agilent.com). I'm assuming the design named "SureSelect Human All Exon V4" corresponds to the kit we use. This design includes several .bed files, such as "regions", "alltracks", "covered", etc. Should I use the "covered" one for variant calling?
P.S. My reference genome is human hg19.
Sorry for the naive questions; please be gentle. Thanks for any replies.
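One way to compare the eArray .bed files is to check how much sequence each one spans. The following is a minimal sketch, not an Agilent or GATK tool: the function names are hypothetical, and it assumes UCSC-style BED input that may start with "track"/"browser" header lines. It also shows the coordinate shift between BED (0-based, half-open) and the 1-based, inclusive "chrom:start-end" form that GATK interval strings use.

```python
def read_bed(lines):
    """Parse BED lines into (chrom, start, end) tuples.

    Skips blank lines and 'track'/'browser'/'#' header lines, which
    UCSC-style BED files may contain. Coordinates stay 0-based, half-open.
    """
    intervals = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("track", "browser", "#")):
            continue
        fields = line.split()
        intervals.append((fields[0], int(fields[1]), int(fields[2])))
    return intervals


def territory(intervals):
    """Total bases spanned, assuming the intervals do not overlap."""
    return sum(end - start for _, start, end in intervals)


def to_gatk_interval(chrom, start, end):
    """Convert a 0-based half-open BED record to 1-based inclusive 'chrom:start-end'."""
    return f"{chrom}:{start + 1}-{end}"


if __name__ == "__main__":
    example = [
        "track name=example",
        "chr1\t100\t200\tgeneA",
        "chr1\t300\t350",
    ]
    ivs = read_bed(example)
    print(territory(ivs))
    print([to_gatk_interval(*iv) for iv in ivs])
```

Running this on the example prints the total span (150 bases) and the two converted intervals. With real files you would pass `open("...bed")` instead of the example list; the chromosome names (e.g. "chr1") also need to match your hg19 reference.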
Why can't we use the default interval list mentioned here?
You can use the "regions" file mentioned on the website.
I'd use any of them, padded by 100 bp to the left and to the right.
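Padding makes neighbouring targets overlap, so it helps to merge them afterwards. The following is a minimal sketch of that step under the assumption that intervals are plain (chrom, start, end) BED tuples; `pad_and_merge` is a hypothetical helper, not part of any tool.

```python
def pad_and_merge(intervals, pad=100):
    """Pad each (chrom, start, end) BED interval by `pad` bases on both
    sides, then merge intervals that overlap or touch after padding.

    Start is clamped at 0. End is NOT clamped, because chromosome
    lengths are not known here.
    """
    # Sort by chromosome name, then start, so overlapping runs are adjacent.
    padded = sorted(
        (chrom, max(0, start - pad), end + pad) for chrom, start, end in intervals
    )
    merged = []
    for chrom, start, end in padded:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps (or abuts) the previous interval: extend it.
            last_chrom, last_start, last_end = merged[-1]
            merged[-1] = (last_chrom, last_start, max(last_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


if __name__ == "__main__":
    targets = [("chr1", 50, 150), ("chr1", 300, 400), ("chr2", 1000, 1100)]
    print(pad_and_merge(targets))
```

The output is sorted by chromosome name as a string, which is not the hg19 reference order, so you may need to re-sort before handing it to GATK. Alternatively, GATK can apply the padding itself through its interval-padding option (`-ip`), which avoids editing the BED file at all.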
Why is that? Is it because sequences within 100 bp on either side of the probe are also enriched fairly effectively, through some kind of overhang effect?
Sure. Regions around the probe are usually also covered by reads and can be used effectively for SNV detection. Some people pad by 200 bp; others estimate the effectively enriched regions from their actual results instead of relying on the provided regions file.
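Estimating the effectively enriched regions from your own data can be as simple as keeping stretches of consecutive bases whose depth passes a threshold. Below is a minimal sketch, assuming per-base input in the three-column chrom/position/depth layout printed by `samtools depth` (1-based positions, sorted). The function names and the `min_depth=10` cutoff are hypothetical choices for illustration, not a recommended value.

```python
def parse_depth(lines):
    """Yield (chrom, pos, depth) from 'chrom<TAB>pos<TAB>depth' lines."""
    for line in lines:
        chrom, pos, depth = line.split()[:3]
        yield chrom, int(pos), int(depth)


def covered_regions(depth_records, min_depth=10):
    """Join consecutive positions with depth >= min_depth into
    0-based, half-open BED-style (chrom, start, end) intervals."""
    regions = []
    current = None  # [chrom, start0, end0] of the run being built
    for chrom, pos, depth in depth_records:
        if depth < min_depth:
            continue
        start0 = pos - 1  # 1-based position -> 0-based start
        if current and current[0] == chrom and current[2] == start0:
            current[2] = pos  # next base is contiguous: extend the run
        else:
            if current:
                regions.append(tuple(current))
            current = [chrom, start0, pos]
    if current:
        regions.append(tuple(current))
    return regions


if __name__ == "__main__":
    lines = ["chr1\t10\t12", "chr1\t11\t15", "chr1\t12\t3", "chr1\t13\t20", "chr2\t5\t30"]
    print(covered_regions(parse_depth(lines)))
```

A low-depth base or a missing position (by default `samtools depth` omits zero-coverage positions) simply ends the current run. In practice you would probably also drop very short runs and merge the result with the padded target file.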