We are proposing to use Agilent SureSelect for whole exome sequencing of four cases with mutations identified in a single gene that has 32 exons. Supposing SureSelect covers 30 of the 32 exons, with 75% of reads on target and 90% alignment, all four samples could be done at 100x coverage in a single HiSeq lane. The reason for doing so is to show that we can use an NGS method to identify variants normally covered by Sanger sequencing kits in CLIA labs at 10x the cost. If we can, we plan to extend this to an 80-gene panel.
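As a rough check on the arithmetic above, here is a minimal sketch of the usual back-of-envelope estimate: usable bases (reads x read length x aligned fraction x on-target fraction) divided by the target size and the number of samples. The function name and the example inputs (lane yield, read length, capture target size) are assumptions for illustration, not figures from Agilent or Illumina; substitute the numbers from your own kit and run.

```python
def mean_coverage(reads_per_lane, read_length, aligned_frac,
                  on_target_frac, target_bases, n_samples):
    """Estimate mean per-sample coverage over the capture target.

    Assumes reads are split evenly across samples and spread evenly
    over the target, which real captures only approximate.
    """
    usable_bases = reads_per_lane * read_length * aligned_frac * on_target_frac
    return usable_bases / (target_bases * n_samples)


if __name__ == "__main__":
    # Hypothetical inputs: replace with your own lane yield and target size.
    estimate = mean_coverage(
        reads_per_lane=150e6,   # assumed reads passing filter per lane
        read_length=100,        # assumed read length in bp
        aligned_frac=0.90,      # 90% alignment, from the post
        on_target_frac=0.75,    # 75% on target, from the post
        target_bases=50e6,      # assumed capture target size in bp
        n_samples=4,            # four cases, from the post
    )
    print(f"Estimated mean coverage per sample: {estimate:.1f}x")
```

This gives only a mean; per-exon coverage in a capture experiment can vary widely, so the exons of interest are worth checking individually once data are in hand.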
In your considered opinions, is coverage with these parameters sufficient? Further, if we hope to show we can identify 2 or 3 of the 4 mutations in this gene using NGS, are there shared controls that would be appropriate to use with this sequencing data? Or do you think batch differences between institutions, machines, and DNA sample prep would instead warrant using only controls sequenced with the same Agilent kit, at similar coverage, on the same machines at our institution? Better to have a solid experimental design before proceeding than to have grant reviewers shred us for comparing apples to oranges.
Please let me know your experiences or any relevant publications or edit the above tags. And thanks.
I had not heard that term: Sanger Band-aids. If that is not already coined, it should be.
As I understand it, investigators using this same pipeline will often see a dozen individuals apparently homozygous for a never-before-seen SNP, which then turns out to be a sequencing artifact.
Thanks also, Alex, for pointing me to Galaxy to track this. I use it for a number of other UCSC issues. It would be helpful to get some background using it for NGS workflows. It looks like one is posted here: https://test.g2.bx.psu.edu/u/cjav/w/gatk . Any that are considered better?
I'm not sure how often spurious homozygous variants turn out to be artifacts -- but it does happen. Best to also filter your variants against the >5400 exomes available through the NHLBI's Exome Variant Server: http://evs.gs.washington.edu/EVS/
The public Galaxy page now includes a beta GATK install, and the nice folks at Galaxy are very helpful in designing custom workflows to meet your needs.
Hi Alex, I did not see any tools available or in development that will let us query against the NHLBI Exome Variant Server. Can you point me to any workflows that include this, or suggest who at Galaxy would be the contact to get that implemented? It sounds like a great resource, but this is also the first time I've heard of it.
Hi Ryan - The ESP5400 SNP data can be downloaded from EVS via their "downloads" page at http://evs.gs.washington.edu/EVS/
You can then use local scripts to query that data. EVS data are not integrated into Galaxy afaik, but if you want to email me off-line I can put you in touch with people who are helping our group design custom workflows in Galaxy.
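For what a "local script" might look like, here is a minimal sketch. It assumes both your calls and the downloaded EVS data are available as tab-delimited, VCF-style text with CHROM, POS, ID, REF, ALT as the first five columns; the actual EVS download format may differ and may need converting first. The function names and file names are hypothetical.

```python
def variant_keys(vcf_lines):
    """Yield (chrom, pos, ref, alt) for each ALT allele in VCF-style lines.

    Assumes tab-delimited lines with CHROM, POS, ID, REF, ALT as the
    first five columns; header lines starting with '#' are skipped.
    """
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alts = fields[0], int(fields[1]), fields[3], fields[4]
        # Normalize "chr1" vs "1" so the two files can be compared.
        if chrom.startswith("chr"):
            chrom = chrom[3:]
        # Multi-allelic sites list ALT alleles separated by commas.
        for alt in alts.split(","):
            yield (chrom, pos, ref, alt)


def split_novel_and_known(sample_lines, reference_lines):
    """Return (novel, known) lists of sample variants, judged by
    whether each exact allele appears in the reference set."""
    reference = set(variant_keys(reference_lines))
    novel, known = [], []
    for key in variant_keys(sample_lines):
        (known if key in reference else novel).append(key)
    return novel, known


# Example usage with hypothetical file names:
# with open("my_calls.vcf") as calls, open("evs_sites.vcf") as evs:
#     novel, known = split_novel_and_known(calls, evs)
```

Matching on exact position and alleles is the simplest choice; indels represented differently in the two files would be missed, so treat "novel" here as a candidate list rather than a final answer.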