We have performed a targeted sequencing of a few megabases of human DNA on ~20 samples using Illumina GA2. I've done alignment and genotyping, but I know our targeting method (SureSelect) has coverage gaps where no reads will align. I would like to identify these gaps from the aligned reads. My end goals are:
1) Summary statistics of coverage depth for each sample
2) Identify regions where I have no coverage in any sample
3) Able to generate ad hoc plots of coverage depth using tools of my own choice
Before I put my nose down and start coding off of the SAM pileup format, has someone written a package (or function in a package) that solves this problem for me? I'd rather use someone else's debugged code than write my own. I'm not looking to install a web server and click around; my ideal output would be a matrix where each row is a locus and each column is a sample, with number of mapped reads as the matrix value.
Great steers, thanks very much.
Great reply Aaron, could you provide some code to illustrate the process for those who are not familiar with R/BioC??