I have a cohort of patients for whom we are deep sequencing integrated HIV provirus. Because of the technical methods we are using, we also get some host (human) genomic sequence. I'm estimating 2 billion bp, so ~1X coverage. It's not going to be enough coverage to capture any clinically relevant human SNPs (unless we get extremely lucky).
The data is a combination of Illumina NextSeq and PacBio reads (mostly Illumina) on what will ultimately be >500 patients.
I already know what I'm going to do with the HIV sequence, but I'm looking to squeeze something interesting out of this "leftover" data.
Any ideas?
Check this out: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3400344/
Hmm, that seemed to be exome sequencing (at least from my 60-second scan). We are getting whole-genome sequence; there's no way to enrich for the exome. But maybe using this approach and pooling would work.
The main point of that paper is not exome sequencing. It is that you can call variants reasonably well from off-target reads by using imputation. Those off-target reads are similar to your "undesired" human reads at ~1X coverage. There is also another post today about using off-target reads for CNV calling.
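For a rough sense of what ~1X buys you, here is a back-of-the-envelope sketch in Python. It assumes reads land uniformly at random (so per-site depth is roughly Poisson), a haploid human genome of about 3.1 Gbp, and that the 2 billion bp figure is per patient; all three are assumptions, and real data will be worse because of duplicates, mapping failures, and any positional bias from the library prep. The function names are made up for illustration.

```python
import math

# Assumption: approximate haploid human genome size in bp.
HUMAN_GENOME_BP = 3.1e9


def mean_coverage(total_bases, genome_size=HUMAN_GENOME_BP):
    """Average depth = sequenced bases / genome size."""
    return total_bases / genome_size


def prob_depth_at_least(k, mean_cov):
    """P(depth >= k) at a site, assuming Poisson-distributed depth."""
    p_less = sum(
        math.exp(-mean_cov) * mean_cov ** i / math.factorial(i)
        for i in range(k)
    )
    return 1.0 - p_less


# Figures from the question: ~2 billion bp of human sequence, >500 patients.
per_sample = mean_coverage(2e9)
pooled = per_sample * 500

print(f"per-sample mean depth: {per_sample:.2f}X")
print(f"fraction of sites seen at least once: {prob_depth_at_least(1, per_sample):.2f}")
print(f"fraction of sites with depth >= 4: {prob_depth_at_least(4, per_sample):.4f}")
print(f"pooled mean depth across 500 samples: {pooled:.0f}X")
```

Under these assumptions, very few sites in any one sample have enough depth for a standalone genotype call, which is why the paper leans on imputation, while the pooled depth across the cohort is large enough that cohort-level allele frequencies look feasible.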