We have updated our workflows for per base sequence coverage to use genomeCoverageBed from BAM files. However for pair-end data it seems as though the regions between pair-end reads are not counted.
To be clear I am not talking about using -split for not counting introns in a single read of a paired-end, instead I am looking to count the probable whole insert when the insert size is greater than the combined read length of the paired reads.
We've looked at using iRanges from BioConductor as well but cannot tell if this would do what we want.
Is there is hidden flag in genomeCoverageBed to count the whole insert as coverage, not just the sequenced ends? Is there another program out there what would work on BAM files?
I know I can alter the SAM file before BAM conversion but this seems like something that should be coded somewhere already.
@Sean Davis - this measures the "physical" (i.e., not the "nucleotide") coverage of a sample genome. It is often used to measure the ability to detect structural variant breakpoints by paired-end mapping, as PREM relies upon the breakpoint lying in the unsequenced interstice between the sequenced ends.
Why would you want to know this? What is the use case for having this information?
Thanks Aaron. I see. We haven't quantified things this way, but it makes perfect sense to want to do so.
In a word: MeDIP-seq. Nucleosomes are ~147bp. Early NGS = 33bp. If you optimise for the whole nucleosome then chromatin rearrangements are more obvious. There are other associated analysis as well.
Did you come up with a final solution to this problem? I realize that GATK seems to have a walker for this (http://gatkforums.broadinstitute.org/discussion/1494/computereadspancoverage), but in the latest version 3.3.0 I downloaded from the Broad Web site it does not recognize the command
-T ComputeReadSpanCoverage
.