I have bam files of paired-end sequencing data. I want a bedGraph file that quantifies coverage across the genome, where each set of reads and the insert between them to be counted once. For instance, [Read1---insert---Read2] would all receive a coverage value of 1.
From what I can tell, bam files contain alignment information for the paired reads, and while the insert between the two reads can be inferred, coverage calculations exclude the insert and only use read coverage. Is this correct?
I've found that I can convert bam to bed with bedtools using the Paired-End mode to get read-pairs on the same line. I can see how I could use awk
to calculate coverage that counts Read1--insert--Read2 once, but I would rather use a package if it's available.
Does anyone know if bedtools genomecov
can take either the bam or the bedPE as input and quantify coverage of the the Reads+insert across the genome?
#Taken from bedtools website.
$ bedtools bamtobed -i reads.bam -bedpe | head -3
chr7 118965072 118965122 chr7 118970079 118970129 TUPAC_0001:3:1:0:1452#0 37 + -
chr11 46765606 46765656 chr11 46769934 46769984 TUPAC_0001:3:1:0:1472#0 37 + -
chr20 54704674 54704724 chr20 54708987 54709037 TUPAC_0001:3:1:1:1833#0 37 + -