I was wondering what approaches are commonly used to get tag counts from paired-end reads in ChIP-seq data. Naively, I wrote a script that takes each pair of paired-end reads and stitches them together to make a fragment. So my pipeline is:
Convert BAM to BED:
bamToBed -i reads.bam > reads.bed
Merge reads into fragments (I wrote a custom script here):
python stitch.py reads.bed > fragments.bed
EXAMPLE:
read1: chr1 5 10 +
read2: chr1 41 50 -
fragment: chr1 5 50 +
BED to BedGraph:
genomeCoverageBed -bga -i fragment.bed -g chrom.sizes > fragment.bedGraph
Basically, I am creating counts between the two reads to fill out the whole fragment. I could see someone arguing that I am creating data. So, I was wondering if my approach is similar to what others do when trying to visualize paired-end data.
I would vote that this is kind of "creating" data. You're not sequencing the insert, so why use it as evidence for anything? If you need tag counts, why not count only reads mapping in a proper pair (by SAM flag)?