Dear Bioinfo Geeks,
I have obtained CLIP-Seq read coverage that is concentrated mostly in the 3’ UTR, with much lower density in the CDS and 5’ UTR (Example). This holds for almost all the genes I have looked at. I am struggling to find a way to show this pattern for all genes in a single figure, one that makes clear that read density in the other regions is lower than in the 3’ UTR. Because every gene has different read counts and different 5’ UTR, CDS and 3’ UTR lengths, it is difficult to bring them all onto the same scale. I found some papers that used binned density, but I do not understand the basic steps involved. Could somebody please help?
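Binned density (often called a "metagene" profile) usually means scaling every region to the same number of windows regardless of its length, taking the mean coverage in each window, and then averaging each window across genes. A minimal sketch, assuming per-base CLIP coverage has already been extracted for each gene's 5' UTR, CDS and 3' UTR (exons concatenated, in 5'→3' transcript orientation); the names `bin_region`, `metagene_profile` and the region keys are hypothetical:

```python
from statistics import fmean

REGIONS = ("utr5", "cds", "utr3")  # hypothetical keys for the three features

def bin_region(coverage, n_bins=100):
    """Mean per-base coverage in n_bins equal-fraction windows of one region.

    Window i spans bases [i*L//n_bins, (i+1)*L//n_bins); regions shorter
    than n_bins get one-base windows, so some bases are reused."""
    length = len(coverage)
    if length == 0:
        return [0.0] * n_bins
    bins = []
    for i in range(n_bins):
        start = i * length // n_bins
        end = max((i + 1) * length // n_bins, start + 1)
        bins.append(fmean(coverage[start:end]))
    return bins

def metagene_profile(genes, n_bins=100):
    """Average binned coverage across genes, region by region.

    genes: list of dicts mapping each name in REGIONS to a per-base coverage
    list in 5'->3' transcript orientation. Returns 3 * n_bins values:
    5' UTR bins, then CDS bins, then 3' UTR bins."""
    profile = []
    for region in REGIONS:
        per_gene = [bin_region(gene[region], n_bins) for gene in genes]
        # zip(*per_gene) groups the i-th window of every gene together
        profile.extend(fmean(window) for window in zip(*per_gene))
    return profile
```

Because each window holds a mean per-base value, region length is normalized away, so a long CDS and a short 5' UTR end up on the same scale. Plotting the 300 values with boundaries drawn at windows 100 and 200 gives one figure for all genes.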
Thanks for your suggestion. In RSeQC, I can't find any option for normalizing the read count in a specific region by gene expression (I have already calculated FPKM). I performed CLIP-Seq and RNA-Seq (for normalization) on the same sample, each in 3 replicates.
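Whatever tool produces the bins, expression normalization can be applied to the binned values afterwards. A minimal sketch, assuming each CLIP replicate is paired with the FPKM values of its matching RNA-Seq replicate: divide each gene's bins by its FPKM (the pseudocount, an arbitrary choice here, only avoids division by zero), then average bin-wise across the three replicates. The function names are hypothetical:

```python
from statistics import fmean

def normalize_by_expression(binned_clip, fpkm, pseudocount=0.1):
    """Divide each gene's binned CLIP values by that gene's RNA-Seq FPKM.

    binned_clip: {gene_id: [bin values]} for one CLIP replicate.
    fpkm: {gene_id: FPKM} from the matching RNA-Seq replicate.
    Genes without an FPKM value are dropped."""
    return {gene: [value / (fpkm[gene] + pseudocount) for value in bins]
            for gene, bins in binned_clip.items() if gene in fpkm}

def average_replicates(replicates):
    """Bin-wise mean across replicates for genes present in all of them.

    replicates: list of {gene_id: [bin values]} dicts, one per replicate."""
    shared = set.intersection(*(set(rep) for rep in replicates))
    return {gene: [fmean(values) for values in zip(*(rep[gene] for rep in replicates))]
            for gene in shared}
```

The normalized per-gene bins can then be averaged across genes exactly like unnormalized ones. An alternative worth considering is to build the same binned profile from the RNA-Seq coverage and plot the per-bin CLIP/RNA-Seq ratio; filtering out lowly expressed genes also keeps small FPKM values from dominating the average.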
Following an approach similar to RSeQC, I binned each gene feature (5' UTR, CDS, 3' UTR) into 100 windows (quantiles). However, extracting the read counts from the BAM file with samtools is too slow, because for each case it takes 100 (bins) × 3 (replicates) calls. Could you suggest how to make the per-bin read counting faster, perhaps by using a file format other than BAM?
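One way to avoid a samtools call per bin is to compute coverage once per replicate, for example with `bedtools genomecov -ibam rep1.bam -bg -split > rep1.bedGraph` (adding `-strand +` or `-strand -` for strand-specific CLIP data; the file names are placeholders), and then compute every bin in memory. The sketch below assumes that bedGraph input; all function names are hypothetical. Note that bedGraph values are per-base depth, so the bin sums are summed depth rather than read counts. If you prefer to stay with BAM, writing all bins into one BED file and passing it once to `samtools bedcov` also avoids launching samtools per bin.

```python
from bisect import bisect_right
from itertools import accumulate

def load_bedgraph(lines):
    """Index bedGraph records (chrom, start, end, value; 0-based, half-open)
    as {chrom: (sorted starts, sorted intervals)} for fast lookup."""
    by_chrom = {}
    for line in lines:
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        chrom, start, end, value = line.split()[:4]
        by_chrom.setdefault(chrom, []).append((int(start), int(end), float(value)))
    index = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        index[chrom] = ([s for s, _, _ in ivs], ivs)
    return index

def block_coverage(index, chrom, start, end):
    """Per-base depth over [start, end); bases absent from the bedGraph are 0."""
    starts, ivs = index.get(chrom, ([], []))
    cov = [0.0] * (end - start)
    # Jump straight to the last interval starting at or before `start`
    i = max(bisect_right(starts, start) - 1, 0)
    while i < len(ivs) and ivs[i][0] < end:
        s, e, v = ivs[i]
        for pos in range(max(s, start), min(e, end)):
            cov[pos - start] = v
        i += 1
    return cov

def feature_coverage(index, chrom, blocks, strand):
    """Concatenate depth over a feature's exon blocks (e.g. all CDS exons)
    and return it in 5'->3' transcript orientation."""
    cov = []
    for start, end in sorted(blocks):
        cov.extend(block_coverage(index, chrom, start, end))
    return cov[::-1] if strand == "-" else cov

def bin_sums(cov, n_bins=100):
    """Summed depth in n_bins equal-fraction windows via a prefix-sum array,
    so each bin costs O(1) once the prefix sums are built."""
    prefix = [0.0, *accumulate(cov)]
    n = len(cov)
    return [prefix[(i + 1) * n // n_bins] - prefix[i * n // n_bins]
            for i in range(n_bins)]
```

Usage: build the index once per replicate with `load_bedgraph(open("rep1.bedGraph"))`, then call `bin_sums(feature_coverage(...))` for every feature; no external process is started per bin. Regions shorter than the number of bins yield some empty (zero) windows here.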