Question

CLIP-Seq read density


9.2 years ago

Anil Kesarwani ▴ 90

Dear Bioinfo Geeks,

I have obtained CLIP-Seq read coverage that is mostly concentrated in the 3' UTR, with much lower density in the CDS and 5' UTR (Example). This holds for almost all of the genes I looked at. I am struggling to find a way to represent this pattern for all genes in one figure that shows that read density in the other regions is lower than in the 3' UTR. Because every gene has a different read count and different 5' UTR, CDS and 3' UTR lengths, it is difficult to put them all on the same scale. I found some papers that used binned density, but I do not understand the basic steps involved. Could somebody please help?
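For reference, the binned-density idea can be sketched roughly as follows: split each feature into the same number of bins regardless of its length, average the coverage within each bin, scale each gene so that expression level does not dominate, and then average across genes. This is only a minimal sketch, not the method of any particular paper. The function names (`bin_profile`, `gene_profile`, `metagene`), the choice of scaling each gene to sum to 1, and the assumption that per-base coverage for each feature is already available as an array are all illustrative.

```python
import numpy as np

def bin_profile(coverage, n_bins=100):
    """Mean per-base coverage in n_bins near-equal-width bins along one feature."""
    coverage = np.asarray(coverage, dtype=float)
    if coverage.size == 0:
        # Feature absent (e.g. no annotated 5' UTR): contribute zeros.
        return np.zeros(n_bins)
    if coverage.size < n_bins:
        # Feature shorter than the bin count: stretch by linear interpolation.
        x_new = np.linspace(0, coverage.size - 1, n_bins)
        return np.interp(x_new, np.arange(coverage.size), coverage)
    return np.array([chunk.mean() for chunk in np.array_split(coverage, n_bins)])

def gene_profile(utr5, cds, utr3, n_bins=100):
    """Concatenate binned 5' UTR, CDS and 3' UTR profiles, scaled to sum to 1."""
    profile = np.concatenate([bin_profile(r, n_bins) for r in (utr5, cds, utr3)])
    total = profile.sum()
    # Genes with no coverage at all are left as zeros (assumption).
    return profile / total if total > 0 else profile

def metagene(genes, n_bins=100):
    """Average the scaled profiles of many genes; genes is a list of (utr5, cds, utr3)."""
    return np.mean([gene_profile(*g, n_bins=n_bins) for g in genes], axis=0)
```

Plotting the array returned by `metagene` against bin index (with vertical lines at the 5' UTR/CDS and CDS/3' UTR boundaries) gives one figure for all genes. Coverage arrays for minus-strand genes would need to be reversed first so that every profile runs 5' to 3'.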

rna-seq • 2.5k views

ADD COMMENT • link updated 9.2 years ago by GenoMax 152k • written 9.2 years ago by Anil Kesarwani ▴ 90

score 2 · Answer 1 · 2016-06-03


9.2 years ago

Martombo ★ 3.2k

Check out RSeQC, especially read_distribution.py and geneBody_coverage.py. The former reports how reads are distributed across the different genomic features (CDS, 5' UTR, 3' UTR, introns and so on), while the latter produces an aggregate coverage profile across genes. For the latter you'll need to create a custom BED file covering the regions around the stop codons of the genes.



Thanks for your suggestion. In RSeQC, I don't find any option for normalizing the read count for a specific region by gene expression (FPKM already calculated). I have performed CLIP-Seq and RNA-Seq (for normalization) on the sample in 3 replicates.

Using an approach similar to RSeQC, I binned each gene feature (5' UTR, CDS, 3' UTR) into 100 windows (quantiles). But extracting the read counts from the BAM file using samtools is not fast enough, since each feature requires 100 (bins) × 3 (replicates) queries. Could you please suggest whether the read count calculation for each bin can be made faster, perhaps by using a file format other than BAM?
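One possible way to avoid querying the BAM once per bin is to export per-base coverage a single time per replicate (for example with `samtools depth -a` or `bedtools genomecov -d`), load it into an array per chromosome, and compute all bin totals from prefix sums. The sketch below assumes such an array is already in memory and that feature coordinates are 0-based and half-open; `prefix_sums` and `bin_sums` are illustrative names, not part of samtools or RSeQC. Note that it sums read depth rather than counting reads, so a read contributes once per base it covers.

```python
import numpy as np

def prefix_sums(coverage):
    """csum[i] is the total coverage over positions [0, i); compute once per chromosome."""
    return np.concatenate(([0.0], np.cumsum(coverage, dtype=float)))

def bin_sums(csum, start, end, n_bins=100):
    """Summed coverage in n_bins windows tiling the half-open interval [start, end)."""
    # Bin edges rounded to whole bases; features shorter than n_bins
    # produce some empty bins, which sum to zero.
    edges = np.linspace(start, end, n_bins + 1).round().astype(int)
    # Each bin total is a difference of two prefix sums, so the cost per
    # feature is O(n_bins) regardless of feature length.
    return csum[edges[1:]] - csum[edges[:-1]]
```

With the prefix sums computed once per chromosome and replicate, each feature costs only `n_bins` array lookups, so looping over all genes and all three features should take seconds rather than thousands of separate BAM queries. Dividing each bin total by the bin width gives mean depth per bin if features of different lengths need to be compared.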

ADD REPLY • link 9.2 years ago by Anil Kesarwani ▴ 90