CLIP-Seq read density
8.5 years ago

Dear Bioinfo Geeks,

I have obtained CLIP-Seq read coverage that is concentrated mostly in the 3' UTR, with much lower density in the CDS and 5' UTR (Example). This holds for almost all the genes I looked at. I am struggling to find a way to show this pattern for all genes in a single figure, one that makes clear that read density in the other regions is lower than in the 3' UTR. Since every gene has a different read count and different 5' UTR, CDS and 3' UTR lengths, it is difficult to put them all on the same scale. I found some papers that used binned density, but I do not understand the basic steps involved. Could somebody please help?

rna-seq • 2.3k views
8.5 years ago
Martombo ★ 3.1k

Check out RSeQC, especially read_distribution.py and geneBody_coverage.py. The former counts how reads are distributed over the different genomic regions, while the latter produces a stacked coverage plot across genes. For the latter you'll need to create a custom BED file around the stop codons of the genes.
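The binned-density idea behind this kind of stacked profile can be sketched in a few lines of plain Python. This is only an illustration, not RSeQC's implementation: it assumes you already have per-base read depth for each feature of each gene as simple lists, and the names `bin_region`, `metagene` and the dictionary keys `utr5`, `cds`, `utr3` are made up for the example. Each feature is rescaled to the same number of bins, each gene is divided by its own total so that highly covered genes do not dominate, and the genes are then averaged.

```python
from math import ceil, floor


def bin_region(per_base, n_bins=100):
    """Mean depth in n_bins equal-width bins along one feature.

    per_base is a list of per-base read depths (hypothetical input format).
    Features shorter than n_bins reuse bases, so bins may overlap.
    """
    n = len(per_base)
    if n == 0:
        raise ValueError("empty feature")
    binned = []
    for i in range(n_bins):
        start = floor(i * n / n_bins)
        end = max(ceil((i + 1) * n / n_bins), start + 1)
        window = per_base[start:end]
        binned.append(sum(window) / len(window))
    return binned


def metagene(genes, n_bins=100):
    """Average sum-normalised profile over 5' UTR + CDS + 3' UTR bins."""
    profile = [0.0] * (3 * n_bins)
    used = 0
    for gene in genes:
        regions = [gene["utr5"], gene["cds"], gene["utr3"]]
        if any(len(r) == 0 for r in regions):
            continue  # skip genes lacking one of the three features
        row = [v for r in regions for v in bin_region(r, n_bins)]
        total = sum(row)
        if total == 0:
            continue  # no coverage at all, nothing to scale
        for i, v in enumerate(row):
            profile[i] += v / total  # each gene contributes a shape, not its abundance
        used += 1
    if used == 0:
        raise ValueError("no usable genes")
    return [v / used for v in profile]


# Toy data: depth is highest in the 3' UTR of both genes.
toy_genes = [
    {"utr5": [0] * 10, "cds": [1] * 50, "utr3": [5] * 30},
    {"utr5": [0] * 4, "cds": [2] * 100, "utr3": [8] * 7},
]
toy_profile = metagene(toy_genes, n_bins=5)
```

Plotting `toy_profile` against bin index gives one curve for all genes, with the first third of the x-axis covering the 5' UTR, the middle third the CDS and the last third the 3' UTR. Dividing by the per-gene total is one possible scaling; it could be swapped for another normalisation if that suits the data better.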


Thanks for your suggestion. In RSeQC, I don't find any option for normalizing the read count in a specific region by gene expression (FPKM already calculated). I have performed CLIP-Seq and RNA-Seq (for normalization) on the sample in 3 replicates.

Using an approach similar to RSeQC, I binned each gene feature (5' UTR, CDS, 3' UTR) into 100 windows (quantiles). But extracting the read counts from the BAM files with samtools is not fast enough, since each feature takes 100 (bins) × 3 (replicates) queries. Could you please suggest how the read count calculation for each bin could be made faster, perhaps using a file format other than BAM?
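One way to avoid a separate BAM query per bin, sketched here under assumptions, is to compute per-base depth once per replicate (for example from `samtools depth` or `bedtools genomecov` output), load it into a list per chromosome or transcript, and answer every bin from a prefix-sum array. The helper names `prefix_sums` and `bin_counts` are hypothetical, and the values returned are summed depth per bin rather than read counts.

```python
from itertools import accumulate


def prefix_sums(depth):
    """prefix[i] is the total depth over bases [0, i)."""
    return [0] + list(accumulate(depth))


def bin_counts(prefix, start, end, n_bins=100):
    """Summed depth in n_bins consecutive windows of [start, end).

    Coordinates are 0-based and half-open. Each window costs one
    subtraction, so the number of bins barely affects the run time.
    Features shorter than n_bins yield some empty (zero) windows.
    """
    length = end - start
    edges = [start + (i * length) // n_bins for i in range(n_bins + 1)]
    return [prefix[b] - prefix[a] for a, b in zip(edges, edges[1:])]


# Toy per-base depth for one short sequence.
toy_depth = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
toy_prefix = prefix_sums(toy_depth)
toy_bins = bin_counts(toy_prefix, 2, 10, n_bins=4)
```

The prefix array is built once per replicate and then reused for every gene and feature, so the cost of reading the alignments is paid only once instead of 100 × 3 times per feature.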
