Hi all,
I have CLIP-seq datasets for several RNA-binding proteins, and I would like to run motif discovery on them to find out whether a given RNA-binding protein prefers to bind specific motifs in human mRNAs. I know how to align the reads to the human genome and call peaks (i.e., find clusters), so I end up with a BED file of clusters from the peak-calling software. Many of these clusters fall in rRNA, ncRNA, introns, etc., rather than in mRNA.
My questions are: 1) How do I keep only the clusters that map to mRNA and filter out the rest? Is there an existing package or tool that can do this?
2) Is there a consensus in the bioinformatics field on which regions of an mRNA should be included in motif discovery, i.e., should the 5' UTR and 3' UTR be included along with the CDS (or exons) as input for motif discovery?
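To make question 1 concrete, here is a minimal sketch of the kind of filtering I have in mind. It is only an illustration under my own assumptions: clusters and annotated exons of protein-coding transcripts are given as BED-style tuples (chrom, start, end, strand) with 0-based, half-open coordinates, and a cluster counts as "mRNA" if it overlaps at least one such exon on the same strand. All function names and the toy coordinates are made up for the example, and I do not know whether this overlap rule is what people actually use.

```python
from bisect import bisect_right
from collections import defaultdict


def merge_intervals(intervals):
    """Merge overlapping or touching (start, end) intervals into a sorted, disjoint list."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def build_exon_index(exons):
    """Group exons by (chrom, strand) and store merged starts/ends for binary search."""
    grouped = defaultdict(list)
    for chrom, start, end, strand in exons:
        grouped[(chrom, strand)].append((start, end))
    index = {}
    for key, intervals in grouped.items():
        merged = merge_intervals(intervals)
        index[key] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def overlaps_mrna(cluster, index):
    """Return True if the cluster overlaps any exon on the same chromosome and strand."""
    chrom, start, end, strand = cluster
    if (chrom, strand) not in index:
        return False
    starts, ends = index[(chrom, strand)]
    # Last merged exon block whose start lies before the cluster end (half-open coordinates).
    i = bisect_right(starts, end - 1) - 1
    return i >= 0 and ends[i] > start


# Toy data (hypothetical coordinates, not taken from any real annotation).
exons = [
    ("chr1", 100, 200, "+"),
    ("chr1", 150, 300, "+"),
    ("chr2", 50, 80, "-"),
]
clusters = [
    ("chr1", 190, 210, "+"),  # overlaps an exon on the same strand
    ("chr1", 190, 210, "-"),  # same position, opposite strand
    ("chr1", 300, 320, "+"),  # starts exactly where the exon block ends
    ("chr2", 79, 90, "-"),    # overlaps the last base of an exon
]

index = build_exon_index(exons)
kept = [c for c in clusters if overlaps_mrna(c, index)]
print(kept)
```

Here intronic clusters are dropped because only exon intervals are indexed; whether the exon list should contain UTR exons as well as CDS is exactly what I am asking in question 2.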
Thanks in advance,
Xiao