Hi!
I want to construct an average gene profile from my ChIP-seq data (a BAM file). My aim is to normalise the length of all genes in RefSeq or Ensembl, divide each gene into, say, 10 bins, and build the profile from those. This would help visualise whether the binding is at the TSS, gene body, TES, etc.
Is there already a ready-made solution on offer from someone who has done this before?
As an example, see figure 2C.
Thank you
@Madelaine: could you please provide the complete code that you used? Thank you.
Errr... This should be enough to understand how to basically set it up if you know R.
The code I summarized this from does a lot more: it summarizes upstream and downstream regions of genes in addition to the gene body, checks for overlapping genes, etc., so I feel it would be too confusing and messy to be much help in its entirety.
I agree with Madelaine that it would probably clutter things up to add more. One comment though: she's using Bioconductor, so you would need to install that. If you are just starting with R (and probably for a lot of people who know R but don't do bioinformatics with it), that might not be obvious. Here is how to install it: http://www.bioconductor.org/install/. All the packages she's using are on the Bioconductor website.
I followed the link from SEQanswers. Actually, I found that ngsplot (https://code.google.com/p/ngsplot/) can do exactly what you (and I) want to do.
Looking at this code, I was wondering if it is sensitive to genes on the forward vs. the reverse strand. One needs to reverse the order of traversal of the gene based on the direction in which the gene is transcribed to characterize the TSS and TES correctly.
Yep, that's why I said "Additionally you need to remember to use rev() on the coverage values for genes on the C/- strand... "
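For anyone who wants to see the binning and strand flip in isolation, here is a minimal sketch in plain Python (not the R/Bioconductor code discussed above). It assumes you already have per-base coverage for each gene in genomic left-to-right order; the function names `bin_coverage` and `average_profile` and the toy data are made up for illustration.

```python
def bin_coverage(coverage, strand, n_bins=10):
    """Collapse per-base coverage over one gene into n_bins mean values,
    oriented from TSS to TES."""
    if len(coverage) < n_bins:
        raise ValueError("gene is shorter than the number of bins")
    # Minus-strand genes are transcribed right-to-left, so flip them first
    # (the same idea as calling rev() on the coverage values in R).
    values = list(coverage)[::-1] if strand == "-" else list(coverage)
    n = len(values)
    binned = []
    for i in range(n_bins):
        # Integer bin edges that cover the whole gene without gaps
        start = i * n // n_bins
        end = (i + 1) * n // n_bins
        chunk = values[start:end]
        binned.append(sum(chunk) / len(chunk))
    return binned


def average_profile(genes, n_bins=10):
    """Average the binned profiles of many genes, bin by bin."""
    profiles = [bin_coverage(cov, strand, n_bins) for cov, strand in genes]
    return [sum(p[i] for p in profiles) / len(profiles) for i in range(n_bins)]


# Toy data (invented): coverage in genomic order, plus the gene's strand
genes = [
    ([8, 8, 4, 4, 2, 2, 1, 1], "+"),  # signal at the 5' end, plus strand
    ([1, 1, 2, 2, 4, 4, 8, 8], "-"),  # same shape on the minus strand
]
print(average_profile(genes, n_bins=4))  # [8.0, 4.0, 2.0, 1.0]
```

Without the flip, the two toy genes would cancel each other out and the TSS enrichment would be smeared across both ends of the profile. Genes shorter than the number of bins are rejected here; a real pipeline would need to decide whether to drop or interpolate them.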