I am currently trying to analyze CAGE data from the FANTOM5 consortium (fantom.gsc.riken.jp; the data are at http://fantom.gsc.riken.jp/5/data/).
Unfortunately, I am not very experienced with CAGE data. FANTOM5 does provide processed data for both TSS expression and enhancers, as well as some processing software. However, I want to reprocess the data myself, starting at least from the BAM files. Given CAGE BAM files, what would you suggest using to obtain a sensible expression value for each TSS (and isoform), and likewise for each enhancer?
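To make the starting point concrete: CAGE measures transcription start sites through the 5' end of each tag, so a common first step from a BAM file is to collapse reads to their 5' positions (CAGE tag starting sites, CTSSs) and count them. The minimal sketch below assumes the reads have already been extracted from the BAM (for example with pysam, whose aligned segments expose reference_start, reference_end and is_reverse) into hypothetical (chrom, start, end, strand) tuples with 0-based, half-open coordinates. Dedicated tools such as the Bioconductor package CAGEr handle this step more completely.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical minimal read record: (chrom, start, end, strand),
# 0-based half-open coordinates as in BAM (pysam reference_start/reference_end).
Read = Tuple[str, int, int, str]


def ctss_counts(reads: Iterable[Read]) -> Counter:
    """Count CAGE tag 5' ends (CTSSs) per (chrom, position, strand)."""
    counts: Counter = Counter()
    for chrom, start, end, strand in reads:
        # The 5' end of a CAGE tag marks the capped transcript start:
        # the leftmost base on '+', the rightmost base on '-'.
        pos = start if strand == "+" else end - 1
        counts[(chrom, pos, strand)] += 1
    return counts


reads = [
    ("chr1", 1000, 1027, "+"),
    ("chr1", 1000, 1027, "+"),
    ("chr1", 1003, 1030, "+"),
    ("chr1", 5000, 5027, "-"),
]
ctss = ctss_counts(reads)
print(ctss)
```

Note that the minus-strand read spanning 5000 to 5027 is counted at position 5026, its 5' end, not at its leftmost coordinate.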
For the TSSs, I would usually just run cuffdiff with a GENCODE annotation, but I am unsure whether that is only sensible for RNA-Seq and not for CAGE. What would you use instead? I also have no idea how to do the same for enhancers (assuming I have their annotation): should I simply count the number of tags falling into these regions? And do you know how the "activity" of an enhancer relates to eRNA abundance?
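One simple way to turn CTSS counts into expression values, which also illustrates the "count the tags in each region" idea for enhancers, is to sum the tags whose 5' ends fall inside each region and normalize by the total number of mapped tags to get tags per million (TPM). Because CAGE yields one 5' end per transcript rather than fragments along its whole length, no transcript-length normalization (as in RNA-Seq FPKM) is applied. In this sketch the TSS window size, the region names and the choice to pool both strands for enhancers (whose RNAs are often transcribed bidirectionally) are illustrative assumptions, not the FANTOM5 pipeline.

```python
from typing import Dict, Optional, Tuple

# CTSS counts keyed by (chrom, 0-based position, strand).
CTSS = Dict[Tuple[str, int, str], int]
# Hypothetical region record: (chrom, start, end, strand), 0-based half-open;
# strand=None means tags on both strands are pooled (used here for enhancers).
Region = Tuple[str, int, int, Optional[str]]


def count_in_regions(ctss: CTSS, regions: Dict[str, Region]) -> Dict[str, int]:
    """Sum CTSS counts whose position lies inside each region."""
    counts = {name: 0 for name in regions}
    # Naive O(CTSS x regions) scan; an interval index (e.g. bedtools) suits genome scale.
    for (chrom, pos, strand), n in ctss.items():
        for name, (r_chrom, r_start, r_end, r_strand) in regions.items():
            strand_ok = r_strand is None or r_strand == strand
            if chrom == r_chrom and r_start <= pos < r_end and strand_ok:
                counts[name] += n
    return counts


def tags_per_million(counts: Dict[str, int], library_size: int) -> Dict[str, float]:
    """Normalize raw tag counts by the total number of mapped tags in the library."""
    return {name: n * 1e6 / library_size for name, n in counts.items()}


ctss = {
    ("chr1", 1000, "+"): 2,
    ("chr1", 1003, "+"): 1,
    ("chr1", 5026, "-"): 1,
    ("chr2", 200, "+"): 3,
    ("chr2", 340, "-"): 4,
}
regions = {
    "geneA_TSS": ("chr1", 950, 1051, "+"),  # hypothetical +/-50 bp window around a TSS at 1000
    "enh1": ("chr2", 150, 400, None),  # hypothetical enhancer, both strands pooled
}
counts = count_in_regions(ctss, regions)
tpm = tags_per_million(counts, library_size=sum(ctss.values()))
print(counts)
print(tpm)
```

With these toy numbers geneA_TSS collects 3 tags and enh1 collects 7. In real data the library size would be the total number of mapped tags in the sample, not only the tags inside the listed regions.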
More generally, should I interpret the mRNA abundance of a gene measured by CAGE in FANTOM5 as the level of transcription initiation, or as the steady-state mRNA level after post-transcriptional regulation (e.g., by miRNAs)? Also, are other RNAs such as miRNAs or lncRNAs included?
It would be great if you could help me here.
Many thanks for your answer. I am still wondering, however, whether I can obtain isoform expression levels from the CAGE data, which would be very helpful to me. Many different GENCODE isoforms share the same TSS, so is it possible to distinguish between them, and if so, how?
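On the isoform question: CAGE only observes the 5' end of each transcript, so isoforms that share a TSS receive exactly the same CAGE signal and cannot be separated with CAGE alone; only isoforms with distinct TSSs (alternative promoters) can be resolved, and splitting the rest would require other data such as RNA-Seq. A minimal sketch of that reasoning groups annotated transcripts by TSS. The transcript IDs and coordinates are hypothetical, and real GTF coordinates (1-based, closed) would need converting to the 0-based, half-open convention assumed here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical transcript record: (transcript_id, chrom, strand, start, end),
# 0-based half-open span (GTF start - 1, GTF end).
Transcript = Tuple[str, str, str, int, int]


def group_by_tss(transcripts: List[Transcript]) -> Dict[Tuple[str, int, str], List[str]]:
    """Group transcript IDs by (chrom, TSS position, strand)."""
    groups: Dict[Tuple[str, int, str], List[str]] = defaultdict(list)
    for tx_id, chrom, strand, start, end in transcripts:
        # The TSS is the 5' end: leftmost base on '+', rightmost base on '-'.
        tss = start if strand == "+" else end - 1
        groups[(chrom, tss, strand)].append(tx_id)
    return dict(groups)


transcripts = [
    ("TX1", "chr1", "+", 1000, 5000),
    ("TX2", "chr1", "+", 1000, 4200),  # same TSS as TX1, different 3' structure
    ("TX3", "chr1", "+", 1800, 5000),  # alternative promoter
    ("TX4", "chr1", "-", 8000, 9500),
]
groups = group_by_tss(transcripts)
for key, ids in groups.items():
    print(key, ids)
```

Here TX1 and TX2 fall into one group and would receive a single CAGE count, while TX3 and TX4 can be quantified separately. Because real CAGE signal spreads over a cluster of nearby positions rather than a single base, in practice TSSs within a small window would probably be merged as well.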