1) Can we use similar tools for CAGE-seq as well? For example, STAR for alignment, Cufflinks for FPKM values, and FusionCatcher for detecting fusions?
2) We also have RNA-seq samples. Can we compare the CAGE-seq samples against the RNA-seq samples for differential expression and clustering?
As far as I know, transcript length has no influence on read counts in CAGE-seq, because each capped transcript contributes a tag at its 5′ end only. FPKM normalization would therefore be inappropriate.
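To illustrate the point about length normalization, here is a minimal sketch comparing a plain counts-per-million scaling with FPKM. The function names, promoter names, tag counts, and transcript lengths are all made up for the example; nothing here comes from a specific CAGE pipeline.

```python
def counts_per_million(counts):
    """Scale raw tag counts by library size only (no length term)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library has no tags")
    return {name: n * 1e6 / total for name, n in counts.items()}


def fpkm(counts, lengths_bp):
    """FPKM as used for RNA-seq: divides by library size AND transcript length."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library has no tags")
    return {name: n * 1e9 / (total * lengths_bp[name]) for name, n in counts.items()}


# Hypothetical data: two promoters with the same number of CAGE tags,
# attached to transcripts of very different lengths.
cage_tags = {"promoterA": 500, "promoterB": 500}
transcript_lengths = {"promoterA": 1000, "promoterB": 4000}

cpm = counts_per_million(cage_tags)
length_scaled = fpkm(cage_tags, transcript_lengths)

print(cpm)            # equal values: the tag counts were equal
print(length_scaled)  # promoterB looks 4x lower purely because of length
```

With CAGE, the two promoters above are equally active, and only the library-size scaling reflects that. Dividing by length introduces a difference that is not in the data.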
To get the most out of a CAGE experiment, I recommend calling peaks from the 5′ ends of the aligned CAGE reads (for instance with paraclu), and running the differential expression analysis at the peak level. Alternatively, one can simply intersect the 5′ ends with promoter regions, for instance using the FANTOM5 peaks, or a region flanking the start sites of GENCODE, RefSeq or FANTOM CAT. More complicated approaches also exist, for instance RECLU.
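The simpler alternative (intersecting 5′ ends with a window around annotated start sites) can be sketched as below. This is only an illustration under stated assumptions: reads are given as BED-style 0-based half-open intervals, the window size, gene names, and coordinates are invented, and the naive double loop would need to be replaced by an interval index or a tool such as bedtools for real data.

```python
def five_prime_end(start, end, strand):
    """0-based position of a read's 5' end, for BED-style half-open [start, end)."""
    return start if strand == "+" else end - 1


def promoter_window(tss, strand, upstream, downstream):
    """Half-open window around a TSS, oriented by strand."""
    if strand == "+":
        return max(0, tss - upstream), tss + downstream + 1
    return max(0, tss - downstream), tss + upstream + 1


def count_tags(reads, promoters, upstream=500, downstream=500):
    """Count reads whose 5' end falls in each promoter window, on the same strand.

    reads:     iterable of (chrom, start, end, strand)
    promoters: dict name -> (chrom, tss, strand)
    """
    counts = {name: 0 for name in promoters}
    for chrom, start, end, strand in reads:
        pos = five_prime_end(start, end, strand)
        for name, (p_chrom, tss, p_strand) in promoters.items():
            if chrom != p_chrom or strand != p_strand:
                continue
            lo, hi = promoter_window(tss, p_strand, upstream, downstream)
            if lo <= pos < hi:
                counts[name] += 1
    return counts


# Hypothetical reads and annotation, for illustration only.
reads = [
    ("chr1", 1000, 1030, "+"),  # 5' end at 1000, inside geneA's window
    ("chr1", 980, 1010, "-"),   # wrong strand for geneA, not counted
    ("chr1", 5000, 5030, "+"),  # 5' end at 5000, edge of geneB's window
    ("chr2", 1000, 1030, "+"),  # different chromosome
]
promoters = {"geneA": ("chr1", 1000, "+"), "geneB": ("chr1", 5100, "+")}

tag_counts = count_tags(reads, promoters, upstream=100, downstream=100)
print(tag_counts)
```

The resulting count table (one row per promoter, one column per library) is what would then go into a count-based differential expression tool.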
I would not directly compare a CAGE library to an RNA-seq library: they serve different purposes, and the most appropriate tools to process them differ. However, if you have a full CAGE dataset matched with a full RNA-seq dataset, you can compare the results of each analysis. In many cases, I would expect them to cross-validate, for instance when "gene A is induced by treatment T". But you may also see differential promoter usage with CAGE that is not reflected in RNA-seq, or differential splicing highlighted by RNA-seq but not reflected at the promoter level with CAGE...
I just want to use the FANTOM data, which covers many cell types, and cluster it together with regular RNA-seq data based on expression levels. Has anybody done this kind of analysis?
I am not interested in the primary purpose served by CAGE, that's what I mean. Since FANTOM has expression data for multiple cell types, I just want to use them.
IMO, you are missing the main point of CAGE: it is done to determine the exact location of the TSS (transcription start site) and to detect differential promoter usage.
http://www.pnas.org/content/100/26/15776
Expression quantification is a by-product of CAGE, and numerous papers (see PubMed) have shown that it correlates very well with RNA-seq data. An example: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3975069/
We found that the quantified levels of gene expression are largely
comparable across platforms and conclude that CAGE and RNA-seq are
complementary technologies that can be used to improve incomplete gene
models
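If you want to check this agreement on your own matched samples, one simple option is a rank correlation between gene-level CAGE and RNA-seq values. The sketch below is a plain-Python Spearman correlation; the gene names and expression values are invented, and the choice of CPM for CAGE and TPM for RNA-seq is an assumption, not something taken from the paper quoted above.

```python
from math import sqrt


def ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical gene-level values for one matched sample.
cage_cpm = {"geneA": 120.0, "geneB": 15.0, "geneC": 0.5, "geneD": 48.0}
rnaseq_tpm = {"geneA": 95.0, "geneB": 22.0, "geneC": 1.2, "geneD": 30.0}

genes = sorted(set(cage_cpm) & set(rnaseq_tpm))
rho = spearman([cage_cpm[g] for g in genes], [rnaseq_tpm[g] for g in genes])
print(rho)
```

A rank-based measure is used here because the two platforms are on different scales, so only the ordering of genes is compared.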
Most data from CAGE protocols consist of short reads covering only the 5' end of transcripts. You therefore cannot use CAGE data to study fusions, unless you have long paired-end CAGE reads.
CAGE-seq and RNA-seq are complementary for studying gene expression levels, as long as you have data with decent coverage. I would say CAGE is more accurate, as it specifically captures mRNAs carrying a 5' cap.