Does anyone have experience with CAGE-seq data analysis?
So far I have done basic mapping to the genome (am I right to map to the genome?). I have the following specific questions:
1. Do I need to remove redundant mapped tags, i.e., tags mapped to the same location?
2. How do I do the normalization? I have searched some related papers, but the formulas were too complex for me.
3. How do I find the top-enriched regions? I suspect most of the regions should be around promoters, so I could use the existing annotation to find them, but that way I might miss novel promoter regions?
Any response would be greatly appreciated. Thanks.
On (1), I don't think you should remove redundant mapped tags. Because of the way CAGE works, with strong enrichment at the 5' end, you should expect a high density of tags mapping to the same location. For the rest, I think you just need to read the CAGE papers to find out what they did.
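To illustrate the point about keeping duplicates, here is a minimal Python sketch that counts tags per 5' end position without collapsing identical alignments. It assumes the mapped reads are already available as BED-style tuples (0-based, half-open coordinates). The function name `count_tag_starts` and the example reads are hypothetical, not taken from any CAGE pipeline.

```python
from collections import Counter


def count_tag_starts(reads):
    """Count tags per 5' end, keeping tags that map to the same location.

    `reads` is an iterable of (chrom, start, end, strand) tuples in
    BED-style coordinates (0-based, half-open). This input layout is an
    assumption made for the sketch.
    """
    counts = Counter()
    for chrom, start, end, strand in reads:
        # The 5' end is the leftmost base on "+" and the rightmost base on "-".
        five_prime = start if strand == "+" else end - 1
        counts[(chrom, five_prime, strand)] += 1
    return counts


# Made-up example reads: three tags share the same 5' end on the + strand.
reads = [
    ("chr1", 100, 127, "+"),
    ("chr1", 100, 127, "+"),
    ("chr1", 100, 125, "+"),
    ("chr1", 180, 207, "-"),
]
tag_counts = count_tag_starts(reads)
for position, n in sorted(tag_counts.items()):
    print(position, n)
```

Removing duplicates here would reduce the first position to a single tag and throw away exactly the signal CAGE is meant to measure.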
Since CAGE produces count data similar to RNA-seq experiments, I think you can use packages like edgeR or others to normalize the data. First you have to use your mapped reads to count their occurrences.
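If the formulas in the papers look heavy, the simplest form of normalization is just scaling by library size (tags per million). The Python sketch below shows only that scaling. It is not what edgeR does internally (edgeR estimates more robust normalization factors), and the function name `tags_per_million` and the example counts are made up for illustration.

```python
def tags_per_million(counts):
    """Scale raw tag counts to tags per million mapped tags.

    `counts` maps a region or position identifier to its raw tag count.
    Simple library-size scaling only, not edgeR's method.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no mapped tags to normalize")
    return {key: value * 1_000_000 / total for key, value in counts.items()}


# Made-up counts for three regions in one library.
raw_counts = {"regionA": 30, "regionB": 10, "regionC": 60}
tpm = tags_per_million(raw_counts)
for region, value in tpm.items():
    print(region, value)
```

Values scaled this way are comparable between libraries sequenced to different depths, which is usually the first thing you need before ranking regions.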
Nice suggestions! I will try :)