Dear all, I have a question about which input I should use for a downstream analysis pipeline.
I want to select differentially expressed genes in an RNA-Seq dataset (RSEM estimated counts) with paired cancer vs. healthy samples. Then I need to discretize the expression values of the differentially expressed genes for downstream analysis.
I have been reading various sources and have concluded that I need the expression values as TPMs or CPMs. I know that I need to correct for library size, but I don't think gene length bias would affect the discretization step. So at the moment, CPM or log2(CPM + 2) output from edgeR seems more suitable for my intended purpose. Am I right about this? Thanks in advance.
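P.S. To make clear what I mean by CPM and log2(CPM + 2), here is a minimal Python sketch of the arithmetic as I understand it. The gene names and counts are made up for illustration, and the function names `cpm` and `log2_cpm` are my own, not edgeR's. This is the literal formula as I wrote it above and may not match exactly what edgeR computes internally.

```python
import math

def cpm(counts):
    """Counts per million for one sample.

    counts: dict mapping gene name -> estimated count (toy values here).
    Only the library size (sum of counts) is used; gene length never
    enters the calculation.
    """
    lib_size = sum(counts.values())
    return {gene: c * 1e6 / lib_size for gene, c in counts.items()}

def log2_cpm(counts, offset=2.0):
    """Literal log2(CPM + offset); the offset of 2 is taken from my post."""
    return {gene: math.log2(v + offset) for gene, v in cpm(counts).items()}

# Hypothetical counts: the second sample is the first sequenced twice as deep.
shallow = {"geneA": 10, "geneB": 90, "geneC": 900}
deep = {gene: 2 * c for gene, c in shallow.items()}

cpm_shallow = cpm(shallow)
cpm_deep = cpm(deep)
log_shallow = log2_cpm(shallow)
```

In this toy case both samples give the same CPM values, which is the library size correction I am after, while gene length plays no role in the result.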