I am doing a project where I am comparing gene expression between cancer grades.
What type of count data should I use? I have used VST in this plot. However, would FPKM be better since it would give more of an idea about absolute expression? But would the lack of variance stabilisation be a problem?
For differential analysis you would use raw counts for tools like edgeR, limma-voom or DESeq2. With limma-trend you could use logCPMs, calculated with edgeR.
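To make the logCPM idea concrete, here is a minimal sketch in plain Python of what a log2 counts-per-million value is for a single sample. The function name `log_cpm` and the `pseudo_count` parameter are made up for illustration. edgeR's own `cpm(..., log=TRUE)` handles its prior count differently, so the numbers will not match edgeR exactly.

```python
import math

def log_cpm(counts, pseudo_count=1.0):
    """Simplified log2 counts-per-million for one sample.

    counts: raw read counts, one per gene.
    pseudo_count: hypothetical offset added to the CPM so that
    zero counts do not produce log2(0).
    """
    lib_size = sum(counts)
    result = []
    for c in counts:
        cpm = c / lib_size * 1e6          # scale to reads per million
        result.append(math.log2(cpm + pseudo_count))
    return result

# Toy sample with four genes
values = log_cpm([0, 10, 90, 900])
```

The raw counts still go into edgeR, limma-voom or DESeq2 for the actual testing; a transformed value like this is only what limma-trend or a plot would consume.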
For visualization, the suggested vst works well, though if you want to show values that are corrected for gene length then the FPKM that DESeq2 or edgeR can return is suitable as well. RPKM/FPKM per se is not a bad metric, as long as the normalization accounts for both depth and library composition and hence allows sample-to-sample comparison. Both edgeR and DESeq2 do that, so it's fine as well. FPKM has the intuitive advantage that a gene with no counts has a value of zero, which is not the case with vst. There, non-detected genes have values > 0, but that's a minor thing, so the choice is yours.
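As a rough illustration of the FPKM arithmetic (fragments per kilobase of gene per million fragments), here is a small sketch in plain Python. The function name and the optional `lib_size` argument are assumptions for the example. Passing a normalized library size there stands in for what DESeq2's `fpkm()` or edgeR's `rpkm()` would do with their own normalization; this is not their implementation.

```python
def fpkm(counts, gene_lengths_bp, lib_size=None):
    """Simplified FPKM for one sample.

    counts: fragment counts, one per gene.
    gene_lengths_bp: gene lengths in base pairs, same order as counts.
    lib_size: library size to scale by; defaults to the raw total.
    """
    if lib_size is None:
        lib_size = sum(counts)
    # count / (length in kb) / (library size in millions)
    return [c * 1e9 / (length * lib_size)
            for c, length in zip(counts, gene_lengths_bp)]

# Gene 1 has no counts; genes 2 and 3 have equal counts,
# but gene 3 is twice as long.
values = fpkm([0, 500, 500], [1000, 1000, 2000])
```

Here the undetected gene comes out as exactly zero, and the longer gene gets half the value of the shorter one for the same number of counts.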
Does it? Can you expand more on that statement please?
Hi Ram! Sorry, I was reading up on this before posting and thought I'd read that somewhere but I can't find the source now. I can edit the question if that is wrong?
I don't know if it's wrong but it sounds ... off. FPKM is an outdated metric and given that RNA-seq is an inherently relative exercise, there is no "absolute expression" as such. If there are no batch effects, using a quantile normalized metric or DESeq2's normalized counts is good. In any case, I think VST counts are better than FPKM, but wait for others to respond.
I agree with Ram. DESeq2 normalizes the counts so that samples can be compared with each other, and its vst function does the variance stabilization.
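For anyone curious what the between-sample normalization amounts to, below is a minimal sketch of the median-of-ratios idea behind DESeq2's size factors, written in plain Python. It is a simplified illustration, not DESeq2's code, and the function name and input layout are made up for the example.

```python
import math
from statistics import median

def size_factors(count_matrix):
    """Median-of-ratios size factors (simplified sketch).

    count_matrix: list of genes, each a list of raw counts per sample.
    Genes with a zero count in any sample are skipped, because their
    geometric mean across samples would be zero.
    Raises statistics.StatisticsError if no gene passes that filter.
    """
    n_samples = len(count_matrix[0])
    kept = [row for row in count_matrix if all(c > 0 for c in row)]
    # Log of the geometric mean of each kept gene across samples
    log_geo_means = [sum(math.log(c) for c in row) / len(row) for row in kept]
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(row[j]) - lgm
                      for row, lgm in zip(kept, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors

# Sample B was sequenced twice as deeply as sample A
factors = size_factors([[10, 20], [100, 200], [50, 100], [0, 5]])
normalized_first_gene = [c / f for c, f in zip([10, 20], factors)]
```

Dividing each sample's counts by its factor puts the samples on a common scale, which is the kind of normalized count mentioned earlier in the thread.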
If you're calculating expression relative to BPH, I'd just report the log fold change and its standard error estimate.
Thanks for your help Ram and dsull.