Question

RNA-Seq: Comparing Expression between Cancer Grades

19 months ago

noodlejackson ▴ 40

Hi everyone,

I am doing a project where I am comparing gene expression between cancer grades.

What type of count data should I use? I have used VST in this plot. However, would FPKM be better since it would give more of an idea about absolute expression? But would the lack of variance stabilisation be a problem?

I'd love to hear your thoughts! Thank you.

[Plot of VST values; image not available]

Tags: factors, rna-seq, grades, transcriptomics

would FPKM be better since it would give more of an idea about absolute expression

Does it? Can you expand more on that statement please?

19 months ago by Ram 45k

Hi Ram! Sorry, I was reading up on this before posting and thought I'd read that somewhere, but I can't find the source now. Should I edit the question if that is wrong?

19 months ago by noodlejackson ▴ 40

I don't know if it's wrong, but it sounds... off. FPKM is an outdated metric, and given that RNA-seq is an inherently relative exercise, there is no "absolute expression" as such. If there are no batch effects, using a quantile-normalized metric or DESeq2's normalized counts is good. In any case, I think VST counts are better than FPKM, but wait for others to respond.
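
Quantile normalization forces every sample to share the same distribution of values. Below is a minimal NumPy sketch, assuming a genes x samples matrix that is already on a suitable scale (for example log-transformed). The function name `quantile_normalize` is made up for illustration, and ties are broken by position rather than averaged, so results can differ slightly from packaged implementations.

```python
import numpy as np


def quantile_normalize(matrix: np.ndarray) -> np.ndarray:
    """Quantile-normalize a genes x samples matrix (hypothetical helper).

    After normalization every column (sample) holds the same set of values:
    the mean, across samples, of the k-th smallest value of each sample.
    Ties are broken by position (stable sort), not averaged.
    """
    matrix = np.asarray(matrix, dtype=float)
    # Rank of each value within its own column (0 = smallest).
    order = np.argsort(matrix, axis=0, kind="stable")
    ranks = np.argsort(order, axis=0, kind="stable")
    # Reference distribution: row-wise mean of the column-sorted matrix.
    reference = np.sort(matrix, axis=0).mean(axis=1)
    # Replace every value by the reference value at its rank.
    return reference[ranks]


if __name__ == "__main__":
    example = np.array([[5, 4, 3], [2, 1, 4], [3, 4, 6], [4, 2, 8]])
    print(quantile_normalize(example))
```

Each column of the result is a permutation of the same reference values, so only the within-sample ordering of genes is preserved.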

19 months ago by Ram 45k

I agree with Ram. DESeq2 normalizes counts so that samples can be compared with each other, and its vst function performs variance stabilization.

If you're calculating expression relative to BPH (benign prostatic hyperplasia), I'd just report the log fold change and its standard error estimate.
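
DESeq2's between-sample normalization is based on median-of-ratios size factors. The sketch below re-implements that idea in NumPy for illustration only: it is not DESeq2's code, the names `size_factors` and `normalized_counts` are hypothetical, and it assumes a raw genes x samples count matrix in which genes with a zero in any sample are simply left out of the size-factor estimate.

```python
import numpy as np


def size_factors(counts: np.ndarray) -> np.ndarray:
    """Median-of-ratios size factors for a genes x samples count matrix."""
    counts = np.asarray(counts, dtype=float)
    # Genes with a zero in any sample have a geometric mean of zero, so skip them.
    usable = np.all(counts > 0, axis=1)
    if not usable.any():
        raise ValueError("no gene has a nonzero count in every sample")
    log_counts = np.log(counts[usable])
    # Per-gene log geometric mean across samples (a pseudo-reference sample).
    log_geo_means = log_counts.mean(axis=1, keepdims=True)
    # Size factor = median over genes of (count / geometric mean), per sample.
    return np.exp(np.median(log_counts - log_geo_means, axis=0))


def normalized_counts(counts: np.ndarray) -> np.ndarray:
    """Divide each sample's counts by its size factor."""
    counts = np.asarray(counts, dtype=float)
    return counts / size_factors(counts)


if __name__ == "__main__":
    # Sample 2 is sequenced exactly twice as deeply as sample 1.
    raw = np.array([[10, 20], [100, 200], [5, 10], [0, 7]])
    print(size_factors(raw))       # about [0.707, 1.414]
    print(normalized_counts(raw))  # columns agree for the first three genes
```

Using the median of the ratios, rather than total counts, keeps a handful of very highly expressed genes from dominating the scaling, which is what makes this a composition-aware normalization.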

19 months ago by dsull ★ 7.6k

Thanks for your help Ram and dsull.

19 months ago by noodlejackson ▴ 40

score 2 · Accepted Answer · 2023-11-03

For differential analysis, you would use raw counts with tools like edgeR, limma-voom or DESeq2. With limma-trend, you could use logCPM values calculated with edgeR.
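
As a rough illustration of what logCPM values are, the sketch below divides counts by library size in millions and takes log2 after adding a small prior count so that zero counts stay finite. It is loosely modeled on how edgeR's `cpm(..., log = TRUE)` handles the prior count (scaled by relative library size and added twice to the library size), but it is a simplified NumPy re-implementation with a hypothetical function name, so its values may not match edgeR exactly.

```python
import numpy as np


def log_cpm(counts, norm_factors=None, prior_count=2.0):
    """log2 counts-per-million with a library-size-scaled prior count.

    counts: genes x samples raw counts. norm_factors: optional per-sample
    factors (for example from TMM) that turn raw library sizes into
    effective library sizes.
    """
    counts = np.asarray(counts, dtype=float)
    lib_size = counts.sum(axis=0)
    if norm_factors is not None:
        lib_size = lib_size * np.asarray(norm_factors, dtype=float)
    # Larger libraries get a proportionally larger prior count.
    prior = prior_count * lib_size / lib_size.mean()
    # The prior is added to every count and twice to the library size.
    adjusted_lib = lib_size + 2.0 * prior
    return np.log2((counts + prior) / adjusted_lib * 1e6)


if __name__ == "__main__":
    counts = np.array([[0, 3], [10, 40], [90, 157]])
    print(log_cpm(counts))  # finite everywhere, including the zero count
```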

For visualization, the suggested vst works well, though if you want to show values that are corrected for gene length, then the FPKM values that DESeq2 or edgeR can return are suitable as well. RPKM/FPKM per se is not a bad metric, as long as the normalization accounts for both sequencing depth and library composition and hence allows sample-to-sample comparison. Both edgeR and DESeq2 do that, so it's fine as well. FPKM has the intuitive advantage that a gene with no counts has a value of zero, which is not the case with vst: there, non-detected genes have values > 0. That's a minor thing, though, so the choice is yours.
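
To make the zero-count point concrete, the sketch below computes FPKM-style values from counts, gene lengths and effective (already normalized) library sizes, and evaluates a variance-stabilizing transformation derived from a negative binomial variance mu + alpha(mu) * mu^2 with a parametric dispersion trend alpha(mu) = a0 + a1 / mu. The function names and the coefficients a0 = 0.05, a1 = 1.0 are hypothetical, and DESeq2's vst() estimates its own trend from the data, so real values will differ; the point is only that FPKM maps a zero count to 0 while a VST of this kind maps it to a positive number.

```python
import numpy as np


def fpkm(counts, gene_lengths_bp, effective_lib_sizes):
    """Fragments per kilobase of gene per million (effective) library reads.

    effective_lib_sizes should already account for depth and composition,
    e.g. raw library size times a TMM factor (an assumption of this sketch).
    """
    counts = np.asarray(counts, dtype=float)
    kilobases = np.asarray(gene_lengths_bp, dtype=float)[:, None] / 1e3
    millions = np.asarray(effective_lib_sizes, dtype=float)[None, :] / 1e6
    return counts / kilobases / millions


def parametric_vst(q, a0, a1):
    """Variance-stabilizing transform for Var = mu + (a0 + a1 / mu) * mu**2.

    Integrating 1 / sqrt(variance) gives this closed form (log2 scale).
    At q = 0 it equals log2((1 + a1) / (4 * a0)), which is positive
    whenever 1 + a1 > 4 * a0, i.e. for typical small a0.
    """
    q = np.asarray(q, dtype=float)
    inner = 1 + a1 + 2 * a0 * q + 2 * np.sqrt(a0 * q * (1 + a1 + a0 * q))
    return np.log2(inner / (4 * a0))


if __name__ == "__main__":
    counts = np.array([[0, 50], [100, 0]])
    print(fpkm(counts, gene_lengths_bp=[2000, 500], effective_lib_sizes=[1e6, 2e6]))
    # -> zero counts stay exactly 0 in FPKM
    a0, a1 = 0.05, 1.0  # hypothetical dispersion-trend coefficients
    print(parametric_vst([0, 10, 100], a0, a1))
    # -> the zero count maps to log2(10), about 3.32, not to 0
```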