Is it more reliable to normalize a set of specific transcript abundances by the mean or median of the total coding transcript abundance and if so why?
When you mention normalization, it's important to specify the purpose. Are you aiming to compare transcript abundance within a sample? Between samples? Both? The answer may also be technology-dependent.

If this is RNA-Seq data, there is a common problem where a small number of genes soaks up the majority of sequence reads and thus skews the distribution of reads available to the other transcripts in the sample. This can make the median unstable (Bullard et al., 2010) and the mean less meaningful. For comparisons between samples, most people use the Trimmed Mean of M-values (TMM; Robinson & Oshlack, 2010).

Taking your question at face value: if you just want a rough comparison of transcripts within a sample that works most of the time for some limited purpose, I would consider the median a reasonably robust choice, but you could just as easily choose another quantile (e.g. the 75th percentile) or a "control" transcript; a quick sketch is below. Otherwise, there are a variety of methods and issues involved (Zhao et al., 2021). Perhaps you could read a little and clarify what you're trying to achieve.
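For illustration, here is a minimal base-R sketch of that within-sample case (the vector `counts` and its gene names are invented for the example, not taken from any real data):

```r
# Toy abundances for one sample; geneB plays the role of a dominant transcript.
counts <- c(geneA = 120, geneB = 45000, geneC = 33, geneD = 980, geneE = 15)

# Exclude zeros before computing the reference statistic, since a large
# fraction of undetected transcripts can drag the median toward zero.
nonzero <- counts[counts > 0]

median_scaled <- counts / median(nonzero)          # median normalization
upperq_scaled <- counts / quantile(nonzero, 0.75)  # upper-quartile (75th percentile)
mean_scaled   <- counts / mean(nonzero)            # mean is inflated by geneB
```

Comparing `mean_scaled` with `median_scaled` on this toy vector shows how a single dominant transcript shifts mean-based scaling far more than the median- or quantile-based versions, which is the instability issue mentioned above.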
Thanks a lot for this guide. I read Bullard et al., 2010 and Robinson & Oshlack, 2010 and worked from these. I want to run Weighted Gene Co-expression Network Analysis on TCGA data, but I noticed that the TSV files already contain a tpm-unstranded column for all of the genes, so this type of normalization has already been done. If you have any reservations about putting these numbers into https://horvath.genetics.ucla.edu/html/CoexpressionNetwork/Rpackages/WGCNA/ then please let me know.
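In case it helps clarify what I mean, this is roughly how I intend to feed the values in. It is only a sketch: `tpm` below is a synthetic stand-in for the matrix assembled from the TCGA files, and the filtering cutoffs are arbitrary placeholders, not a finished pipeline.

```r
library(WGCNA)

# Synthetic genes-x-samples matrix standing in for the TCGA TPM values.
set.seed(1)
tpm <- matrix(rexp(200 * 20, rate = 0.1), nrow = 200, ncol = 20,
              dimnames = list(paste0("gene", 1:200), paste0("sample", 1:20)))

# WGCNA expects samples in rows and genes in columns; log2(TPM + 1) tames the
# heavy right tail of expression values before correlations are computed.
datExpr <- t(log2(tpm + 1))

# Drop genes expressed in fewer than half of the samples (arbitrary cutoff).
keep    <- colSums(datExpr > 0) >= 0.5 * nrow(datExpr)
datExpr <- datExpr[, keep, drop = FALSE]

# WGCNA's built-in check for zero-variance genes and problematic samples.
gsg     <- goodSamplesGenes(datExpr)
datExpr <- datExpr[gsg$goodSamples, gsg$goodGenes, drop = FALSE]
```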
What's your analytical goal?