7.0 years ago
aquaq
I was interested in differential expression and started to read related materials. They all state that both RNA-seq and microarray data give relative abundances (I understand that both can be made absolute with some reference). As far as I know, this means interval scale instead of ratio scale. Then they describe that, when calculating DE, the ratio between two different experimental conditions is used. However, a ratio is not valid on an interval scale.
What am I missing? Is it not interval scale? Or are there any assumptions that make these calculations valid?
Thanks for answers!
I'd say commonly accepted expression measurements would be considered ratio scale, not interval scale. Consider RNA-Seq, where the fundamental measurement is a count of sequence reads. This measure has a meaningful zero point, and DE ratios between mapped counts are meaningful (for example, under a treatment condition, after normalization there may be twice as many reads mapping to gene X, compared to a reference baseline condition, which is interpreted as twice as many transcripts from gene X being present under treatment). In the case of microarrays, the fundamental measurement is a fluorescent hybridization intensity, which also has a meaningful zero point, and ratios of those intensities that are computed in DE analysis are also meaningful (within any single defined probe set).
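To make the ratio argument concrete, here is a minimal sketch of scaling raw read counts by library size (counts per million) and then taking the ratio between conditions. The gene names, the counts, and the helper functions `counts_per_million` and `expression_ratio` are made up for illustration; real tools such as edgeR or DESeq use more robust normalisation than a plain library-size scaling.

```python
import math

def counts_per_million(counts):
    """Scale raw read counts by library size (hypothetical helper)."""
    total = sum(counts.values())
    return {gene: 1e6 * n / total for gene, n in counts.items()}

def expression_ratio(treated, control, gene):
    """Ratio of normalised abundances; meaningful because zero reads means zero signal."""
    return treated[gene] / control[gene]

# Invented counts: the treated library is sequenced twice as deep as the control.
control = {"geneX": 100, "geneY": 900}
treated = {"geneX": 400, "geneY": 1600}

control_cpm = counts_per_million(control)
treated_cpm = counts_per_million(treated)

ratio = expression_ratio(treated_cpm, control_cpm, "geneX")
log2_fc = math.log2(ratio)
print(ratio, log2_fc)  # 2.0 1.0
```

Raw counts for geneX differ four-fold, but half of that comes from sequencing depth; after scaling, the remaining two-fold ratio is the quantity a DE analysis would interpret.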
Thanks. I was thinking about microarrays and realized that with fluorescence, it might be ratio scale. However, I've read that RNA-Seq measures counts relative to the number of all reads, and that absolute quantification would be possible by adding controls (spike-ins).
Yes, the two are fundamentally different, with microarray being based on fluorescent intensities and RNA-seq being related to read counts. Expression in microarray is then estimated by first performing a background correction to remove background signal (using control probes), quantile normalising, and then transforming by log2. Expression in RNA-seq is estimated by normalising read counts over each transcript in relation to the counts in all other samples in your experiment (for edgeR and DESeq).
The resulting values in both cases are relative to the actual expression but are assumed to have high precision, particularly in terms of RNA-seq.
Note that technology like NanoString returns integer counts as it measures actual individual mRNA transcripts.
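As a rough illustration of the microarray steps mentioned above (background correction, quantile normalisation, log2 transform), here is a small pure-Python sketch. The function names, the constant background value, and the intensities are assumptions for the example only. Established packages use model-based background correction and handle ties, which this sketch does not.

```python
import math

def subtract_background(intensities, background):
    """Naive background correction: subtract a constant, floor at 1 (assumption)."""
    return [max(x - background, 1.0) for x in intensities]

def quantile_normalise(samples):
    """Give every array the same distribution: the mean of the sorted values.

    samples: list of equal-length lists, one list of intensities per array.
    Ties are not treated specially in this sketch.
    """
    n_probes = len(samples[0])
    sorted_samples = [sorted(s) for s in samples]
    # Mean of the k-th smallest intensity across all arrays.
    rank_means = [
        sum(s[k] for s in sorted_samples) / len(samples) for k in range(n_probes)
    ]
    normalised = []
    for s in samples:
        order = sorted(range(n_probes), key=lambda i: s[i])
        out = [0.0] * n_probes
        for rank, probe in enumerate(order):
            out[probe] = rank_means[rank]
        normalised.append(out)
    return normalised

# Invented raw intensities for three probes on two arrays.
raw = [[105.0, 102.0, 103.0], [104.0, 101.0, 106.0]]
corrected = [subtract_background(s, background=100.0) for s in raw]
normalised = quantile_normalise(corrected)
log2_expr = [[math.log2(x) for x in s] for s in normalised]
print(normalised)
```

After normalisation both arrays contain the same set of values, so differences between arrays reflect only the rank of each probe within its array.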