Question

Normalization and other ambiguous terminology

8.6 years ago

John 13k

Hi :)

I find that when I write notes or try to explain how something works to someone else, I easily lose track of whether some value has been "normalized", and what exactly normalization means in that circumstance.

For example, a particular ChIP-Seq assay could be:

Raw signal
Normalized by total reads in sample
Normalized by total signal in sample
"Normalized" to signal in input control (which generally gets normalized itself)
Quantile normalized to other samples of the same assay type
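
To make the differences concrete, here is a minimal NumPy sketch of the four transformations listed above, each under a name that says what it does. The function names, the per-million scaling factor, the log2 ratio, and the pseudocount are assumptions made for illustration, not a standard; real pipelines differ in these details.

```python
import numpy as np


def scale_by_total_reads(signal, total_reads, per=1e6):
    """Signal per `per` sequenced reads (per million by default)."""
    return np.asarray(signal, dtype=float) * per / total_reads


def scale_by_total_signal(signal):
    """Divide by the sample's summed signal, so the bins sum to 1."""
    signal = np.asarray(signal, dtype=float)
    return signal / signal.sum()


def log2_ratio_to_input(chip, control, pseudocount=1.0):
    """log2(ChIP / input); the pseudocount (an assumption) avoids division by zero.

    Both arrays are expected to be scaled the same way beforehand.
    """
    chip = np.asarray(chip, dtype=float)
    control = np.asarray(control, dtype=float)
    return np.log2((chip + pseudocount) / (control + pseudocount))


def quantile_normalize(samples):
    """Quantile normalize a (bins x samples) matrix.

    Each value is replaced by the mean, across samples, of the values
    holding the same rank. Ties are broken by sort order in this sketch.
    """
    samples = np.asarray(samples, dtype=float)
    order = np.argsort(samples, axis=0)
    ranks = np.argsort(order, axis=0)          # rank of each value within its column
    reference = np.sort(samples, axis=0).mean(axis=1)
    return reference[ranks]
```

Keeping the results under explicit labels such as `raw`, `per_million_reads`, or `log2_over_input`, rather than a generic `normalized`, is one way to avoid losing track of which transformation a value has been through.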

I'm sure there are other, more complicated, ways to "normalize" -- and perhaps in half the cases where I say I have normalized the data I have actually just "transformed" it (certainly the distribution is not normal) -- but those are the ones I commonly use. Given that there is so much room for ambiguity here, I was wondering whether there is a standard nomenclature for this in Mathematics or Statistics. I don't think there are enough symbols for every normalization scenario, but just being able to differentiate between input normalized, read count normalized, and not normalized at all would really help. Googling "normalization symbol" didn't help :(

If you have any other examples of confusing terminology or non-specific "bioinformatic slang", it would be great to hear about those too :) Sometimes I don't realize how unspecific I'm being when I say things (usually because I expect everyone to know what I mean), so it would be great to hear about common tropes people encounter or use themselves.

normalization • 1.9k views

updated 20 months ago by Ram 44k • written 8.6 years ago by John 13k

score 1 · Answer 1 · 2016-04-24

Unfortunately, I don't believe there is a universal way of indicating this. I've seen hat and tilde versions of variables used to represent some kind of "normalizing" transformation, but this is likely an abuse of notation at some level.

You're better off explicitly defining the transformation first, then if you need a variable to represent that, define the variable in terms of the transformation function and the data. Those who understand the transformation being used will likely also recognize the symbolic form, and if they don't, perhaps it would be better to explain using a simplified example.
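
As a sketch of what that might look like for the ChIP-Seq case in the question: every symbol below is an illustrative assumption rather than an established convention. Here $x_{s,i}$ is the raw signal in bin $i$ of sample $s$, $N_s$ is the total number of reads in that sample, and $c$ is a small pseudocount.

```latex
% Read-count normalization: signal per million reads
\[
  f_{\mathrm{RPM}}(x_{s,i}) = \frac{10^{6}\, x_{s,i}}{N_s}
\]

% Input normalization: log ratio of the scaled ChIP signal to the scaled input control
\[
  f_{\mathrm{input}}(x_{\mathrm{ChIP},i},\, x_{\mathrm{ctrl},i})
    = \log_2 \frac{f_{\mathrm{RPM}}(x_{\mathrm{ChIP},i}) + c}
                  {f_{\mathrm{RPM}}(x_{\mathrm{ctrl},i}) + c}
\]

% Only then introduce shorthand, defined in terms of the functions above
\[
  y_i := f_{\mathrm{input}}(x_{\mathrm{ChIP},i},\, x_{\mathrm{ctrl},i})
\]
```

With the functions stated up front, a shorthand like $y_i$ carries no hidden meaning: a reader can always trace it back to the data and the transformation that produced it.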

Another personal opinion: as an audience member in cross-disciplinary talks, I'd rather see the explicit mapping of data to its transformed state than a field-specific symbol representing a concept I'm likely not familiar with. However, that's also dependent on your audience at the time.