I was advised to standardize each gene (so that each gene has zero mean and unit variance across all samples) before clustering the genes in a gene expression dataset, but I don't understand why. Say we have one gene whose expression is around 4000 in all cancer patients and another whose expression is around 4 in all cancer patients, and suppose their expression levels are highly correlated. After standardizing the two genes, the distance between them (which was originally large because of the different scales) becomes very small, so they will very likely end up in the same gene cluster. Intuitively, though, those genes look quite different to me, and I don't understand why they should end up in the same cluster.
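To make the scenario concrete, here is a minimal sketch in Python with NumPy. The expression values are made up for illustration (they are not from a real dataset), and `standardize` is just a helper name I chose. It compares the Euclidean distance between the two genes before and after z-scoring.

```python
import numpy as np

# Hypothetical expression values for two genes across six samples (made-up numbers).
gene_a = np.array([3900.0, 3950.0, 4000.0, 4050.0, 4100.0, 4000.0])  # around 4000
gene_b = np.array([3.90, 3.96, 4.00, 4.04, 4.10, 4.00])              # around 4


def standardize(x):
    """Shift x to zero mean and scale it to unit (population) variance."""
    return (x - x.mean()) / x.std()


# Euclidean distance on the raw scale is dominated by the difference in magnitude.
raw_distance = np.linalg.norm(gene_a - gene_b)

# After z-scoring, only the shape of each profile across samples remains.
z_distance = np.linalg.norm(standardize(gene_a) - standardize(gene_b))

# Pearson correlation between the two raw profiles.
correlation = np.corrcoef(gene_a, gene_b)[0, 1]

print(f"Euclidean distance on raw values:  {raw_distance:.2f}")
print(f"Euclidean distance after z-scores: {z_distance:.4f}")
print(f"Pearson correlation:               {correlation:.4f}")
```

With these numbers the raw distance is in the thousands, while the standardized distance is well below 1. As far as I can tell, this is because for z-scored profiles over n samples the squared Euclidean distance equals 2n(1 - r), where r is the Pearson correlation, so after standardization the distance depends only on how correlated the genes are.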
An example I read about the need for standardization is that when one variable is in kilograms and another is in grams, the variables are not directly comparable, so standardization is needed. That makes a lot of sense. But in a gene expression dataset, all variables are in the same unit (say, intensity in a microarray dataset), so I cannot relate gene expression data to that example. Can somebody explain why we need standardization for gene expression data in general, and for clustering genes specifically? Thanks.