I'm planning to perform hierarchical clustering on the TPM data we already have, but I'm unsure whether to log-transform the data before calculating the Euclidean distance matrix.
For correlation purposes we have already log-transformed the data, but Euclidean distance does not appear to have a linearity requirement. TPM is already normalized (although I'm unsure whether it's scale-normalized), so our team is not sure whether we still need to log-transform the TPMs prior to calculating the Euclidean distance.
Does anyone have any insight on this?
I would always use log2-transformed expression values for clustering. Or, maybe even better, compute z-scores of your log2 expression values before hierarchical clustering.
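In case it helps, here is a minimal Python sketch of that workflow, not a definitive recipe. Several things in it are my own assumptions rather than anything stated above: the pseudocount of 1 (to avoid log2 of zero), z-scoring each gene across samples, average linkage, the function name `cluster_samples`, and the toy matrix, which is made up purely for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist


def cluster_samples(tpm, n_clusters=2, pseudocount=1.0):
    """Cluster samples from a genes x samples TPM matrix (hypothetical helper)."""
    # log2-transform; the pseudocount is an assumption to handle zero TPMs
    log_expr = np.log2(tpm + pseudocount)

    # z-score each gene (row) across samples
    mean = log_expr.mean(axis=1, keepdims=True)
    std = log_expr.std(axis=1, keepdims=True)
    std[std == 0] = 1.0  # constant genes get z = 0 instead of dividing by zero
    z = (log_expr - mean) / std

    # Euclidean distances between samples (columns), then average linkage
    dist = pdist(z.T, metric="euclidean")
    tree = linkage(dist, method="average")
    return fcluster(tree, t=n_clusters, criterion="maxclust")


# Toy data: 4 genes x 4 samples, invented for illustration only
tpm = np.array([
    [10.0, 12.0, 200.0, 180.0],
    [500.0, 450.0, 20.0, 25.0],
    [5.0, 5.0, 5.0, 5.0],
    [0.0, 1.0, 50.0, 60.0],
])
labels = cluster_samples(tpm, n_clusters=2)
print(labels)
```

Without the log step, the Euclidean distance would be dominated by the few genes with the largest TPMs; the z-score step additionally puts every gene on the same scale, so each gene contributes comparably to the distance.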