20 months ago
Chironex
Hi, I was trying to calculate a distance matrix from my dataset to generate a multidimensional scaling (MDS) plot. However, the values that the dist() function returns for the distance matrix are too high. Is there a setting to change this?
d <- dist( data, diag = TRUE, upper = TRUE )
d
A_1 A_2 B_1 B_2 C_1 C_2 D_1 D_2 E_1 E_2
A_1 0.0000 68539.5393 33573.1135 50894.2207 70578.2982 20347.0745 70435.1249 38158.1997 59292.5268 69776.5004
A_2 68539.5393 0.0000 36886.1659 19122.1184 2111.8398 51918.3046 1978.4474 31921.6694 9647.1810 1376.4292
B_1 33573.1135 36886.1659 0.0000 18568.7180 38890.6770 18611.7210 38722.0872 7575.4019 27989.3274 38108.9009
B_2 50894.2207 19122.1184 18568.7180 0.0000 21092.5528 34135.1095 20920.8795 13725.5401 11183.2031 20331.0117
C_1 70578.2982 2111.8398 38890.6770 21092.5528 0.0000 53940.1809 367.0356 33946.4947 11618.8036 931.8314
C_2 20347.0745 51918.3046 18611.7210 34135.1095 53940.1809 0.0000 53801.9337 21530.5768 42965.9619 53150.4690
D_1 70435.1249 1978.4474 38722.0872 20920.8795 367.0356 53801.9337 0.0000 33780.6044 11486.1160 859.4139
D_2 38158.1997 31921.6694 7575.4019 13725.5401 33946.4947 21530.5768 33780.6044 0.0000 23204.5176 33160.8060
E_1 59292.5268 9647.1810 27989.3274 11183.2031 11618.8036 42965.9619 11486.1160 23204.5176 0.0000 10776.8957
E_2 69776.5004 1376.4292 38108.9009 20331.0117 931.8314 53150.4690 859.4139 33160.8060 10776.8957 0.0000
mds <- cmdscale(d, k = 2, add = FALSE)
mds
[,1] [,2]
A_1 47358.298 5817.746
A_2 -20986.848 1126.265
B_1 15249.608 -3117.375
B_2 -2668.105 -3339.063
C_1 -23030.303 1128.455
C_2 30486.758 -2477.329
D_1 -22883.553 1029.804
D_2 10449.603 -3499.095
E_1 -11744.748 2139.385
E_2 -22230.711 1191.206
Thanks
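For context on the magnitude: dist() uses the Euclidean distance by default, so for two samples i and j measured over G genes (the symbols here are introduced for illustration)

$$ d_{ij} = \sqrt{\sum_{g=1}^{G} \left(x_{ig} - x_{jg}\right)^2} $$

Every gene contributes a non-negative term, so if the input holds unnormalized counts over thousands of genes, the distances grow quickly. In a purely hypothetical case where sample j is an exact twofold scaling of sample i, the relative expression profiles are identical, yet the distance equals the full length of the count vector:

$$ x_{jg} = 2\,x_{ig} \;\;\forall g \;\Rightarrow\; d_{ij} = \sqrt{\sum_{g=1}^{G} x_{ig}^2} = \lVert \mathbf{x}_i \rVert $$

Differences in sequencing depth or cell number alone can therefore produce large distances.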
Why are these values considered too high to plot? What matters is the relationship between the points, not their absolute numeric values. Since cmdscale performs metric (classical) MDS, you could divide all the values in its output by 10000 and the plot would still be a valid visual representation of your data.

If you are really concerned about the values, you could also try other ordination analyses from the vegan library in R. For example, you can run a nonmetric MDS (metaMDS), where the numeric axis scales are meaningless.

As an aside, why are there duplicates of each number in your matrix?
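A quick way to check the rescaling argument in base R is sketched below. It uses a simulated matrix (the object names and data are illustrative assumptions, not the poster's dataset). It divides the distances by a constant and confirms that the pairwise distances in the 2-D cmdscale embedding shrink by exactly the same factor, so the plotted configuration keeps the same shape and only the axis scale changes.

```r
# Minimal sketch with simulated data (not the poster's counts):
# rescaling the input distances only rescales a classical MDS configuration.
set.seed(1)
data <- matrix(rpois(10 * 50, lambda = 1000), nrow = 10,
               dimnames = list(paste0("S", 1:10), NULL))

d_raw    <- dist(data)        # Euclidean distances between rows (samples)
d_scaled <- d_raw / 10000     # same distances divided by a constant

mds_raw    <- cmdscale(d_raw, k = 2)
mds_scaled <- cmdscale(d_scaled, k = 2)

# Compare inter-point distances in the embedding rather than raw coordinates,
# because eigenvector signs (and hence axis directions) may flip.
same_shape <- isTRUE(all.equal(as.vector(dist(mds_scaled)),
                               as.vector(dist(mds_raw)) / 10000))
print(same_shape)
```

Comparing distances instead of coordinates is deliberate: cmdscale's axes are defined only up to sign, so coordinates can flip while the configuration stays the same.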
Hi, I'm sorry, there are no duplicate names. I replaced the sample names with letters to make them easier to follow (1 and 2 are different days, just for illustration here).
I think the problem starts earlier: I aggregated single-cell RNA-seq samples into pseudobulks, creating this matrix:
I have more than 25,000 genes; here I only show you ten. As you can see, the expression profile differs between samples because the number of cells per sample is different:
Number of cells:
So I need to normalize them. Can you suggest a way to do this before calculating the distance matrix?
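One common approach, sketched here in base R under stated assumptions, is library-size normalization to counts per million (CPM) followed by a log transform, applied before dist(). The sketch assumes the pseudobulk matrix has genes in rows and samples in columns. The matrix itself, the gene names and the relative sample sizes are all simulated, hypothetical values. Because dist() measures distances between rows, the log-CPM matrix is transposed so that samples become rows. Packages such as edgeR (cpm(), calcNormFactors()) and DESeq2 (vst()) provide more robust versions of this idea, and limma's plotMDS() is another option for the plot itself.

```r
# Minimal sketch, assuming `counts` is a genes x samples matrix of summed
# (pseudobulk) raw counts. Everything below is simulated for illustration.
set.seed(42)
n_genes <- 1000
samples <- c("A_1", "A_2", "B_1", "B_2")
base    <- rgamma(n_genes, shape = 2, scale = 50)  # hypothetical per-gene means
depth   <- c(1, 5, 2, 0.5)                         # hypothetical relative sizes (e.g. cell numbers)
counts  <- sapply(depth, function(s) rpois(n_genes, base * s))
dimnames(counts) <- list(paste0("gene", seq_len(n_genes)), samples)

# 1. Counts per million: divide each column by its library size (total counts).
lib_size <- colSums(counts)
cpm <- sweep(counts, 2, lib_size, "/") * 1e6

# 2. Log-transform with a pseudocount of 1 so highly expressed genes dominate less.
log_cpm <- log2(cpm + 1)

# 3. dist() works on rows, so transpose to put samples in rows.
d <- dist(t(log_cpm))

# 4. Classical MDS and a simple labelled plot.
mds <- cmdscale(d, k = 2)
plot(mds, type = "n", xlab = "Dimension 1", ylab = "Dimension 2")
text(mds, labels = rownames(mds))
```

In this simulation the four samples share the same underlying profile and differ only in size, so after CPM normalization their distances reflect mostly count noise rather than sample size.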