Change the axis scale in an MDS plot
20 months ago
Chironex ▴ 50

Hi, I was trying to calculate a distance matrix from my dataset to generate a multidimensional scaling (MDS) plot. However, the values that the dist() function returns for the distance matrix are very large. Is there a setting to modify this?

d <- dist( data, diag = TRUE, upper = TRUE )


d


         A_1        A_2        B_1         B_2       C_1          C_2          D_1         D_2        E_1         E_2            
A_1     0.0000 68539.5393 33573.1135 50894.2207 70578.2982 20347.0745 70435.1249 38158.1997 59292.5268 69776.5004
A_2 68539.5393     0.0000 36886.1659 19122.1184  2111.8398 51918.3046  1978.4474 31921.6694  9647.1810  1376.4292
B_1 33573.1135 36886.1659     0.0000 18568.7180 38890.6770 18611.7210 38722.0872  7575.4019 27989.3274 38108.9009
B_2 50894.2207 19122.1184 18568.7180     0.0000 21092.5528 34135.1095 20920.8795 13725.5401 11183.2031 20331.0117
C_1 70578.2982  2111.8398 38890.6770 21092.5528     0.0000 53940.1809   367.0356 33946.4947 11618.8036   931.8314
C_2 20347.0745 51918.3046 18611.7210 34135.1095 53940.1809     0.0000 53801.9337 21530.5768 42965.9619 53150.4690
D_1 70435.1249  1978.4474 38722.0872 20920.8795   367.0356 53801.9337     0.0000 33780.6044 11486.1160   859.4139
D_2 38158.1997 31921.6694  7575.4019 13725.5401 33946.4947 21530.5768 33780.6044     0.0000 23204.5176 33160.8060
E_1 59292.5268  9647.1810 27989.3274 11183.2031 11618.8036 42965.9619 11486.1160 23204.5176     0.0000 10776.8957
E_2 69776.5004  1376.4292 38108.9009 20331.0117   931.8314 53150.4690   859.4139 33160.8060 10776.8957     0.0000


    > mds <- cmdscale(d, k = 2, add = F) 
    > mds
          [,1]      [,2]
A_1  47358.298  5817.746
A_2 -20986.848  1126.265
B_1  15249.608 -3117.375
B_2  -2668.105 -3339.063
C_1 -23030.303  1128.455
C_2  30486.758 -2477.329
D_1 -22883.553  1029.804
D_2  10449.603 -3499.095
E_1 -11744.748  2139.385
E_2 -22230.711  1191.206

Thanks


Why are these values considered too high to plot? Regardless, it's the relationship between points that is important, not the numeric values. And since cmdscale uses a metric scale, you could divide all values in the output by 10000 and it would still be a valid visual representation of your data when plotted.
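To illustrate the point about rescaling, here is a minimal sketch on made-up data (the matrix `x` and the sample names `S1`..`S10` are hypothetical, not the poster's data). It assumes plain Euclidean distances from dist() and checks that dividing the distances by a constant only shrinks the coordinates by that constant, so the layout of the points is unchanged.

```r
set.seed(1)

# Toy data: 10 samples x 5 features with large values (hypothetical)
x <- matrix(rnorm(50, sd = 20000), nrow = 10,
            dimnames = list(paste0("S", 1:10), NULL))
d <- dist(x)

mds_raw    <- cmdscale(d, k = 2)          # coordinates in the original units
mds_scaled <- cmdscale(d / 10000, k = 2)  # same analysis on rescaled distances

# The two configurations agree up to the constant factor.
# abs() is used because the sign of an MDS axis is arbitrary.
same_layout <- isTRUE(all.equal(abs(mds_raw), abs(mds_scaled) * 10000))
print(same_layout)

# The plot looks the same either way; only the axis labels change
plot(mds_scaled, type = "n", xlab = "Dim 1 (x 10000)", ylab = "Dim 2 (x 10000)")
text(mds_scaled, labels = rownames(mds_scaled))
```

Dividing the output of cmdscale directly (`mds_raw / 10000`) gives the same picture, so the rescaling can be done at whichever step is more convenient.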

If you are really concerned about the values, you could also try other ordination analyses using the vegan library in R. You can implement a nonmetric MDS (metaMDS) where the numeric axes scales are meaningless.
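A rough sketch of the nonmetric route, assuming the vegan package is installed and using made-up data (the matrix `x` is hypothetical). metaMDS accepts a dist object in place of a community matrix; the settings shown are just defaults that seemed reasonable, not a recommendation for this dataset.

```r
set.seed(1)

# Toy data: 10 samples x 5 features (hypothetical)
x <- matrix(rnorm(50, sd = 20000), nrow = 10,
            dimnames = list(paste0("S", 1:10), NULL))
d <- dist(x)

# Only run the ordination if vegan is available
if (requireNamespace("vegan", quietly = TRUE)) {
  # Nonmetric MDS in 2 dimensions; trace = FALSE silences progress output
  nmds <- vegan::metaMDS(d, k = 2, trace = FALSE)

  # Sample coordinates; only the rank order of distances is preserved,
  # so the axis units carry no meaning
  print(head(nmds$points))
  print(nmds$stress)

  plot(nmds$points, type = "n", xlab = "NMDS1", ylab = "NMDS2")
  text(nmds$points, labels = rownames(nmds$points))
}
```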

As an aside, why are there duplicates of each number in your matrix?


Hi, I'm sorry, there are no duplicate names. I substituted letters for the sample names to make it easier to follow (1 and 2 are different days).

I think the problem arises from the beginning, because I aggregated single-cell RNA-seq samples into pseudobulks, creating this matrix:

tab[,1:10]
           Xkr4   Gm1992   Gm19938   Gm37381        Rp1      Sox17   Gm37587     Mrpl15     Lypla1      Tcea1
A_1 100.3983403 2.094996 8.0508725 0.4740132 11.7446134 50.5215017 0.8604795 541.788327 204.612026 866.087779
A_2   3.7714064 0.000000 0.0000000 0.0000000  0.0000000  2.9878470 0.0000000  20.446530  11.856091  36.950256
B_1  26.3195842 1.942903 0.4094375 0.0000000  0.0000000  9.5934363 0.0000000 246.942944 129.289099 470.376530
B_2   4.5693646 1.343612 2.4839935 0.0000000  1.9287930  1.3162788 0.0000000 131.049776  58.809368 202.904801
C_1   3.7508510 1.285488 1.2854877 0.0000000  0.0000000  1.7040939 0.0000000   6.820192   1.231842   8.273079
C_2  65.0411039 5.427881 3.6083771 0.0000000  0.6973232  2.9113581 0.0000000 391.903116 166.712614 553.954149
D_1   0.4635404 0.000000 0.0000000 0.0000000  0.0000000  0.0000000 0.0000000   3.897851   3.142519  10.284732
D_2  21.5049262 1.197321 3.1446334 0.0000000  0.4733959  0.7565153 0.0000000 229.970330 104.761129 346.445247
E_1  66.9866197 2.038161 5.2060244 0.0000000  6.3575404 28.5580183 1.3065041  91.232942  42.103075 127.664432
E_2   0.4813143 0.000000 0.0000000 0.0000000  0.0000000  5.1772589 0.0000000  18.813332   8.889335  13.623546
> 

I have more than 25000 genes; here I only show you ten. As you can see, the expression profile of each sample is different, because the number of cells in each sample is different:

NUMBER OF CELLS:

A_1  A_2  B_1  B_2  C_1  C_2  D_1  D_2  E_1  E_2 
1322   56  733  416   16 1004   19  637  226   30 

So I need to normalize them. Can you suggest a way to do it before calculating the distance matrix?
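For illustration only, one common way to handle this is to scale every sample to the same total and log-transform before calling dist(). The sketch below uses a small made-up matrix (`tab_toy`, with hypothetical `depth` factors standing in for the differing cell numbers) laid out like `tab` above, with samples in rows and genes in columns. Whether per-million scaling is appropriate depends on how the pseudobulks were built, so treat it as an assumption rather than the recommended method.

```r
set.seed(42)

# Hypothetical pseudobulk matrix: 4 samples (rows) x 6 genes (columns).
# Samples share one expression profile but differ in total signal,
# mimicking pseudobulks built from different numbers of cells.
base  <- c(100, 2, 8, 50, 540, 860)
depth <- c(A_1 = 1, A_2 = 0.04, B_1 = 0.5, B_2 = 0.3)
tab_toy <- outer(depth, base) * matrix(runif(24, 0.8, 1.2), nrow = 4)
colnames(tab_toy) <- paste0("gene", 1:6)

# Scale each sample (row) to a total of 1e6, then take log(1 + x)
scaled_toy <- tab_toy / rowSums(tab_toy) * 1e6
norm_toy   <- log1p(scaled_toy)

d_raw  <- dist(tab_toy)   # dominated by differences in total signal
d_norm <- dist(norm_toy)  # reflects differences in relative expression

print(round(d_raw, 1))
print(round(d_norm, 3))
```

On this toy example the raw distances mostly track the depth factors, while the normalized distances are small because the underlying profiles are nearly identical.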
