Question

How to interpret the Principal Component Analysis (PCA) results?

4

8.8 years ago

orzech_mag ▴ 230

Dear Colleagues,

I performed a type of PCA called Multiple Factor Analysis (MFA). In general, it is defined similarly to PCA, with one difference: it can use categorical or mixed data instead of continuous data only. However, both PCA and MFA results are mysterious to me, as I could not find any source that explains the meaning of the "dimensions". I read many manuals and found that the interpretation of the x and y axes (called the dimensions) is very elusive (or am I wrong?). They present the mathematics and the general principle of the method, but that is not sufficient for me. For example, I tried to work through the sample dataset with the descriptions the authors provided. The data concern the odor of wines before and after shaking (see http://factominer.free.fr/advanced-methods/multiple-factor-analysis.html). What I cannot understand is how they deduced that the x-axis means "intensity" and "harmony". Because of this, I cannot relate it to my own data and results.

Please give me anything in simple terms, or any source that could be more useful.

Thanks in advance.

PCA MFA FactoMineR R data interpretation • 11k views

ADD COMMENT • link updated 8.8 years ago by Carlo Yague 9.0k • written 8.8 years ago by orzech_mag ▴ 230

score 1 · Answer 1 · 2016-10-03

8.8 years ago

andrew.j.skelton73 6.6k

I quite like this explanation

ADD COMMENT • link 8.8 years ago by andrew.j.skelton73 6.6k

score 1 · Answer 2 · 2016-10-03

8.8 years ago

Carlo Yague 9.0k

The interpretation of the axes comes from the analysis of this figure: [figure: variable graph]

Here the original variables are represented in the Dimension 1-Dimension 2 space. D1 can be interpreted as the resultant of all the variables projected onto the x-axis: the longer a variable's projected vector, the greater its contribution to that dimension. In this case, the x-axis (D1) is dominated by the idea of intensity (intensity, visual.intensity, alcohol, attack.intensity, aroma.intensity, ...) and, to a lesser extent, by the idea of harmony.

In contrast, the y-axis (D2) is defined by taste and opposes a spicy, herbal taste to a flowery, fruity one.

Of course, this is a very crude interpretation, but it makes it possible to summarize many variables. A possible application would be to rank the wines along D1, with the highest-scoring wines probably being the best (at least in terms of intensity and harmony).
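To make the "longer projected vector means larger contribution" reading concrete, here is a minimal sketch in plain Python. It is ordinary PCA on standardized variables, not FactoMineR's MFA (which additionally weights groups of variables), and the four variables and their values are invented for illustration. It computes each variable's coordinate on D1 (its correlation with the dimension, i.e. the length of its projection on the x-axis) and its percentage contribution to D1.

```python
import math

# Hypothetical toy data: six wines scored on four variables (values invented).
data = {
    "intensity":       [2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
    "alcohol":         [2.2, 2.9, 4.3, 4.8, 6.1, 6.9],
    "aroma_intensity": [1.8, 3.3, 3.9, 5.2, 5.8, 7.1],
    "spice":           [5.0, 2.0, 6.0, 3.0, 5.5, 2.5],
}

def standardize(col):
    """Center a column and scale it to unit (population) standard deviation."""
    n = len(col)
    mean = sum(col) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in col) / n)
    return [(x - mean) / sd for x in col]

def correlation_matrix(columns):
    """Correlation matrix of the variables (the covariance of standardized columns)."""
    z = [standardize(c) for c in columns]
    n = len(z[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / n for zj in z] for zi in z]

def leading_eigenpair(matrix, iterations=500):
    """Power iteration: largest eigenvalue and its unit-length eigenvector."""
    size = len(matrix)
    v = [1.0 / math.sqrt(size)] * size
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(size)) for i in range(size)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    mv = [sum(matrix[i][j] * v[j] for j in range(size)) for i in range(size)]
    eigenvalue = sum(a * b for a, b in zip(v, mv))
    return eigenvalue, v

names = list(data)
eigenvalue, vector = leading_eigenpair(correlation_matrix([data[k] for k in names]))

# Coordinate of each variable on D1 = its correlation with D1.
loadings = {k: v * math.sqrt(eigenvalue) for k, v in zip(names, vector)}
# Contribution of each variable to D1, in percent (sums to 100).
contributions = {k: 100 * v * v for k, v in zip(names, vector)}

for k in sorted(names, key=lambda k: -contributions[k]):
    print(f"{k:16s} coord on D1 = {loadings[k]:+.2f}   contribution = {contributions[k]:.1f}%")
```

In this toy example the three intensity-like variables carry almost all of D1, which is the same kind of evidence that leads to calling an axis "intensity". In FactoMineR itself, the corresponding numbers are stored in the result object, so there is no need to compute them by hand.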

ADD COMMENT • link 8.8 years ago by Carlo Yague 9.0k

1

Thank you very much, guys. Now I understand this method better, but the interpretation remains difficult because it is not black and white, like other statistics. My data are quite complicated, so understanding the dimensions is challenging.

Cheers.

ADD REPLY • link 8.8 years ago by orzech_mag ▴ 230

0

Yes, it is not always easy to understand what's going on in your data. Note that PCA is very sensitive to outliers, so a log transformation (or square-root transformation) of variables with long-tailed distributions (as count data usually have) can help reveal patterns.
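A small sketch of why the transformation helps, in plain Python with made-up counts: PCA looks for directions of maximal variance, so if a single observation carries most of a variable's variance, that observation largely decides the leading dimensions. The helper name `largest_share` is hypothetical, and `log1p` is used here only as one common choice that tolerates zero counts.

```python
import math

def largest_share(values):
    """Fraction of the total squared deviation from the mean that is
    carried by the single most extreme observation."""
    mean = sum(values) / len(values)
    squared = [(x - mean) ** 2 for x in values]
    return max(squared) / sum(squared)

# Hypothetical long-tailed count variable (values invented for illustration).
counts = [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024]

raw_share = largest_share(counts)
log_share = largest_share([math.log1p(c) for c in counts])

print(f"raw counts: largest observation carries {raw_share:.0%} of the variance")
print(f"log1p     : largest observation carries {log_share:.0%} of the variance")
```

On the raw scale the largest count dominates the variance; after the log transformation the observations are spread much more evenly, so no single sample drives the result.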

ADD REPLY • link 8.8 years ago by Carlo Yague 9.0k

0

Thank you. I will consider additional data transformation; nevertheless, the results are quite good.

Now I am facing another difficulty: plotting my results. Specifically, I would like to draw a 3D graph. I saw the rgl package, which plots the results of PCA, but it is hard to use on my data (which are mixed: categorical and continuous). As I mentioned before, I performed MFA (Multiple Factor Analysis, from the FactoMineR package). My input data were continuous (expression of particular genes in a group of patients) and categorical (y if the patient experienced relapse, n if not). I therefore obtained results divided into two groups: 1) with relapse and 2) without relapse. The example looks like this: [figure]. Now I would like to see how it looks in 3D, so I wanted to add the third dimension from the analysis to my graph. However, factoextra cannot do this (it draws only two dimensions), and the rgl package cannot either, because it plots only continuous data without regard to relapse. Is there any solution to this issue? I mean any package that I haven't seen, or just another way to present it.

Thanks in advance.

ADD REPLY • link 8.8 years ago by orzech_mag ▴ 230

0

Sorry, I don't know much about 3D plotting. But be aware that a 3D plot introduces visual distortion, making it impossible for the reader to assess the exact coordinates of the points. And in 2D, you can always plot Dimension 3 against Dimension 1 or 2.
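As a rough sketch of that 2D alternative (plain Python, with invented coordinates and a hypothetical `relapse` label per patient), the individuals' coordinates on the first three dimensions can be split into every pair of dimensions and grouped by relapse status. Each group's list of points can then be handed to whatever scatter-plot function you use, with one color per group.

```python
from itertools import combinations

# Hypothetical coordinates of individuals on dimensions 1-3 (values invented),
# each with the categorical relapse label (y / n).
individuals = [
    {"id": "P1", "relapse": "y", "coords": (1.2, -0.4, 0.3)},
    {"id": "P2", "relapse": "y", "coords": (0.8, 0.5, -0.6)},
    {"id": "P3", "relapse": "n", "coords": (-1.1, 0.2, 0.9)},
    {"id": "P4", "relapse": "n", "coords": (-0.7, -0.9, -0.2)},
]

def pairwise_views(individuals, n_dims=3):
    """For every pair of dimensions, collect the 2D points of each relapse group."""
    views = {}
    for i, j in combinations(range(n_dims), 2):
        groups = {}
        for ind in individuals:
            point = (ind["coords"][i], ind["coords"][j])
            groups.setdefault(ind["relapse"], []).append(point)
        views[(i + 1, j + 1)] = groups  # keys use 1-based dimension numbers
    return views

views = pairwise_views(individuals)
for (dim_x, dim_y), groups in views.items():
    for label, points in sorted(groups.items()):
        print(f"Dim {dim_x} vs Dim {dim_y}, relapse={label}: {points}")
```

Three flat panels (1 vs 2, 1 vs 3, 2 vs 3) show the same information as a 3D cloud while keeping the coordinates readable.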

ADD REPLY • link 8.8 years ago by Carlo Yague 9.0k