Question

I Need Some Help To Understand How To Interpret Heatmaps And PCAs Please.

0

13.6 years ago

proxify ▴ 40

I am doing my thesis and as part of it I've begun to learn the R programming language. However, I am having trouble understanding some of the graphs I am given. I have already tried reading about them in the R help documentation and a bit online, but I still don't quite grasp them. Could anyone please give me a "for dummies" sort of explanation?

For example, I have these graphs:

http://i.imgur.com/ztaIP.png

http://i.imgur.com/HfW7K.png

http://i.imgur.com/5M7L2.png

http://i.imgur.com/rCvVh.png

I can tell them apart and I know how to produce them, but I don't understand what they are trying to tell me. They come from a run of the GALGO library (built in the R language) on a database of cancer (prone or relapsing) genes. Could someone help me out? Or, if I need to post more information to get help, let me know.

heatmap pca graphs • 7.6k views

ADD COMMENT • link updated 13.6 years ago by Julien Textoris ▴ 430 • written 13.6 years ago by proxify ▴ 40

5

It's nearly impossible to interpret these graphs out of context. They are only meaningful in the context of the article they appear in. So your first task is to get the publication, and then read the article and the figure legends. Asking about four graphics at once before you have made that attempt is not a good approach. Please try to make your question more concrete.

ADD REPLY • link 13.6 years ago by Michael 56k

score 1 · Answer 1 · 2012-04-13

I think you should go about it the other way round: first ask one or several questions, and then figure out which analysis and which graphical representation best suit your needs.

When using PCA, you try to decompose the variance in your data into principal components. Principal components may or may not be related to a variable in your design, or to any technical bias you may have identified. It is an exploratory tool.

In your third example (5M7L2), I assume the dots are samples. These samples are plotted in various coordinate systems according to the selected principal components. As PCA is an unsupervised method, it does not use your experimental design in the analysis. If, for example, you have samples from cancer and healthy tissue, you have two groups. You can use this variable to color the dots, i.e. the samples. In your example, you can see that with PC1 and PC3 your samples seem to be separated into two groups.
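To make the "unsupervised, then color by group" idea concrete, here is a minimal sketch in base R. It is not the GALGO workflow: the expression matrix `expr`, the grouping variable `status`, and all numbers are simulated assumptions for illustration only. The point is that `status` is never given to `prcomp()`; it is used only to pick the colors afterwards.

```r
# Simulated data: 20 samples x 50 genes (all values are made up for illustration)
set.seed(1)
n_samples <- 20
n_genes <- 50
status <- rep(c("healthy", "cancer"), each = n_samples / 2)  # hypothetical design variable

expr <- matrix(rnorm(n_samples * n_genes), nrow = n_samples, ncol = n_genes)
# Shift the first 10 genes in the cancer samples so that a group effect exists
expr[status == "cancer", 1:10] <- expr[status == "cancer", 1:10] + 3

# prcomp() expects samples in rows and variables (genes) in columns.
# 'status' is not passed to it: the analysis is unsupervised.
pca <- prcomp(expr, center = TRUE, scale. = FALSE)

scores <- pca$x  # one row per sample, one column per principal component

# The design variable is used only now, to color the dots
cols <- ifelse(status == "cancer", "red", "blue")
plot(scores[, 1], scores[, 2], col = cols, pch = 19, xlab = "PC1", ylab = "PC2")
legend("topright", legend = c("cancer", "healthy"), col = c("red", "blue"), pch = 19)
```

With real data you would replace `expr` by your own samples-by-genes matrix (transpose it with `t()` if genes are in rows) and `status` by the matching column of your sample annotation.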

Usually, the first principal component explains a big part of the variance, the second explains a smaller part, the third an even smaller part, and so on. If it is PC3 that seems to be related to your design (cancer vs healthy), this means that other variables, captured by PC1 and PC2, explain more of the variance in gene expression. Now imagine that you know the sex of each individual the samples come from. You can draw the same plot, but color the dots according to this other variable: sex. Remember, I said PCA does not care about your design; it identifies the PCs and then plots your samples (or your genes) in various coordinate systems based on these PCs. So whether you choose the cancer variable or the sex variable won't change the position of the dots/samples in the plot, it will only change their colors.

With sex as the variable, you may see a clear separation of your samples on PC1 and PC2. What does this mean? It means that sex has a greater influence on gene expression in your data than cancer does. So sex is a confounding factor in your analysis. If you want to identify the genes whose expression is related to cancer, you will have to adjust for sex.
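A small simulation can illustrate both points: the share of variance decreases from one PC to the next, and changing the coloring variable changes only the colors. Everything here (the `sex` and `status` vectors, the effect sizes, the 24 x 100 matrix) is an assumption chosen so that sex has the larger effect; it is a sketch, not your data.

```r
set.seed(2)
n <- 24
sex <- rep(c("F", "M"), times = n / 2)                # alternates F, M, F, M, ...
status <- rep(c("healthy", "cancer"), each = n / 2)   # first half healthy, second half cancer

expr <- matrix(rnorm(n * 100), nrow = n)              # 24 samples x 100 simulated genes
expr[sex == "M", 1:30] <- expr[sex == "M", 1:30] + 4                      # strong sex effect
expr[status == "cancer", 31:40] <- expr[status == "cancer", 31:40] + 3    # weaker cancer effect

pca <- prcomp(expr)

# Proportion of variance explained by each PC (sorted from largest to smallest)
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained[1:5], 3))

# Same coordinates in both panels; only the colors differ
op <- par(mfrow = c(1, 2))
plot(pca$x[, 1:2], col = ifelse(sex == "M", "darkgreen", "orange"),
     pch = 19, main = "colored by sex")
plot(pca$x[, 1:2], col = ifelse(status == "cancer", "red", "blue"),
     pch = 19, main = "colored by status")
par(op)

# How strongly do the first three PCs track each variable?
cor_sex <- abs(cor(pca$x[, 1:3], as.numeric(sex == "M")))
cor_status <- abs(cor(pca$x[, 1:3], as.numeric(status == "cancer")))
print(round(cbind(sex = cor_sex[, 1], status = cor_status[, 1]), 2))
```

Because the sex effect was made the largest source of variation in this simulation, PC1 follows sex rather than cancer status, which is the situation described above.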

To summarize: when you start your analysis, you explore your data to identify potential confounding variables (that's why you first have to build a phenodata file that describes your samples as completely as you can, including potential technical confounding factors), and then you go on with supervised analysis, knowing which variables you will have to adjust for.

Hope it helps. I am writing on a tablet, so I'll post something on heatmaps once I turn on my computer :-)
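As a rough sketch of what "adjusting for" a variable can look like, one common approach (an assumption here, not necessarily what GALGO or your pipeline does) is to include the confounder as a covariate in a linear model for each gene. The `pheno` table, the single simulated `gene`, and the effect sizes below are all hypothetical. Sex is deliberately unbalanced between the groups so that it really confounds the comparison.

```r
set.seed(3)
# Hypothetical phenodata: one row per sample, sex unbalanced between groups
pheno <- data.frame(
  sample = paste0("S", 1:40),
  status = factor(rep(c("healthy", "cancer"), each = 20),
                  levels = c("healthy", "cancer")),
  sex = factor(c(rep("F", 15), rep("M", 5),     # healthy: mostly female
                 rep("F", 5), rep("M", 15)))    # cancer: mostly male
)

# One simulated gene: large sex effect (+4 in males), small cancer effect (+1)
gene <- 4 * (pheno$sex == "M") + 1 * (pheno$status == "cancer") +
  rnorm(nrow(pheno), sd = 0.3)

fit_unadjusted <- lm(gene ~ status, data = pheno)        # ignores sex
fit_adjusted <- lm(gene ~ sex + status, data = pheno)    # adjusts for sex

# Estimated cancer-vs-healthy difference under each model
est_unadjusted <- unname(coef(fit_unadjusted)["statuscancer"])
est_adjusted <- unname(coef(fit_adjusted)["statuscancer"])
print(c(unadjusted = est_unadjusted, adjusted = est_adjusted))
```

In this simulation the unadjusted estimate absorbs part of the sex effect, while the adjusted one should land near the simulated cancer effect of 1. For real expression data, dedicated packages apply the same idea across all genes at once.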