Looking for comprehensive tutorials on PCA of microarray data
5.1 years ago

I've been exploring PCA of a microarray experiment. The experiment covers two genotypes of transgenic plants induced at 6 different time points. I have the following exploratory questions that I would like to answer with PCA.

1) Where does most of the variation lie?

2) Which time points are similar and which ones differ in their effect on gene expression?

3) Do the two genotypes differ under the same treatment conditions?

I have the above questions figured out. What I am not able to analyse using PCA is the following:

4) Which gene expression values contribute the most to the observed variation?

5) How do I visualise and represent only these genes on a PCA graph?

6) How do I block out the unimportant genes which don't contribute to the observed variation?

The packages that I have explored so far are ggfortify, ggplot2 and ggbiplot. I can't find any tutorials that show how to answer questions 4 to 6 using these three packages.

First of all:

7) Is it possible to answer questions such as 4, 5 and 6?

8) If yes, can someone point me to a tutorial which shows how this is done?

microarray R PCA • 1.1k views

These would provide a good start:
PCA in a RNA seq analysis
PCA plot from read count matrix from RNA-Seq

@Kevin is active on Biostars, so he should be able to answer any remaining questions.

5.1 years ago

I believe all of your questions can be addressed via PCAtools, released just a few months ago on Bioconductor. I developed a relatively comprehensive tutorial here: PCAtools: everything Principal Component Analysis.
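As a rough illustration of how such a workflow might look, here is a minimal sketch. It assumes PCAtools is installed from Bioconductor and uses a made-up random matrix (`mat`) and sample table (`metadata`) in place of the real microarray data. The probe names, sample names, and the cutoffs `removeVar = 0.1` and `rangeRetain = 0.01` are placeholders, not recommendations.

```r
set.seed(1)

# Toy data standing in for the real experiment (entirely hypothetical):
# 200 probes x 12 samples = 2 genotypes x 6 time points
metadata <- expand.grid(timepoint = paste0("t", 1:6), genotype = c("A", "B"))
rownames(metadata) <- paste(metadata$genotype, metadata$timepoint, sep = "_")
mat <- matrix(rnorm(200 * nrow(metadata)), nrow = 200,
              dimnames = list(paste0("probe", 1:200), rownames(metadata)))

# Only run the PCAtools part if the package is available
if (requireNamespace("PCAtools", quietly = TRUE)) {
  library(PCAtools)

  # Rows are probes, columns are samples; rownames(metadata) must match
  # colnames(mat). removeVar = 0.1 drops the 10% least variable probes.
  p <- pca(mat, metadata = metadata, removeVar = 0.1)

  # Question 1: variance explained per principal component
  print(screeplot(p))

  # Questions 2, 3 and 5: samples coloured by genotype, with the
  # top-loading probes drawn as arrows
  print(biplot(p, colby = "genotype", showLoadings = TRUE))

  # Question 4: probes with the most extreme loadings on each component
  print(plotloadings(p, rangeRetain = 0.01))
}
```

With real data, `mat` would be the normalised expression matrix and `metadata` the sample annotation table. Variance filtering through `removeVar` is one possible reading of question 6.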

6) How do I block out the unimportant genes which don't contribute to the observed variation?

You will have to elaborate on what you mean by this.

Kevin


Thank you for your response! I will elaborate along with some more questions if that's okay:

6) Is there a way to find out which set of genes in my microarray/RNA-seq data contributes the most to the observed variation? I think your tutorial answers this: plotloadings appears to do exactly that. It lets me display only the probes/genes that fall in the top percentage of loadings. Am I right? Is there a way to get a list of these genes/probes using PCAtools? I am unclear on how to do this part.
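One way to get such a list is to rank the probes by the absolute value of their loadings on a given component. The sketch below uses base R's `prcomp` on made-up data rather than PCAtools itself. The matrix `mat`, the probe names, and the cutoff `top_n` are all hypothetical. With a PCAtools object `p`, the equivalent loadings table should be available as `p$loadings`, with probes in rows and components in columns.

```r
set.seed(1)

# Toy expression matrix (hypothetical): 200 probes x 12 samples
mat <- matrix(rnorm(200 * 12), nrow = 200,
              dimnames = list(paste0("probe", 1:200), paste0("sample", 1:12)))

# Shift the first 10 probes in half of the samples so they carry real signal
mat[1:10, 7:12] <- mat[1:10, 7:12] + 5

# prcomp expects samples in rows, so transpose the probe-by-sample matrix
pc <- prcomp(t(mat), center = TRUE, scale. = FALSE)

# Loadings: one row per probe, one column per principal component
loadings <- pc$rotation

# Rank probes by absolute loading on PC1 and keep the top_n
top_n <- 10
ranked <- sort(abs(loadings[, "PC1"]), decreasing = TRUE)
top_pc1 <- names(ranked)[seq_len(top_n)]

# Table of the selected probes with their signed loadings
top_table <- data.frame(probe = top_pc1,
                        loading = unname(loadings[top_pc1, "PC1"]))
print(top_table)
```

The same ranking can be repeated for PC2, PC3 and so on, and `top_table` can be saved with `write.csv` if a file is needed.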
