I have genes in rows and sample names in columns: 76 samples and 376 genes. I obtained these genes from a differential gene expression analysis of different biotic and abiotic stress conditions. I want to do a PCA in R and draw a biplot for my data. Can anyone help?
First, you have to be clear about what you want to see. A PCA on all samples shows whether, based on gene variability, the samples cluster into separate groups, which marks the difference between the conditions or groups you study. If you have already found your DEGs, it is better to show them in a volcano plot or MA plot in R to capture their differences, or in a heatmap using one of the many good R packages.
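As a rough illustration of the heatmap option, here is a minimal sketch using base R's heatmap(). The matrix `expr` is simulated placeholder data with the dimensions from the question (376 genes x 76 samples), not real expression values, and its name is only an assumption for this example.

```r
# Simulated placeholder data: 376 genes (rows) x 76 samples (columns).
# Replace 'expr' with your own numeric matrix of DEG expression values.
set.seed(1)
expr <- matrix(rnorm(376 * 76), nrow = 376, ncol = 76,
               dimnames = list(paste0("gene", 1:376), paste0("sample", 1:76)))

# Row-scale each gene and cluster both genes and samples.
# labRow = NA hides the 376 gene labels, which would be unreadable anyway.
hm <- heatmap(expr, scale = "row", labRow = NA, cexCol = 0.5)
```

heatmap() invisibly returns the row and column orderings (hm$rowInd, hm$colInd), which can be handy for checking which samples ended up next to each other.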
It is not really advisable to run a PCA on the DEGs; a heatmap is better for them. But if you are set on doing a PCA, plotMDS() from limma, prcomp() or princomp() will suffice. Ideally, what you want to convey is that, based on the variability of gene expression between 2 conditions, you have found the most variable genes that separate the samples into 2 clusters and thus give different phenotypes. This is fairly simple: take all samples, perform a PCA on all samples vs all genes, and check that they form 2 clusters and show variability; downstream of that, perform the DE analysis to find those genes, which can then be shown in an MA plot, a volcano plot or a heatmap. A PCA on such a small number of samples and genes is not well regarded. I would bet that in this case your PCA should be on genes rather than samples, so the points you project onto the PCs would be the genes, separated by the 2 conditions of your samples.
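The samples-versus-genes distinction comes down to the orientation of the matrix passed to prcomp(), which treats rows as the observations to plot. A minimal sketch, assuming a genes-in-rows matrix as in the question; `expr` is simulated placeholder data and the object names are hypothetical:

```r
# Simulated placeholder data: 376 genes (rows) x 76 samples (columns).
# Replace 'expr' with your own numeric expression matrix.
set.seed(1)
expr <- matrix(rnorm(376 * 76), nrow = 376, ncol = 76,
               dimnames = list(paste0("gene", 1:376), paste0("sample", 1:76)))

# PCA on samples: transpose so that each row (observation) is a sample.
pca_samples <- prcomp(t(expr), center = TRUE, scale. = TRUE)

# One point per sample on the first two components.
plot(pca_samples$x[, 1], pca_samples$x[, 2],
     xlab = "PC1", ylab = "PC2", main = "PCA of samples")
text(pca_samples$x[, 1], pca_samples$x[, 2],
     labels = rownames(pca_samples$x), pos = 3, cex = 0.6)

# PCA on genes: no transpose, so each row (observation) is a gene.
pca_genes <- prcomp(expr, center = TRUE)
```

Whether to set scale. = TRUE depends on the data; for expression values that are already normalised or log-transformed, centering alone may be enough.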
In R you can use the function prcomp() (available by default) on your matrix. Then call biplot() on the result to obtain a biplot (read the documentation with ?biplot, as several different kinds of plot are known as biplots). An alternative is to install the pcaMethods Bioconductor package. A small example with prcomp():
x <- data.matrix(iris[, -5])  # numeric matrix; drops the Species factor column
p <- prcomp(x)
p
# Standard deviations:
# [1] 2.0562689 0.4926162 0.2796596 0.1543862
#
# Rotation:
#                      PC1         PC2         PC3        PC4
# Sepal.Length  0.36138659 -0.65658877  0.58202985  0.3154872
# Sepal.Width  -0.08452251 -0.73016143 -0.59791083 -0.3197231
# Petal.Length  0.85667061  0.17337266 -0.07623608 -0.4798390
# Petal.Width   0.35828920  0.07548102 -0.54583143  0.7536574
biplot(p)
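To judge how many components are worth plotting, it can help to look at the proportion of variance each one explains. A small sketch on the same iris example; the variable name `var_explained` is just chosen for illustration:

```r
# Same iris PCA as in the example above.
p <- prcomp(data.matrix(iris[, -5]))

# The variance of each component is the square of its standard deviation.
var_explained <- p$sdev^2 / sum(p$sdev^2)
round(var_explained, 4)

# summary() reports the same values in its "Proportion of Variance" row,
# and screeplot() draws the component variances.
summary(p)
screeplot(p, type = "lines")
```

If the first two components already account for most of the variance, a PC1 vs PC2 plot is usually a fair summary of the data.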
Not that function, AFAIK. However, take a look at the answer to this question on SO. Also, a quick search points to an R package called pca3d, which uses rgl for interactive 3D PCA plots.
Dear rajasekargutha, Hi.
There is a PCA performing script using PtR at the bottom of this page and this post.
~ Best
http://www.plantphysiol.org/content/164/1/481/F2.expansion.html
I want to do something like in this paper.
Personally, I don't recommend a 3D PCA plot like the one in the paper you referred to. It can sometimes be confusing to interpret; see an example here (start at 15:30). Plotting PC1 vs PC2 and PC2 vs PC3 would show the pattern much more clearly.
Completely agree! If the plot is interactive and you can rotate the axes, then 3D plots can be somewhat useful for understanding the structure of your data (although still not so easy, and I guess it depends on the data). But 2D snapshots of a 3D plot can be very misleading. Thanks for the very useful link to R. Irizarry's talk.
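The pairwise 2D plots suggested above could look roughly like this. It is only a sketch on the built-in iris data, coloured by species; with your own data you would colour by stress condition instead.

```r
# PCA on the numeric iris columns.
p <- prcomp(data.matrix(iris[, -5]))
grp <- as.integer(iris$Species)  # integer codes used as plotting colours

# Two side-by-side 2D panels instead of a single 3D plot.
op <- par(mfrow = c(1, 2))
plot(p$x[, c("PC1", "PC2")], col = grp, pch = 19, main = "PC1 vs PC2")
plot(p$x[, c("PC2", "PC3")], col = grp, pch = 19, main = "PC2 vs PC3")
par(op)  # restore the previous graphics settings
```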
I would recommend performing multidimensional scaling (MDS) instead of a PCA; see cmdscale in the R help.
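A minimal cmdscale() sketch, assuming a genes-in-rows matrix as in the question. `expr` is simulated placeholder data, and Euclidean distance is only one possible choice of distance:

```r
# Simulated placeholder data: 376 genes (rows) x 76 samples (columns).
# Replace 'expr' with your own numeric expression matrix.
set.seed(1)
expr <- matrix(rnorm(376 * 76), nrow = 376, ncol = 76,
               dimnames = list(paste0("gene", 1:376), paste0("sample", 1:76)))

# dist() works on rows, so transpose to get distances between samples.
d <- dist(t(expr))

# Classical (metric) MDS into 2 dimensions.
mds <- cmdscale(d, k = 2)

plot(mds[, 1], mds[, 2], xlab = "Coordinate 1", ylab = "Coordinate 2",
     main = "Classical MDS of samples")
text(mds[, 1], mds[, 2], labels = rownames(mds), pos = 3, cex = 0.6)
```

Note that with Euclidean distances, classical MDS gives the same configuration as a PCA on the centered data (up to the sign of each axis), so it mainly adds something when you use a different distance, for example one based on correlation.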