Hi,
I am working with qRT-PCR log2FC data (96 genes in rows x samples in columns) from healthy controls and patients treated with different stimulations. I am using this log2FC data.frame, along with the sample metadata, in PCAtools to plot a PCA biplot.
p <- pca(log2FC.df, metadata = Sample_metadata, center = TRUE, scale = FALSE, removeVar = 0.1)
-- removing the lower 10% of variables based on variance
I believe I can extract the genes from p$loadings in PCAtools, which is similar to p$rotation in prcomp and should give the variables that contribute most strongly to each PC. There are 96 PCs altogether, and 80 genes in the rows. I am only interested in extracting the genes for PC1 and PC2 (largest % of variance), but all of the remaining PCs (3, ..., 96) also show the same genes. I am a bit confused about this. Should the PC1 and PC2 loadings be sorted and then extracted? Additionally, I would like to re-plot the PCA using these PC1 and PC2 loadings. Does that make sense, or should I subset the original log2FC data.frame to these 80 genes and then re-plot the PCA?
p$loadings[,c("PC1", "PC2")]
dim(p$loadings[,c("PC1", "PC2")])
PC1.2 <- as.data.frame(p$loadings[,c("PC1", "PC2")])
Thank you,
Toufiq
I am the PCAtools main developer. What is it that you would like to do? The variable / component loadings are unitless values that represent the strength of the contribution of each gene / protein / variable to each PC.
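To see why every PC lists the same genes, it may help to look at the shape of a loadings matrix. The following is only a minimal sketch using base R prcomp() on simulated data; the matrix toy_mat and its dimensions are made up for illustration. As noted in the question, p$loadings in PCAtools is laid out in a similar way, with one row per gene and one column per PC.

```r
set.seed(1)

# Simulated stand-in for a log2FC matrix: 20 genes (rows) x 12 samples (columns)
toy_mat <- matrix(rnorm(20 * 12), nrow = 20,
                  dimnames = list(paste0("gene", 1:20), paste0("sample", 1:12)))

# prcomp() expects samples in rows, so the matrix is transposed first
pc <- prcomp(t(toy_mat), center = TRUE, scale. = FALSE)

# Loadings: one row per gene, one column per PC
dim(pc$rotation)
head(pc$rotation[, c("PC1", "PC2")])

# Every gene has a loading on every PC; what differs is the magnitude,
# so ranking by absolute loading picks out the strongest contributors per PC
top_pc1 <- names(sort(abs(pc$rotation[, "PC1"]), decreasing = TRUE))[1:5]
top_pc2 <- names(sort(abs(pc$rotation[, "PC2"]), decreasing = TRUE))[1:5]
top_pc1
top_pc2
```

So the same gene names appearing under PC3 and beyond is expected; the genes "in" a PC are simply those with the largest absolute loadings on it.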
Kevin Blighe, thank you for the prompt reply. I would like to extract the genes that contribute most to PC1 and PC2 and re-plot the 2D PCA or scatter plot.
I would first identify the top 10, 20, or 50 genes based on component loading (absolute values), then filter your input data for these, and then re-perform PCA. I am not sure that this procedure is standard though. What are you hoping to achieve?
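As a rough sketch of that procedure, assuming PCAtools is installed: the matrix mat and the data.frame meta below are simulated stand-ins for log2FC.df and Sample_metadata, the Group column is hypothetical, and the cutoff of 20 genes per PC is arbitrary.

```r
library(PCAtools)

set.seed(1)

# Simulated stand-ins for log2FC.df (genes x samples) and Sample_metadata
mat <- matrix(rnorm(96 * 24), nrow = 96,
              dimnames = list(paste0("gene", 1:96), paste0("S", 1:24)))
meta <- data.frame(Group = rep(c("Control", "Patient"), each = 12),
                   row.names = colnames(mat))  # row names must match colnames(mat)

p <- pca(mat, metadata = meta, center = TRUE, scale = FALSE, removeVar = 0.1)

# Rank genes by absolute loading on PC1 and on PC2, then take the union
n_top <- 20  # arbitrary cutoff; 10 or 50 would work the same way
load12 <- p$loadings[, c("PC1", "PC2")]
top_genes <- unique(c(
  rownames(load12)[order(abs(load12[, "PC1"]), decreasing = TRUE)][1:n_top],
  rownames(load12)[order(abs(load12[, "PC2"]), decreasing = TRUE)][1:n_top]
))

# Filter the input data to those genes and re-perform PCA
mat_top <- mat[top_genes, , drop = FALSE]
p_top <- pca(mat_top, metadata = meta, center = TRUE, scale = FALSE)
biplot(p_top, colby = "Group")
```

Because the genes are selected using the first PCA, the re-plotted samples will tend to reproduce the original PC1 / PC2 layout, so the second plot should not be read as independent evidence of separation.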
Agree. For this you would just need to sort by the loading for a given PC and then take the top few. But for what purpose? There may be a better suggestion depending on the goal.
Kevin Blighe and Vincent Laufer
Thank you. I am interested in extracting the highly variable genes and plotting the data. My log2FC data matrix contains a total of 96 genes, and with all of them the stimulation conditions were scattered across the plot. I thought of extracting the top genes contributing to PC1 and PC2 and then re-plotting the data with these genes.
Extract.Features.PCA
Plot heatmap
Plot PCA