
Question

Extraction of high variance or contributing genes using pcatools

2.6 years ago

mohammedtoufiq91 ▴ 260

Hi,

I am working with qRT-PCR log2FC data (96 genes in rows, samples in columns) from healthy controls and patients treated with different stimulations. I am using this log2FC data.frame, along with the sample metadata, in PCAtools to plot the PCA biplot.

p <- pca(log2FC.df, metadata = Sample_metadata, center = TRUE, scale = FALSE, removeVar = 0.1)
-- removing the lower 10% of variables based on variance

I believe I can extract the genes from p$loadings in PCAtools, which is similar to p$rotation in prcomp and should give the variables that contribute to the strongest PCs. There are 96 PCs altogether, and 80 genes in the rows. I am only interested in extracting the genes for PC1 and PC2 (the largest % of variance), but all the remaining PCs (3, ..., 96) also show the same genes. I am a bit confused about this. Should the PC1 and PC2 loadings be sorted and then extracted? Additionally, I would like to re-plot the PCA using these PC1 and PC2 loadings. Does that make sense, or should I subset the original log2FC data.frame to these 80 genes and then re-plot the PCA?

p$loadings[,c("PC1", "PC2")]

dim(p$loadings[,c("PC1", "PC2")])

PC1.2 <- as.data.frame(p$loadings[,c("PC1", "PC2")])

Thank you,

Toufiq

PCAtools FactoMineR R prcomp PCA • 1.6k views

ADD COMMENT • link 2.6 years ago by mohammedtoufiq91 ▴ 260

I am the PCAtools main developer. What is it that you would like to do? The variable / component loadings are unitless values that represent how strongly each gene / protein / variable contributes to each PC.
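
To illustrate why every PC lists the same genes, here is a minimal sketch using base R's prcomp rather than PCAtools, so that it runs without extra packages. The matrix `mat` and its gene and sample names are made up for illustration; as noted in the question, p$loadings in PCAtools plays the same role as the rotation matrix here.

```r
# Hypothetical data: 20 genes (rows) x 8 samples (columns), same layout as in the thread
set.seed(1)
mat <- matrix(rnorm(20 * 8), nrow = 20,
              dimnames = list(paste0("gene", 1:20), paste0("sample", 1:8)))

# prcomp expects samples in rows, so transpose; center but do not scale
pc <- prcomp(t(mat), center = TRUE, scale. = FALSE)

# The loadings matrix has one row per gene and one column per PC,
# so every gene appears in every PC, only with a different weight
loadings <- pc$rotation
dim(loadings)
head(rownames(loadings))

# Each column is a unit-length vector: the values are relative weights, not counts or units
colSums(loadings^2)
```

Seeing the same genes under PC3 and beyond is therefore expected. What differs between PCs is the size and sign of each gene's loading.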

ADD REPLY • link 2.6 years ago by Kevin Blighe 88k

Kevin Blighe, thank you for the prompt reply. I would like to extract the highly contributing genes from PC1 and PC2 and re-plot the 2D PCA or scatter plot.

ADD REPLY • link 2.6 years ago by mohammedtoufiq91 ▴ 260

I would first identify the top 10, 20, or 50 genes based on component loading (absolute values), then filter your input data for these, and then re-perform PCA. I am not sure that this procedure is standard though. What are you hoping to achieve?
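
A rough sketch of that procedure, again with base R's prcomp on simulated data so that it is self-contained. The matrix `mat`, the helper `top_genes`, and the cutoff `top_n` are hypothetical names chosen for illustration; with PCAtools the same ranking could presumably be applied to p$loadings.

```r
# Hypothetical data: 96 genes (rows) x 12 samples (columns)
set.seed(42)
n_genes <- 96
n_samples <- 12
mat <- matrix(rnorm(n_genes * n_samples), nrow = n_genes,
              dimnames = list(paste0("gene", seq_len(n_genes)),
                              paste0("sample", seq_len(n_samples))))

# Initial PCA (samples in rows for prcomp)
pc <- prcomp(t(mat), center = TRUE, scale. = FALSE)
loadings <- pc$rotation

# Rank genes by absolute loading on one PC and return the top n names
top_genes <- function(loadings, pc_name, n) {
  ranked <- sort(abs(loadings[, pc_name]), decreasing = TRUE)
  names(ranked)[seq_len(n)]
}

top_n <- 20
top_pc1 <- top_genes(loadings, "PC1", top_n)
top_pc2 <- top_genes(loadings, "PC2", top_n)

# Genes that rank highly on either PC
selected <- union(top_pc1, top_pc2)

# Filter the input data to those genes and re-perform PCA
mat_sub <- mat[selected, , drop = FALSE]
pc_sub <- prcomp(t(mat_sub), center = TRUE, scale. = FALSE)
summary(pc_sub)$importance[, 1:2]
```

Ranking uses absolute values because a large negative loading contributes as strongly as a large positive one. Note that genes chosen this way were selected for driving PC1 and PC2, so the re-plotted PCA will tend to show the same separation more sharply.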

ADD REPLY • link 2.6 years ago by Kevin Blighe 88k

Agreed. For this you would just need to sort by the loading for a given PC and then take the top few. But for what purpose? There may be a better suggestion depending on the goal.

ADD REPLY • link 2.6 years ago by LauferVA 4.5k

Kevin Blighe and Vincent Laufer

Thank you. I am interested in extracting high-variance genes and plotting the data. My log2FC data.matrix contains a total of 96 genes, and with all of them the stimulation conditions were scattered across the plot. I thought of extracting the top genes contributing to PC1 and PC2 and then re-plotting the data with these genes.

Extract.Features.PCA

Extract.Features.PCA <- as.data.frame(rownames(p$loadings[c(1:50),c("PC1", "PC2")]))
names(Extract.Features.PCA) <- c("Gene_Symbols")
names(Extract.Features.PCA)
rownames(Extract.Features.PCA) <- Extract.Features.PCA$Gene_Symbols

Plot heatmap

PC1.PC2 <- log2FC[rownames(Extract.Features.PCA), ]
library(ComplexHeatmap)
Heatmap(PC1.PC2)

Plot PCA

p_PC1.PC2 <- pca(PC1.PC2, metadata = Sample_metadata, center = TRUE, scale = FALSE, removeVar = 0.1)

biplot(p_PC1.PC2,
       x = 'PC1', y = 'PC2',
       lab = NULL,
       colby = 'Stim', colkey = c("Stim 1" = "#4FF300", "Stim 2" = "#FFEE07",  "Stim 3" = "#000000"),
       legendPosition = 'right', legendLabSize = 13, legendIconSize = 3.0,
       shape = 'Subject', shapekey = c('A' = 8, "B" = 15, "C" = 17, "D" = 18),
       subtitle = 'PC1 vs. PC2')

ADD REPLY • link 2.6 years ago by mohammedtoufiq91 ▴ 260