How to generate a 2D PCA plot from bulk RNA-seq data (log2 CPM) using the PCAtools?
1
1
Entering edit mode
3.1 years ago
vHelmholtz ▴ 40

Hi all,

I have bulk RNA-seq data with 12 samples - WT (x4), 'A' KO (x4), and 'B' KO (x4). I want to generate a 2D PCA plot (biplot) like below figure to look at the relationship between the samples.

a 2D PCA plot

I have tried an R package, 'PCAtools,' but it looks not work correctly as below. the PCA plot generated by the code below

I have pasted my code and data below. I will very much appreciate it if you share any advice or suggestions.

Thanks in advance!

Joshua

library(PCAtools)
data <-read.csv("C:/.../all,gene_log2cpm_revised.csv", fileEncoding = 'UTF-8-BOM')

groups = c(rep("WT", 4), rep("A-KO", 4), rep("B-KO", 4))
cols = c('red', 'green', 'blue')[factor(groups)]

data$gene_name = as.numeric(as.factor(data$gene_name))

pca = prcomp(data)
pca$x
pca$sdev
biplot(pca, cex=0.7, scale=T, xlim=c(-0.6,+0.6))

Data file format:

Data file format top genes

RNA-seq PCA • 2.5k views
ADD COMMENT
4
Entering edit mode
3.1 years ago

Hi Joshua,

Your second plot is not produced by my package, PCAtools. It is produced via base R functions.

Can you take a look through my vignette, please? - https://bioconductor.org/packages/release/bioc/vignettes/PCAtools/inst/doc/PCAtools.html#quick-start-deseq2

You probably need something like:

library(PCAtools)
data <- read.csv(
  'C:/.../all,gene_log2cpm_revised.csv',
  fileEncoding = 'UTF-8-BOM',
  row.names = 1)

groups = c(rep("WT", 4), rep("A-KO", 4), rep("B-KO", 4))
cols = c('red', 'green', 'blue')[factor(groups)]

metadata <- data.frame(groups = groups)
rownames(metadata) <- colnames(data)
p <- pca(data, metadata = metadata)

biplot(p, colby = 'groups')

Kevin

ADD COMMENT
1
Entering edit mode

I deleted the gene_name column from the CSV file, upload the file, and generated a PCA biplot as below.

Thanks, Kevin! You helped me a lot.

-Joshua

enter image description here

ADD REPLY
0
Entering edit mode

Thank you so much for your help, Kevin!

I found that the code below does not work. It seems that is because the number of columns (13, including the first column of the data file, 'gene_name') does not match that of rows (12)

rownames(metadata) <- colnames(data)

-> Error in .rowNamesDF<-(x, value = value) : invalid 'row.names' length

Could you help me with how to exclude the 'gene_name' column from the PCA analysis?

-Joshua

ADD REPLY

Login before adding your answer.

Traffic: 2682 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6