Hi all,
I have bulk RNA-seq data with 12 samples: WT (x4), 'A' KO (x4), and 'B' KO (x4). I want to generate a 2D PCA plot (biplot) like the figure below to look at the relationship between the samples.
I have tried an R package, 'PCAtools', but it does not seem to work correctly, as shown below.
I have pasted my code and data below. I would very much appreciate any advice or suggestions.
Thanks in advance!
Joshua
library(PCAtools)
data <- read.csv("C:/.../all,gene_log2cpm_revised.csv", fileEncoding = 'UTF-8-BOM')
groups = c(rep("WT", 4), rep("A-KO", 4), rep("B-KO", 4))
cols = c('red', 'green', 'blue')[factor(groups)]
data$gene_name = as.numeric(as.factor(data$gene_name))
pca = prcomp(data)
pca$x
pca$sdev
biplot(pca, cex=0.7, scale=T, xlim=c(-0.6,+0.6))
Data file format:
I deleted the gene_name column from the CSV file, uploaded the file again, and generated a PCA biplot as shown below.
Thanks, Kevin! You helped me a lot.
-Joshua
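For anyone who would rather not edit the CSV by hand, the same result can probably be reached inside R. The sketch below is only an illustration under stated assumptions: the data frame is simulated (the sample names and gene names are hypothetical stand-ins for the real file), gene_name is moved into the row names instead of being converted to numbers, and the matrix is transposed because prcomp() treats rows as observations, so each sample becomes one point.

```r
# Simulated stand-in for the real CSV: 200 genes x 12 samples of log2 CPM values.
# Sample and gene names are hypothetical.
set.seed(1)
samples <- c(paste0("WT_", 1:4), paste0("A_KO_", 1:4), paste0("B_KO_", 1:4))
expr <- matrix(rnorm(200 * 12, mean = 5), nrow = 200,
               dimnames = list(NULL, samples))
data <- data.frame(gene_name = paste0("gene", 1:200), expr)

# Keep gene_name as row names so that only expression values enter the PCA.
mat <- as.matrix(data[, colnames(data) != "gene_name"])
rownames(mat) <- data$gene_name

# Fix the level order so that colours map to groups in a known way.
groups <- factor(c(rep("WT", 4), rep("A-KO", 4), rep("B-KO", 4)),
                 levels = c("WT", "A-KO", "B-KO"))
cols <- c("red", "green", "blue")[groups]

# Transpose: rows become samples, columns become genes.
pca <- prcomp(t(mat), center = TRUE, scale. = FALSE)

# Percentage of variance explained by each component.
pct <- round(100 * pca$sdev^2 / sum(pca$sdev^2), 1)

# Scatter plot of the samples on PC1 and PC2, coloured by group.
plot(pca$x[, 1], pca$x[, 2], col = cols, pch = 19,
     xlab = paste0("PC1 (", pct[1], "%)"),
     ylab = paste0("PC2 (", pct[2], "%)"))
text(pca$x[, 1], pca$x[, 2], labels = rownames(pca$x), pos = 3, cex = 0.7)
legend("topright", legend = levels(groups),
       col = c("red", "green", "blue"), pch = 19)
```

With the real data, the simulated block at the top would be replaced by the read.csv() call from the original post. Whether to set scale. = TRUE is a judgment call for log2 CPM values and is left at FALSE here.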
Thank you so much for your help, Kevin!
I found that the code below does not work. It seems to be because the number of columns in the data file (13, including the first column, 'gene_name') does not match the number of rows in the metadata (12).
-> Error in `.rowNamesDF<-`(x, value = value) : invalid 'row.names' length
Could you help me with how to exclude the 'gene_name' column from the PCA analysis?
-Joshua
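A minimal sketch of what is likely going on with this error, assuming it comes from assigning the data file's column names as row names of a 12-row metadata table. The data are simulated, and the sample names and the metadata column 'group' are hypothetical. The first part reproduces the length mismatch; the second part drops gene_name before aligning the names. The PCAtools call at the end runs only if the package is installed, and relies on its documented requirement that the row names of metadata match the column names of the matrix.

```r
# Hypothetical stand-in for the CSV: first column gene_name, then 12 samples.
set.seed(1)
samples <- c(paste0("WT_", 1:4), paste0("A_KO_", 1:4), paste0("B_KO_", 1:4))
data <- data.frame(gene_name = paste0("gene", 1:100),
                   matrix(rnorm(100 * 12, mean = 5), nrow = 100,
                          dimnames = list(NULL, samples)))

# One metadata row per sample (12 rows).
metadata <- data.frame(group = c(rep("WT", 4), rep("A-KO", 4), rep("B-KO", 4)))

# colnames(data) has 13 entries (gene_name + 12 samples), so using it as the
# row names of a 12-row data frame fails. The message is captured, not thrown.
err <- tryCatch({
  rownames(metadata) <- colnames(data)
  NULL
}, error = function(e) conditionMessage(e))
print(err)

# Fix: move gene_name into the row names, leaving a 12-column numeric matrix.
mat <- as.matrix(data[, colnames(data) != "gene_name"])
rownames(mat) <- data$gene_name

# Now the sample names and the metadata rows line up one to one.
rownames(metadata) <- colnames(mat)
stopifnot(identical(rownames(metadata), colnames(mat)))

# Optional: run PCAtools only if it is available in this R session.
if (requireNamespace("PCAtools", quietly = TRUE)) {
  p <- PCAtools::pca(mat, metadata = metadata)
  print(PCAtools::biplot(p, colby = "group"))
}
```

Keeping the gene names as row names, rather than deleting the column from the CSV, means they are still available later, for example when looking at which genes drive each component.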