Question

PCA of expression data: which PCs to use?

9.5 years ago

Floris Brenk ★ 1.0k

Hi all,

I'm looking for covariates in my expression dataset (CAGE-seq count data). I have 120 samples and about 20,000 expression values. I found this tutorial, which is really easy to use:

https://tgmstat.wordpress.com/2013/11/28/computing-and-visualizing-pca-in-r/

So I adapted it to my data, which has 120 samples (columns) and 20,000 rows.

# log-transform the normalized counts and transpose so that samples are rows
pca_data <- t(log(norm.data + 1))
dim(pca_data)
# [1]   120 20000

cage.pca <- prcomp(pca_data,
                   center = TRUE,
                   scale. = TRUE)

# plot method (scree plot)
plot(cage.pca, type = "l")

# summary method
summary(cage.pca)

From the plot, it looks to me as if the first 6 PCs capture most of the variation.

However, when I run the summary method it reports 119 principal components. I am a bit confused now and don't know which components I need to use as covariates. Also, the cumulative proportion at PC6 is 0.43128, whereas from the plot I would have expected a lot more... Could anyone help with this?

R PCA expression • 2.4k views

ADD COMMENT • link updated 3.0 years ago by Ram 45k • written 9.5 years ago by Floris Brenk ★ 1.0k

You don't need to use 6 principal components. From the scree plot you can see that the first 3 capture most of the variability; adding 3 more doesn't really add that much.

ADD REPLY • link 9.5 years ago by TriS ★ 4.8k

OK, thanks. To extract these components, I can just do this, right?

first = cage.pca$x[,1]
second = cage.pca$x[,2]
third = cage.pca$x[,3]

ADD REPLY • link 9.5 years ago by Floris Brenk ★ 1.0k
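For reference, the per-sample scores in the `x` element of a `prcomp` result can also be pulled out in one step rather than one vector at a time. The sketch below is only an illustration on simulated data: `sim_data`, `sim.pca` and `pc_covariates` are made-up names standing in for the real `pca_data` and `cage.pca`, and the matrix dimensions are arbitrary.

```r
# Simulated stand-in for the real data: 120 samples (rows) x 200 features (columns).
# All object names here are hypothetical; they are not from the original post.
set.seed(42)
sim_data <- matrix(rnorm(120 * 200), nrow = 120,
                   dimnames = list(paste0("sample", 1:120),
                                   paste0("tag", 1:200)))

sim.pca <- prcomp(sim_data, center = TRUE, scale. = TRUE)

# The score matrix has one row per sample and one column per component,
# so the first three columns are the per-sample values of PC1..PC3.
pc_covariates <- as.data.frame(sim.pca$x[, 1:3])

head(pc_covariates)
```

Keeping the scores in a data frame with the sample names as row names makes it easier to check that they line up with the sample annotation before they are used as covariates.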

Were you concerned because plot() only showed 10 components while summary() showed 119? The reason is that, by default, plot() shows at most 10 components. So although the plot suggests that the first 3-6 components explain a large amount of variance, it is a bit misleading, because a lot of the variance is also captured by the components not shown. summary() shows the cumulative variance explained and tells you that the first 6 components only explain ~43% of the variance.

ADD REPLY • link 9.5 years ago by Jean-Karim Heriche 27k
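The difference between the plot and the summary can be checked directly from the `sdev` element of the `prcomp` result. The following is a minimal sketch on simulated data, not the original dataset: `sim_data`, `sim.pca` and the 80% threshold are assumptions chosen only for illustration. It also shows why, after centering 120 samples, at most 119 components can carry any variance.

```r
# Simulated stand-in for the real matrix: 120 samples x 500 features.
# Object names and the 80% cut-off below are illustrative assumptions.
set.seed(1)
sim_data <- matrix(rnorm(120 * 500), nrow = 120, ncol = 500)
sim.pca <- prcomp(sim_data, center = TRUE, scale. = TRUE)

# Proportion of variance per component, computed from the standard deviations
var_explained <- sim.pca$sdev^2 / sum(sim.pca$sdev^2)
cum_var <- cumsum(var_explained)

# The same quantities as reported (rounded) by summary()
imp <- summary(sim.pca)$importance

# Centering removes one degree of freedom, so the last of the 120
# components has (numerically) zero variance.
last_sdev <- sim.pca$sdev[length(sim.pca$sdev)]

# Show 30 components in the scree plot instead of the default 10
screeplot(sim.pca, npcs = 30, type = "lines")

# Smallest number of components reaching 80% cumulative variance
n_pcs_80 <- which(cum_var >= 0.80)[1]
```

Looking at `cum_var` (or the "Cumulative Proportion" row of `imp`) rather than the first 10 bars of the scree plot gives a more honest picture of how much variance a given number of components retains.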