PCA - untreated samples do not cluster together
3.9 years ago
camillab. ▴ 160

Hi!

I performed a PCA on 6 chick samples (bulk RNA-seq, 3 treated vs 3 untreated) and found 2 samples (one per condition) that cluster far apart from the others, so I thought they were outliers. I then removed them and re-ran the PCA on the remaining 4 samples, and now one of the untreated samples does not cluster with the other control. What am I supposed to do at this stage? Is there any additional test I can do to confirm or check for outliers?

(I am not sure how to justify the fact that the untreated samples do not cluster together, even though the animals were kept under the same conditions and the tissues were processed in the same way by the same person, whereas the treated samples do cluster together.)

Thank you for any suggestions.

Camilla

R bulk-RNAseq PCA variability • 1.3k views

Can you include a picture of your PCA plot?


I performed a PCA... - may I ask what steps you took to get to that stage? You did not provide many details.


Apologies! Here is the code, and below it the image of the first PCA plot (not sure if you can see it):

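library(tidyr)                        # loaded in the original session; nothing below depends on it

# tidy the input table
chickSE <- na.omit(chick)             # drop rows with missing values
chickSE <- chickSE[, -(8:12)]         # drop columns 8 to 12
rnames <- chickSE$gene_symbol         # keep gene symbols to use as row names
chickSE <- chickSE[, -(1:3)]          # drop the first three columns, leaving only samples
data2 <- as.matrix(chickSE)           # matrix with only the samples
rownames(data2) <- rnames             # attach gene symbols as row names
data2 <- log(data2 + 1)               # natural-log transform with a pseudocount of 1
chick.t <- t(data2)                   # transpose: samples in rows, genes in columns

# compute PCA (centered by default, not scaled)
PCA <- prcomp(chick.t, scale. = FALSE)

PCA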


Can you also include the results of summary(PCA)? This will give information on the proportion of variance explained by each PC.
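For reference, a minimal sketch of where those proportions come from, using made-up toy data rather than the chick matrix (the object names `toy`, `pca` and `var_explained` are only for illustration). It also shows why, with 6 centered samples, at most 5 components can carry variance, so a last component with a standard deviation of essentially zero is expected.

```r
# Toy data standing in for the real matrix: 6 samples x 200 genes (random values)
set.seed(1)
toy <- matrix(rnorm(6 * 200), nrow = 6,
              dimnames = list(paste0("sample", 1:6), paste0("gene", 1:200)))

pca <- prcomp(toy, scale. = FALSE)   # prcomp centers each gene by default

# Proportion of variance per PC = squared standard deviation / total variance
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained, 4))

# The same numbers appear in the "Proportion of Variance" row of summary()
print(summary(pca)$importance["Proportion of Variance", ])

# Centering removes one degree of freedom, so the 6th PC is numerically zero
print(pca$sdev[6])
```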

Importance of components:
                           PC1     PC2    PC3   PC4     PC5       PC6
Standard deviation     24.0447 18.7620 8.8427 6.485 5.29932 3.293e-14
Proportion of Variance  0.5361  0.3264 0.0725 0.039 0.02604 0.000e+00
Cumulative Proportion   0.5361  0.8625 0.9350 0.974 1.00000 1.000e+00

PC1 explains 54% of the variance in your data and separates your conditions, which is a good sign. However, PC2 explains 33% of the variance and separates one sample of each condition from the other two samples of the same condition, which may cause problems. When these samples were collected, was there anything different about those two samples compared to the others, such as being collected on a different day?
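If you do have such a record, one way to check it is to plot the PC scores marked by both condition and the suspected variable. Here is a rough sketch on simulated data, not your data: the `batch` labels (day1/day2), the effect sizes and the gene counts are all invented for illustration, with one sample per condition assigned to a hypothetical second collection day.

```r
# Simulated example: 6 samples x 500 genes, values are made up
set.seed(42)
n_genes <- 500
condition <- factor(rep(c("untreated", "treated"), each = 3))
batch <- factor(c("day1", "day1", "day2", "day1", "day1", "day2"))  # hypothetical

expr <- matrix(rnorm(6 * n_genes), nrow = 6)
rownames(expr) <- paste0(condition, "_", c(1:3, 1:3))
# Treatment shifts genes 1-100; the hypothetical collection day shifts genes 101-180
expr[condition == "treated", 1:100] <- expr[condition == "treated", 1:100] + 4
expr[batch == "day2", 101:180] <- expr[batch == "day2", 101:180] + 4

pca <- prcomp(expr, scale. = FALSE)
scores <- data.frame(pca$x[, 1:2], condition = condition, batch = batch)

# Colour = condition, point shape = hypothetical collection day
plot(scores$PC1, scores$PC2,
     col = as.integer(scores$condition),
     pch = as.integer(scores$batch) + 15,
     xlab = "PC1", ylab = "PC2")
text(scores$PC1, scores$PC2, labels = rownames(scores), pos = 3, cex = 0.7)

# Mean score per group: shows which PC tracks which variable
print(aggregate(cbind(PC1, PC2) ~ condition, data = scores, FUN = mean))
print(aggregate(cbind(PC1, PC2) ~ batch, data = scores, FUN = mean))
```

If PC2 lines up with something recorded at collection, that is an argument for modelling it as a covariate rather than discarding the samples.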


What I do not understand is this: if I remove those 2 samples and re-compute the PCA (same code as above, but without the columns for those samples), I get the plot below (not sure if you can see it), with one control in the top left and the other in the bottom left (variance explained: PC1 87.78%, PC2 7.66%). How should I interpret that? Did I miss something?

[Image: PCA plot of the 4 remaining samples]


It's not even clear yet whether you should be removing those two samples. In the plot above, though, the conditions are separated on the x-axis (PC1), which explains almost 88% of the variance in your data, more than ten times the 7.66% behind the separation on the y-axis (PC2). So the distance between your two controls on PC2 is small compared with the distance between conditions on PC1. This is roughly what you would expect, since you want the between-condition variance to account for most of the variance in the data.
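As an additional check before dropping anything, sample-to-sample distances and hierarchical clustering on the log-transformed matrix are a common complement to PCA. This is only a sketch on simulated values, assuming genes in rows and samples in columns; the sample names and effect size are made up.

```r
# Simulated log-expression matrix: 300 genes x 6 samples (values are made up)
set.seed(7)
n_genes <- 300
samples <- c(paste0("untreated", 1:3), paste0("treated", 1:3))
logexpr <- matrix(rnorm(n_genes * 6, mean = 5), nrow = n_genes,
                  dimnames = list(paste0("gene", 1:n_genes), samples))
logexpr[1:100, 4:6] <- logexpr[1:100, 4:6] + 3   # treatment effect on 100 genes

# Euclidean distance between samples (dist works on rows, hence t())
sample_dist <- dist(t(logexpr))
print(round(as.matrix(sample_dist), 1))

# Rank correlation between samples as a second view
sample_cor <- cor(logexpr, method = "spearman")
print(round(sample_cor, 2))

# Average-linkage clustering; an outlier would join the tree last, on its own
hc <- hclust(sample_dist, method = "average")
plot(hc, main = "Sample clustering", xlab = "", sub = "")

# Cut into two groups and compare with the known conditions
groups <- cutree(hc, k = 2)
print(table(group = groups, condition = sub("[0-9]+$", "", samples)))
```

A sample that sits far from every other sample in the distance matrix, regardless of condition, is a stronger outlier candidate than one that simply separates from its replicate along a low-variance PC.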
