Should the expression data be centered before PCA?
1
0
Entering edit mode
5.3 years ago
Raheleh ▴ 260

I have expression profile of 27 samples (10 normal, 17 tumors). I did PCA analysis to check the quality of the samples to see them clustered in 2 different groups. I did it in two ways after RMA normalization and log2 transformation. 1. I did it without centring the data and the result showed that the quality is not good.pc <- prcomp(exp)

Without Centering

  1. For the second time I centred the data (mean subtraction)

    exp.scale <- t(scale(t(exp), scale = F))
    pc <- prcomp(exp.scale)

Centered data

My question is, if the PCA is calculated from the covariance matrix and mean centering does not affect the covariance matrix; why the output is different? After centering, the quality of the samples is much better! Should I consider that the quality of my samples are good or not?

Thanks!

PCA Centering expression data • 4.6k views
ADD COMMENT
1
Entering edit mode

You did not transpose your matrix prior to running prcomp. In a gene expression matrix with rows = genes and columns = samples one would run PCA like prcomp(t(data)), see e.g. the source code of DESeq2::plotPCA. Re-run the PCA using the log2-normalized intensity values and see how it performs.

getMethod("plotPCA","DESeqTransform") Method Definition:

function (object, ...) 
{
    .local <- function (object, intgroup = "condition", ntop = 500, 
        returnData = FALSE) 
    {
        rv <- rowVars(assay(object))
        select <- order(rv, decreasing = TRUE)[seq_len(min(ntop, 
            length(rv)))]
        ## => HERE <= ## 
        pca <- prcomp(t(assay(object)[select, ]))
        (... and so on ...)

For the future please see How to add images to a Biostars post. You need the pull path to the image.

ADD REPLY
0
Entering edit mode

Could you please fix the images ?

ADD REPLY
2
Entering edit mode
5.3 years ago

In your first example, you ARE centering your data, as the default of prcomp() is to center your data. Look up the function docs: https://stat.ethz.ch/R-manual/R-devel/library/stats/html/prcomp.html

In your second example, you are actually centering the data twice, but one is by row (due to your transpose), while the other column.

I am not sure how you can say that your data's quality is 'not good' from the first bi-plot(?) - how are you defining 'not good'?

ADD COMMENT
0
Entering edit mode

@Kevin

I have PCA data , genes in rows and samples in column

ExpAccession    ExpA 1 N    ExpA 2 N    ExpA 3 N    ExpA 4 N    ExpA 5 N    ExpA 1 T    ExpA 2 T    ExpA 3 T    ExpA 4 T    ExpA 5 T
P10645  -1.80   -2.31   -2.12   -1.99   -1.92   -1.98   -0.57   -0.99   -0.48   2.62
P31327  -1.57   -1.90   -1.98   -1.79   -1.71   -0.02   -0.67   -0.86   -1.60   2.53
Q9BYZ8  -1.08   -1.80   -1.62   -2.07   -1.51   -1.72   -0.40   0.57    -1.52   2.48
O43745  -2.59   -2.02   -2.65   -1.39   -1.68   1.00    -1.44   -0.78   -1.81   2.46
Q99795  -1.68   -2.15   -2.40   -2.08   -2.64   0.45    -0.48   -0.32   -1.46   2.42
Q02817  -1.03   -1.47   -1.19   -1.35   -1.31   -1.38   -0.49   0.10    -1.21   2.38

I read your tutorial https://bioconductor.org/packages/release/bioc/vignettes/PCAtools/inst/doc/PCAtools.html

You start with class of pca

How I can adapt my matrix to your class (p) to carry on with your tutorial on plotting PCA?

Thank you

ADD REPLY
1
Entering edit mode

Hey, it looks like the first column of your data comprises the accession names. So, you will have to set those as the rownames.

Essentially, this may work:

rownames(data) <- data$ExpAccession # assign rownames
data <- data[,-1] # remove the first column

# perform PCA
require(PCAtools)
p <- pca(x, metadata = NULL, removeVar = 0.1)
ADD REPLY
0
Entering edit mode

Thank you so much

In some part I am getting this error

> plotloadings(p,
+              rangeRetain = 0.01,
+              labSize = 3.0,
+              title = 'Loadings plot',
+              subtitle = 'PC1, PC2',
+              caption = 'Top 1% variables',
+              shape = 24,
+              col = c('limegreen', 'black', 'red3'),
+              drawConnectors = TRUE)
-- variables retained:
Q6UX53, P62684, Q06141, Q9TNN7, O00425, P01911, P12104, Q92968, P01889, Q9P2F6, P30443
Error in grid.Call(C_convert, x, as.integer(whatfrom), as.integer(whatto),  : 
  Viewport has zero dimension(s)
>
ADD REPLY
1
Entering edit mode

How many samples are in your data? Were you able to produce the bi-plot and SCREE plot?

ADD REPLY
0
Entering edit mode

Thank you 10 samples I have and I have produced both bi-plot and SCREE plots

ADD REPLY
1
Entering edit mode

hmm... I am not sure.

Can you add the parameter:

components = getComponents(p, 1:3)

?

Also, try to change the value of rangeRetain

ADD REPLY
0
Entering edit mode

Sorry Kevin

Is it possible to have only 2 colors in PCA plot? For example grey for control and black for treatment?

ADD REPLY
0
Entering edit mode

The bi-plot? How have you created your pca object and which metadata do you have?

ADD REPLY
0
Entering edit mode

Thank you I don't have metadata

If this is my data

ExpAccession    ExpA 1 N    ExpA 2 N    ExpA 3 N    ExpA 4 N    ExpA 5 N    ExpA 1 T    ExpA 2 T    ExpA 3 T    ExpA 4 T    ExpA 5 T
P10645  -1.80   -2.31   -2.12   -1.99   -1.92   -1.98   -0.57   -0.99   -0.48   2.62
P31327  -1.57   -1.90   -1.98   -1.79   -1.71   -0.02   -0.67   -0.86   -1.60   2.53
Q9BYZ8  -1.08   -1.80   -1.62   -2.07   -1.51   -1.72   -0.40   0.57    -1.52   2.48
O43745  -2.59   -2.02   -2.65   -1.39   -1.68   1.00    -1.44   -0.78   -1.81   2.46
Q99795  -1.68   -2.15   -2.40   -2.08   -2.64   0.45    -0.48   -0.32   -1.46   2.42
Q02817  -1.03   -1.47   -1.19   -1.35   -1.31   -1.38   -0.49   0.10    -1.21   2.38

rownames(data) <- data$ExpAccession # assign rownames
data <- data[,-1] # remove the first column

# perform PCA
require(PCAtools)
p <- pca(x, metadata = NULL, removeVar = 0.1)

 biplot(p)
ADD REPLY
1
Entering edit mode

You will have to create metadata that has the same rownames as the colnames of x, and then assign it to metadata when you run pca():

p <- pca(x, metadata = myMetaData, removeVar = 0.1)

Then, to assign colours, you use colBy with biplot(). Examples HERE.

ADD REPLY
0
Entering edit mode

Sorry Kevin

By this code I have a biplot, I am saving that with width 5100 and length 2000 and as you are seeing axis labels and title are too small

How I can make them bigger please?

p2=biplot(pb,
                       colby = 'condition',colkey = c('N'='grey75', 'T'='black','T(scar)'='grey50'),
                       legendPosition = 'right',    hline = 0, vline = c(-25, 0,25),
                       vlineType = c('dotdash', 'solid', 'dashed'),
                       gridlines.major = FALSE, gridlines.minor = FALSE,
                       pointSize = 10, legendLabSize = 20, legendIconSize =10,
                       drawConnectors = FALSE,
                       title = 'PCA plot for expriment B (patients 6-10, N=normal esophagus, T=tumour)', labSize = 15)

 p1=biplot(pa,
                       colby = 'condition',colkey = c('N'='grey75', 'T'='black'),
                       legendPosition = 'right',    hline = 0, vline = c(-25, 0,25),
                       vlineType = c('dotdash', 'solid', 'dashed'),
                       gridlines.major = FALSE, gridlines.minor = FALSE,
                       pointSize =10, legendLabSize = 20, legendIconSize = 10,
                       drawConnectors = FALSE,
                       title = 'PCA plot for expriment A (patients 1-5, N=normal esophagus, T=tumour)', labSize = 15)

 plot_grid(p1,p2,label_size = 10)

enter image description here

ADD REPLY
1
Entering edit mode

Hey, the labels do not usually come out that small... If you run ?biplot in R, you will see all of the options for the biplot() function. The ones in which you will be interested are:

biplot(
  ...,
  axisLabSize = 16,
  title = '',
  subtitle = '',
  caption = '',
  titleLabSize = 16,
  subtitleLabSize = 12,
  captionLabSize = 12,
  ..)
ADD REPLY
0
Entering edit mode

Sorry @Kevin

Really thank you very much for PCAtools package

You think I better use all of my data from RNA-seq for PCA biplot or using differentially expressed genes between tumour and normal tissue would make a better view of data?

ADD REPLY
0
Entering edit mode

It depends on what your aim is for using PCAtools? Usually, people use the entire dataset.

ADD REPLY
0
Entering edit mode

We treated some patients with chemotherapy, some patients responded and some not. So in PCA responding patients would group with normal samples and from their tumour only a scar remains. The goal is to show responding patients being grouped with adjacent normal samples

ADD REPLY
1
Entering edit mode

You can try 2 things:

  1. PCAtools using entire dataset - this shows any 'natural' grouping based on the entire transcriptome
  2. hierarchical clustering + heatmap using differentially expressed genes - this shows the ability of the differentially expressed genes to distinguish the groups
ADD REPLY
0
Entering edit mode

Hi Kevin. Thanks for the point!

I defined it as "not good" because in the first figure the samples from two different groups are not separated from each other as good as in the second figure. Isn't like this? If I am wrong please correct me. Thanks!

ADD REPLY

Login before adding your answer.

Traffic: 2313 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6