PCA on FPKM data
4.5 years ago
ipalmisa ▴ 10

Hi, I am running a PCA with the prcomp function on my FPKM data from an RNA-seq analysis. I am scaling the FPKM data with zFPKM (https://www.bioconductor.org/packages/release/bioc/html/zFPKM.html). Zero values in the FPKM data are converted to -Inf, so when I run prcomp I get:

Error in svd(x, nu = 0, nv = k) : infinite or missing values in 'x'

Can you help me, please? Should I just remove the -Inf values?

RNA-Seq • 2.3k views
4.5 years ago
piyushjo ▴ 710

Why do you want to scale? Are you following a tutorial in which the FPKM values are scaled?

I read the suggestion to scale in this thread: Question about PCA plot using RPKM/FPKM.

prcomp() will, by default, only center your data. You can set it to additionally scale by using:

prcomp(x, center = TRUE, scale = TRUE)

If you are having trouble with zFPKM, then just try the above.
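
As a minimal sketch of what the scaling option does (the matrix x below is made-up data, not from this thread): scale. is the full name of the prcomp() argument, which scale = TRUE reaches through partial matching, and it should give the same component standard deviations as standardising the columns with scale() beforehand.

```r
# Made-up data for illustration: 5 observations (rows) x 4 variables (columns).
set.seed(1)
x <- matrix(rnorm(20), nrow = 5)

# Let prcomp() center each column and scale it to unit variance.
p1 <- prcomp(x, center = TRUE, scale. = TRUE)

# Standardise the columns first, then run prcomp() without further transformation.
p2 <- prcomp(scale(x), center = FALSE, scale. = FALSE)

# Both routes should yield the same component standard deviations.
same_sdev <- isTRUE(all.equal(p1$sdev, p2$sdev))
print(same_sdev)
```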

zFPKM converts 0 values to -Inf (negative infinities):

df <- data.frame(a=c(1,4,5,6,7,4,3,0),b=c(6,5,4,23,5,6,7,78))
 df
  a  b
1 1  6
2 4  5
3 5  4
4 6 23
5 7  5
6 4  6
7 3  7
8 0 78

zFPKM::zFPKM(df)
           a           b
1 -4.0512861  0.08036657
2 -0.2796893 -0.08062816
3  0.3274022 -0.27766979
4  0.8234321  1.26691969
5  1.2428194 -0.08062816
6 -0.2796893  0.08036657
7 -1.0623663  0.21648567
8       -Inf  2.34528431
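
One possible way to get past the svd() error, sketched under assumptions: the matrix z below is a hand-made stand-in for zFPKM output (it does not call the zFPKM package), with genes in rows and samples in columns. It drops every gene that has a non-finite value in any sample and then transposes so that samples become the rows prcomp() expects. Whether dropping such genes is appropriate depends on the analysis.

```r
# Hypothetical stand-in for zFPKM output: genes in rows, samples in columns.
z <- matrix(c(-4.05, -0.28,  0.33,
               0.82,  1.24, -0.28,
               -Inf, -1.06,  0.08,
               0.22,  2.35, -0.08),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("gene", 1:4), paste0("s", 1:3)))

# Keep only genes whose values are finite (no -Inf, Inf, NA, NaN) in every sample.
finite_rows <- apply(z, 1, function(x) all(is.finite(x)))
z_clean <- z[finite_rows, , drop = FALSE]

# prcomp() treats rows as observations, so transpose to put samples in rows.
pca <- prcomp(t(z_clean), center = TRUE, scale. = FALSE)
print(rownames(z_clean))
```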

Yes, I have noticed that the zeros are converted to -Inf. But when I try prcomp on the FPKM data with

pca = prcomp(my_data, center = TRUE, scale = TRUE)

I get:

Error in prcomp.default(my_data, center = TRUE, scale = TRUE) : cannot rescale a constant/zero column to unit variance

It works only with scale = FALSE. I think the zeros are the issue here...

I see. You have variables that are all 0 across your samples. You need to remove them, like this (the variables are the rows of df here):

df
  a  b
1 1  6
2 4  5
3 5  4
4 0  0
5 7  5
6 4  6
7 3  7
8 0 78

keep <- !apply(df, 1, function(x) all(x == 0))

df[keep,]
  a  b
1 1  6
2 4  5
3 5  4
5 7  5
6 4  6
7 3  7
8 0 78
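
The effect of that filter on prcomp() can be sketched as follows, assuming (as in the example) that variables are in rows and the two columns a and b are samples, so the data are transposed before the PCA. Before filtering, the all-zero row becomes a constant column and scaling fails; after filtering, the call goes through.

```r
# The example data from above: variables in rows, two samples (a, b) in columns.
df <- data.frame(a = c(1, 4, 5, 0, 7, 4, 3, 0),
                 b = c(6, 5, 4, 0, 5, 6, 7, 78))

# Without filtering, row 4 is a constant (all-zero) column after t(), so scaling errors out.
before <- try(prcomp(t(df), center = TRUE, scale. = TRUE), silent = TRUE)
failed_before <- inherits(before, "try-error")

# Drop the rows that are zero in every sample, then run the scaled PCA.
keep <- !apply(df, 1, function(x) all(x == 0))
pca <- prcomp(t(df[keep, ]), center = TRUE, scale. = TRUE)
print(failed_before)
```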

Thank you! It worked!

Ah, OK! I will remove the ones that are zero in ALL my samples and keep the ones that are zero in only some samples, right? Have I understood correctly? Thanks.

Yes, that is correct. I cannot see your data, but you may also have other non-zero variables that are constant across samples (zero variance), which would trigger the same error. First, though, try removing just the variables that are all zero.
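
If constant non-zero variables do turn out to be a problem, a more general filter is to drop anything with zero variance. This is only a sketch on made-up data: the matrix expr and its gene and sample names are hypothetical, with genes in rows and samples in columns.

```r
# Hypothetical expression matrix: genes in rows, samples in columns.
expr <- matrix(c(1, 4, 5,
                 0, 0, 0,   # all zero
                 3, 3, 3,   # constant but non-zero
                 7, 2, 9),
               nrow = 4, byrow = TRUE,
               dimnames = list(paste0("gene", 1:4), paste0("sample", 1:3)))

# Per-gene variance across samples; constant genes have variance exactly 0.
gene_var <- apply(expr, 1, var)
expr_var <- expr[gene_var > 0, , drop = FALSE]

# Samples go in rows for prcomp(); scaling now works because no column is constant.
pca <- prcomp(t(expr_var), center = TRUE, scale. = TRUE)
print(rownames(expr_var))
```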

I also updated my original answer: A: Question about PCA plot using RPKM/FPKM.

Sorry to piyushjo for overtaking the thread.

It's alright. You are the pro! I was just trying to get more information, but you had the better answer.

You are a pro too. Everybody has much to offer in different areas.
