Acceptability of R dropping zeros from logged RNAseq RPKM data
9.3 years ago
ad ▴ 30

I have several groups of RNAseq data that I'm trying to compare with each other using ggplot2 in R. The data consist of several columns of RPKM values, each column a different group of samples, e.g. column 1: gene 1 RPKMs in normal; column 2: gene 1 RPKMs in tumor; etc.

For example, using a small excerpt of the data:

library(ggplot2)
library(reshape2)  # provides melt()

# Toy excerpt: one column per sample group, NA marks a missing value
df <- read.table(text="G1 G1.1 G1.2 G1.3 G2 G2.1 G2.2 G2.3
     1    0    3    4    3   2    3    1
     2    NA   5    5    5   2    1    2
     2    NA   2    1    2   1    2    5", header=TRUE)

# Long format: one row per (variable, value) pair
dfmelt <- melt(df)

ggplot(dfmelt, aes(variable, value, fill=variable)) +
  geom_boxplot() +
  theme(axis.text.x=element_text(angle=90)) +
  scale_x_discrete(labels=c('C1','C2','C3','C4','C5','C6','C7','C8')) +
  scale_fill_manual(values=rep(c("red","green","blue","yellow"), 2)) +
  # fun.y was renamed to fun in ggplot2 3.3.0; use fun.y on older versions
  stat_summary(fun = median, geom = "point", position = position_dodge(width = .9)) +
  scale_y_log10()

The problem occurs when I draw boxplots of the data in ggplot2 on a log10 y scale, which is necessary because of the data distribution. ggplot2 appears to simply drop the zero values, with the messages:

Removed x rows containing non-finite values (stat_boxplot)
Removed x rows containing missing values (stat_summary)

From what I've read, ggplot2 takes the log of 0, gets -Inf, and drops the value. Is this a concern in RNAseq expression analysis? If so, how do I best handle it to get the plot I want without distorting the data?

RNA-Seq expression R • 2.6k views

Just add a small number (a pseudocount) to all values, e.g. 1.
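To make the pseudocount suggestion concrete, here is a minimal base-R sketch using the first four columns of the question's excerpt. The pseudocount of 1 and the variable names are illustrative assumptions, not a recommendation for any particular dataset. It counts how many values are non-finite after a plain log10 and after log10(x + 1).

```r
# First four columns of the question's toy excerpt; NA marks missing values.
df <- data.frame(
  G1   = c(1, 2, 2),
  G1.1 = c(0, NA, NA),
  G1.2 = c(3, 5, 2),
  G1.3 = c(4, 5, 1)
)

values <- unlist(df, use.names = FALSE)

# Plain log10: a zero becomes -Inf, which is non-finite and gets dropped
# by a log-scaled plot. NA values are non-finite as well.
plain <- log10(values)
n_dropped_plain <- sum(!is.finite(plain))

# Pseudocount of 1 (an assumed choice): zero maps to log10(1) = 0 and is kept.
pseudocount <- 1
shifted <- log10(values + pseudocount)
n_dropped_shifted <- sum(!is.finite(shifted))

cat("non-finite with log10(x):    ", n_dropped_plain, "\n")
cat("non-finite with log10(x + 1):", n_dropped_shifted, "\n")
```

With the shift, only the NA entries stay non-finite; the zero is retained. In the question's plot this would correspond to mapping value + 1 rather than value in aes() while keeping scale_y_log10(), with the axis then read as RPKM + 1. Note that a pseudocount compresses differences among very low values, so the choice of constant is a judgment call.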

9.3 years ago
JC 13k

RPKM generally produces a lot of zero values. IMO it's better to use another metric, such as CPM or CPK.
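For reference, CPM (counts per million) is each gene's raw read count divided by the sample's total read count, times one million. A minimal base-R sketch follows, assuming raw counts are available; the count matrix, gene names, and sample names are made up for illustration.

```r
# Hypothetical raw read counts: rows are genes, columns are samples.
counts <- matrix(
  c(10, 0, 90,
    20, 5, 75),
  nrow = 3,
  dimnames = list(c("geneA", "geneB", "geneC"), c("normal", "tumor"))
)

# Library size = total reads per sample.
lib_size <- colSums(counts)

# CPM: divide each column by its library size, then scale to one million.
cpm <- sweep(counts, 2, lib_size, "/") * 1e6
print(cpm)
```

A gene with zero reads in a sample still has a CPM of zero, so a log axis would need the same care as with RPKM. Packages such as edgeR provide a cpm() function for real analyses; the manual version here is only meant to show the arithmetic.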
