Hello all. I am running an RNA-seq R script, and my goal is to get rlog-transformed count data. However, some of the rlog-transformed values are negative. Does anyone know why this happens? I would appreciate an explanation.
Here is my script :
# Read the raw count table (genes in rows, samples in columns)
countdata <- read.table("all_count.txt", header=TRUE, row.names=1)
countdata <- as.matrix(countdata)
# One condition label per sample: "a" through "p" (16 samples, no replicates)
(condition <- factor(letters[1:16]))
library("DESeq2")
(coldata <- data.frame(row.names=colnames(countdata), condition))
dds <- DESeqDataSetFromMatrix(countData=countdata, colData=coldata, design=~condition)
dds <- DESeq(dds)
# rlogTransformation() is an alias of rlog(), so one call is enough
rld <- rlog(dds, blind = FALSE)
t1 <- assay(rld)
write.csv(t1, file = 'rld.csv')
My count data looks like this : (column names : gene_id, a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p)
ENSG00000223972 0 5 0 1 0 0 3 0 7 2 0 0 0 6 0 0
ENSG00000227232 738 687 817 785 862 920 616 828 718 533 338 718 563 622 241 402
ENSG00000278267 35 45 44 28 25 48 32 27 23 15 11 21 3 22 40 24
ENSG00000243485 0 0 0 2 0 0 0 0 0 2 0 0 0 0 0 5
Here is normalized count data (FPKM) : (column names : gene_id, a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p)
ENSG00000223972 0 4.288951195 0 0.886495015 0 0 2.634664669 0 6.740607986 1.825436668 0 0 0 5.552312428 0 0
ENSG00000227232 668.8930875 589.3018942 718.6769008 695.8985868 812.6102698 883.1227192 540.9844787 749.7218024 691.3937905 486.4788721 348.1203536 682.8221808 560.7432175 575.5897217 349.3737234 574.2096037
ENSG00000278267 31.7225719 38.60056075 38.70475353 24.82186042 23.56758323 46.07596796 28.10308981 24.44745008 22.14771195 13.69077501 11.32936062 19.97112228 2.987974516 20.3584789 57.98733999 34.28117037
ENSG00000243485 0 0 0 1.77299003 0 0 0 0 0 1.825436668 0 0 0 0 0 7.141910494
ENSG00000284332 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Here is my rlogTransformed data : (column names : gene_id, a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p)
ENSG00000223972 -0.136494061 0.473993602 -0.136927121 0.031100344 -0.135908179 -0.135632089 0.286572765 -0.136508589 0.692740738 0.177626545 -0.134517951 -0.135774897 -0.135056306 0.593107085 -0.128129828 -0.128442585
ENSG00000227232 9.355356576 9.209612781 9.438295885 9.401053408 9.58080676 9.677798069 9.111648865 9.487270931 9.393542263 8.990608069 8.614883247 9.379137129 9.152689426 9.182616956 8.61918245 9.17988811
ENSG00000278267 4.84784725 5.034954118 5.037459351 4.621104012 4.574230631 5.206904356 4.734912851 4.607316964 4.518616217 4.107979334 3.959359318 4.427255644 3.127111744 4.444027165 5.432977714 4.91932701
ENSG00000243485 -1.132208473 -1.13259963 -1.132423133 -0.82661946 -1.129384537 -1.127751285 -1.132434672 -1.132215675 -1.127467001 -0.819432811 -1.121395624 -1.128593051 -1.124421178 -1.131058152 -1.090539496 -0.28826095
ENSG00000284332 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
I would like to know why the rlog-transformed values come out like this. An explanation would be much appreciated.
Thank you.
Thank you for the kind reply. Do you mean that after normalization some counts fall between 0 and 1 (0 < x < 1), and that these values become negative when log-transformed?
As far as I understand, DESeq() normalizes the counts, and I can get the normalized count values (FPKM) with the counts(dds, normalized = TRUE) command. But some values that are 0 in the normalized counts still become negative after the rlog transformation. (I edited the post to paste the normalized counts.)
If you have any ideas, please let me know.
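A quick check on the normalized values pasted in the post may help here. In DESeq2, counts(dds, normalized = TRUE) returns raw counts divided by a per-sample size factor, which is not the same thing as FPKM (FPKM would also scale by gene length). The base-R sketch below uses three genes from sample b; the variable names are only illustrative. If the scaling is per sample, the raw-to-normalized ratio should be the same for every gene.

```r
# Raw and normalized counts for sample "b", copied from the tables in the post
# (genes ENSG00000223972, ENSG00000227232, ENSG00000278267)
raw_b  <- c(5, 687, 45)
norm_b <- c(4.288951195, 589.3018942, 38.60056075)

# With a per-sample size factor, this ratio is constant across genes;
# with FPKM it would vary with gene length
implied_size_factor <- raw_b / norm_b
print(implied_size_factor)
```

The three ratios agree, which is what size-factor normalization would produce.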
This is because of how rlog works.
The r in rlog stands for "regularised". That means that, for any count X, what rlog computes is rlog(X) = log2(X + a). In many applications people just use a = 1, but rlog calculates a more appropriate a for each gene, so you can end up with an a less than one. If a was 0.5, for example, then rlog(0) would be log2(0.5) = -1.
Thank you i.sudbery. Your explanation is understandable. Now I have no doubt. :)
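The arithmetic in the answer can be tried out in base R. This is only a sketch of the simplified log2(X + a) picture given above, not DESeq2's actual rlog fitting procedure; shifted_log2 and the example counts are hypothetical names and values chosen for illustration.

```r
# Simplified pseudocount picture from the answer: log2(x + a).
# This is NOT the real rlog implementation, only the intuition behind it.
shifted_log2 <- function(x, a) log2(x + a)

example_counts <- c(0, 1, 5, 687)

# With a = 1, a zero count maps to log2(1) = 0, so no value is negative
with_a_one <- shifted_log2(example_counts, a = 1)

# With a = 0.5, a zero count maps to log2(0.5) = -1
with_a_half <- shifted_log2(example_counts, a = 0.5)

print(with_a_one)
print(with_a_half)
```

Any a below 1 makes the transformed value of a zero count negative, which matches the negative entries seen for the low-count genes.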