Question

edgeR library size

10.0 years ago

schelarina ▴ 50

Hello everyone, one small question about edgeR. I have a count matrix file, a samples data.frame file, and an annotation file.

The manual says "The data.frame samples contains a column lib.size for the library size or sequencing depth for each sample. If not specified by the user, the library sizes will be computed from the column sums of the counts. For classic edgeR the data.frame samples must also contain a column group, identifying the group membership of each sample."

I tried adding a column for the library size, like this:

             group        lib.size    
sample X     1            8094363
sample X     1            5005492
sample Y     2            7094693
sample Y     2            6094693

etc

Then I run:

library(edgeR)
x <- read.delim("counts.txt", stringsAsFactors=FALSE)
group <- c(1,1,2,2)
genes <- read.delim("genes.txt")
y <- DGEList(counts=x, group=group, genes=genes)
y <- calcNormFactors(y)
y$samples

but then edgeR recalculates the library sizes, putting in different numbers, and adds a normalization factor.

How and where do I specify the library sizes correctly, so that they are not replaced?

Thanks for your help

RNA-Seq R • 11k views

Ram · Answer 1 · 2015-08-06

10.0 years ago

Irsan ★ 7.8k

If you pass the library sizes through the lib.size argument of DGEList(), edgeR will not recalculate them from the column sums. So use

DGEList(counts=x, group=group, genes=genes, lib.size=c(1,2,3,4))

instead, replacing 1, 2, 3, 4 with your real library sizes.
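
To make this concrete, here is a minimal sketch, assuming edgeR is installed and using a made-up Poisson count matrix in place of counts.txt (the toy counts and the object names counts and lib.size are illustrative, not from the question). It passes the library sizes from the question through lib.size and then checks that calcNormFactors() leaves them alone and only fills in the norm.factors column.

```r
library(edgeR)

# Toy count matrix (hypothetical data): 100 genes x 4 samples
set.seed(1)
counts <- matrix(rpois(400, lambda = 50), ncol = 4,
                 dimnames = list(paste0("gene", 1:100),
                                 c("X1", "X2", "Y1", "Y2")))
group <- c(1, 1, 2, 2)

# Library sizes as listed in the question
lib.size <- c(8094363, 5005492, 7094693, 6094693)

# Supply lib.size explicitly so DGEList() does not use colSums(counts)
y <- DGEList(counts = counts, group = group, lib.size = lib.size)

# Adds normalization factors; the lib.size column is not modified
y <- calcNormFactors(y)

# Compare the supplied sizes with the default (column sums of the counts)
print(y$samples)
print(colSums(counts))
```

Seeing a norm.factors column in y$samples is expected: edgeR multiplies lib.size by norm.factors to get the effective library size it uses downstream, so the supplied sizes are scaled in later steps, not overwritten.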

Thanks! It works perfectly now.

9.9 years ago by schelarina ▴ 50