Large sample size in edgeR
Peter ▴ 20 · 3.8 years ago

Hello,

I have a spreadsheet that I intend to analyze using the edgeR package. Usually the number of samples is small, so I use a script that looks like this:

library(edgeR)

# mydata holds the path to the tab-separated count table
mydata <- as.matrix(read.table(mydata, header = TRUE, sep = "\t",
                               row.names = 1, as.is = TRUE))

libSizes <- colSums(mydata)

groups <- c("CTRL", "CTRL", "CTRL", "WW", "WW", "WW")

d <- DGEList(counts = mydata, group = factor(groups), lib.size = libSizes)
d <- calcNormFactors(d)

d1 <- estimateCommonDisp(d, verbose = TRUE)
d1 <- estimateTagwiseDisp(d1)

fit <- glmFit(d1)
result <- glmLRT(fit)

However, now I have a total of 1100 samples, divided into two groups: CTRL (n = 518) and INF (n = 582).

When I run the step that creates the group vector, R just keeps printing a "+" prompt, as if it were not able to store so many values.
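(As an aside, the "+" is just R's continuation prompt, shown while a multi-line call is still syntactically incomplete, not a sign that the vector is too long. A sketch of building the group vector with rep() instead of typing every label, assuming the CTRL columns come before the INF columns in the table:)

# build the 1100 group labels programmatically
groups <- c(rep("CTRL", 518), rep("INF", 582))

# sanity check: one label per column of the count matrix
stopifnot(length(groups) == ncol(mydata))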

Can someone help me?

Thank you

RNA-Seq • R • edgeR
Gordon Smyth ★ 7.7k · 3.8 years ago

For RNA-seq with such large sample numbers, I would use limma instead of edgeR, although the quasi-likelihood pipeline of edgeR can also handle a lot of samples.
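(A minimal limma-voom sketch along these lines, assuming mydata and groups are the count matrix and group labels from the question; with CTRL/INF levels in alphabetical order, the second coefficient is the INF vs CTRL comparison:)

library(limma)
library(edgeR)

# One-way design: intercept = CTRL baseline, second coefficient = INF vs CTRL
design <- model.matrix(~ factor(groups))

d <- DGEList(counts = mydata, group = factor(groups))
d <- calcNormFactors(d)

# voom estimates the mean-variance trend and returns precision weights
v <- voom(d, design)

fit <- lmFit(v, design)
fit <- eBayes(fit)
topTable(fit, coef = 2)   # top differentially expressed genes, INF vs CTRL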

I don't understand your comment about "create the group vector". Surely there can't be any problem with that. Where your code would run into problems is with estimateTagwiseDisp(), which is not designed for such large numbers of samples.
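(For reference, a sketch of the quasi-likelihood pipeline mentioned above, reusing the d object from the question after calcNormFactors(); estimateDisp() replaces the estimateCommonDisp()/estimateTagwiseDisp() steps:)

design <- model.matrix(~ factor(groups))

# estimateDisp() replaces the separate common/tagwise dispersion steps
d <- estimateDisp(d, design)

# quasi-likelihood F-tests scale well to large sample numbers
fit <- glmQLFit(d, design)
result <- glmQLFTest(fit, coef = 2)   # INF vs CTRL
topTags(result)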


Thank you so much for your answer, Gordon!

I will adopt limma for my analysis.

