EdgeR multiple samples
0
0
Entering edit mode
2.8 years ago
Pahi ▴ 30

Hi!

Recently I'm working on an R pipeline for RNA-seq analysis. My goal is to create a script that compares my samples to each other and compute the DE.

So far I created my script and its working well. But when I implemented loops which would make my life easier, I noticed that the result are different. To be precise: First I did the comparisons sample pair by pair, than with the loops I loaded in all my data and made the comparison sample pair by pair.

What I noticed is everything goes as it should be, than when I reach the part where I call the

fit <- glmQLFit(dge,dge$design,robust = TRUE)

function, the a little difference shows up:

enter image description here

enter image description here

The first pic shows when ALL the samples loaded in, the second is when only 2 group (6 samples) is loaded in. After that when I want to make a

res <- glmQLFTest(fit,contrast = c(-1, 0,1,0))

it obviously shows different result. Edger uses different calculations based on the loaded samples? Also it makes the DE totally different when I try to visualize it. For the 2 example:

enter image description here

enter image description here

One plot shows a lot of DE genes (this is where only 2 groups (6 samples) were compared), and the other only shows 2 (this is where all the samples processed together).

My code looks the same even if I use 2 group samples or multiple group samples.

What could be the problem? Thank you for your time and help!

EdgeR R • 1.1k views
ADD COMMENT
2
Entering edit mode

If I understand correctly, your question is basically that you get a slightly different result if you calculate DE using all your data versus subsets of your data. The reason the results will not be identical is because the variance estimate for a gene will be much more informed in a larger data set, across more conditions or samples, than in a smaller data set. It's not so much a problem as a feature. Better estimates of variance allow better evaluation of significance.

ADD REPLY
0
Entering edit mode

My code :

group <- target$group

reads <- CTRL_INJURED_vs_CTRL_UNINJURED 
gene_name <- rownames(reads)

dge <- DGEList(counts=reads, genes = gene_name, group = group) 
dge$design <- model.matrix(~0+group, data=dge$samples)

dge$genes$ENTREZID <- mapIds(org.Dm.eg.db, dge$genes$genes, keytype = "ENSEMBL",column="ENTREZID") 
dge$genes$Symbol <- mapIds(org.Dm.eg.db, dge$genes$genes, keytype = "ENSEMBL", column="SYMBOL")


dge <- dge[!is.na(dge$genes$ENTREZID),] 
dge <- dge[!is.na(dge$genes$Symbol),] 

keep_dge <- filterByExpr(dge,dge$design) 
table(keep_dge) 
dge <- dge[keep_dge,,keep.lib.sizes=FALSE]

dge <- calcNormFactors(dge) 
dge <- estimateDisp(dge)

dge_df <- as.data.frame(dge)

fit2 <- glmQLFit(dge,dge$design,robust = TRUE) 
head(fit$coefficients)

colsToTest <- grep(varInt,colnames(fit2$design)) 
namesToTest <- paste(gsub(varInt,"",colnames(fit2$design)[colsToTest]),"_vs_",condRef)


results <- list() 
lrt_list <- list()

if (length(colsToTest)>=2){   colnames <- gsub(varInt,"",colnames(fit2$design))   for (comp in combn(length(colsToTest),2,simplify=FALSE)){
    contrast <- numeric(ncol(dge$design))
    contrast[colsToTest[comp[1:2]]] <- c(-1,1)
    namecomp <- paste0(colnames[colsToTest[comp[2]]],"_vs_",colnames[colsToTest[comp[1]]])
    cat(paste0("Comparison ",gsub("_"," ",namecomp),": testing contrast (",paste(contrast,collapse=", "),")"),"\n")
    lrt <- glmQLFTest(fit2, contrast=contrast)
    results[[namecomp]] <- topTags(lrt,n=nrow(dge$counts))$table
    lrt_list[[namecomp]] <- lrt      } }
ADD REPLY

Login before adding your answer.

Traffic: 1916 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6