I have generated counts using featurecounts for each sample (replicate) of each group and then merged it which looks like the following
Gene_id A_1 A_2 A_3 B_1 B_2 B_3 C_1 C_2 C_3 C_4 B_4 C_5 D_1 D_2 D_3 B_5
gene1 1 2 3 4 5 6 7 8 9 1 2 3 4 5 6 7
gene2 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8
.....
......
geneN 8 1 7 6 5 3 3 2 1 2 5 4 5 7 6 1
I need to do the differential gene expression analysis of groups A-B, A-C, and A-D. My EdgeR script looks like this
df <- read.table("count.txt", header=TRUE, row.names = 1)
group <- factor(c("A", "A", "A", "B", "B", "B", "C", "C", "C", "C", "B", "C", "D", "D", "D", "B"))
group <- relevel(group, ref="A")
dgList <- DGEList(counts=df, group=group)
dgList <- calcNormFactors(dgList)
dgList <- estimateCommonDisp(dgList,verbose=TRUE)
dgList <- estimateGLMTagwiseDisp(dgList)
design = model.matrix(~group)
fit <- glmFit(dgList, design)
for (x in 2:4) {
lrt <- glmLRT(fit, coef=x)
edgeR_result <- topTags(lrt, n=100)
write.table(edgeR_result, file="result.tsv", quote=FALSE, append=TRUE)
}
For my real dataset, I have 25 groups i.e. DGE analysis of 1st group will be done with other 24 groups (pairwise) , so my for loop has (x in 2:24)
. I just wanted to check if I am doing it correctly. For grouping, I am currently doing it manually i.e. checking the first line of the count.txt
file and typing "A", "A"...."B"
in the 2nd line. Is there a syntax in R to make the grouping and reference automatically from the header of count file?
I am doing RNA-Seq analysis using EdgeR for the first time, so any help would be appreciated.