Entering edit mode
7.2 years ago
Sharon
▴
610
I am getting the following error for the code below and I do have 3 replicates. Any one come through this before?
Error in exactTest(dge, pair = c("WT", "Mut")) : dispersion is NA
labels=c("WT_1", "WT_2", "WT_3", "Mut_1", "Mut_2", "Mut_3")
data <- readDGE(files)
group <- c(rep("WT", 3), rep("Mut", 3))
dge = DGEList(counts=data, group=group)
idfound <- dge$genes$RefSeqID %in% mappedRkeys(org.Hs.egREFSEQ)
dge <- dge[idfound,]
dim(dge)
egREFSEQ <- toTable(org.Hs.egREFSEQ)
head(egREFSEQ)
m <- match(dge$genes$RefSeqID, egREFSEQ$accession)
dge$genes$EntrezGene <- egREFSEQ$gene_id[m]
egSYMBOL <- toTable(org.Hs.egSYMBOL)
head(egSYMBOL)
m <- match(dge$genes$EntrezGene, egSYMBOL$gene_id)
dge$genes$Symbol <- egSYMBOL$symbol[m]
head(dge$genes)
dge <- calcNormFactors(dge)
dge <- estimateCommonDisp(dge)
dge <- estimateTagwiseDisp(dge)
et <- exactTest(dge, pair=c("WT", "Mut"))
etp <- topTags(et)
etp$table$logFC = -etp$table$logFC
Hey Kevin Thank you. I think we can do that. It works without the design matrix. The dispersion error comes when I added the genes id addition part. If I removed this genes ID part, every thing works. like if I removed the section below every thing works correctly. But the genes ID are added correctly though. They just cause dispersion error.
I see, Sharon. Yes, I actually just noticed your previous thread where you had originally posted this.
There must be some other 'names' attribute of the dge object that is causing the discrepancy.
What about something like
names(dge)
orrownames(dge)
, or even just output the result ofstr(dge)
to see the structure of the object, which may help us to identify the issue.Thanks Kevin. So, I think the confusion is I am starting from transcripts counts from Salmon and wanted to find the upregulated/downregulated genes in edgeR. So, the confusion comes from converting transcripts ID to genes ID, that introduces some nulls which causes dispersion.
Okay, yes, your m vectors will likely contain many NAs for names that don't match.
match()
does return NAs like that, whereaswhich()
only returns the indices of the matches.You could either remove these non-matches (not desirable) or convert them back to their original values with something like:
That should work, but test it well. It will keep anything that has been successfully converted to egREFSEQ$gene_id, but, for non-matches, it will restore the original value (dge$genes$RefSeqID)