dispersion is NA error message with edgeR
1
1
Entering edit mode
7.1 years ago
Sharon ▴ 610

I am getting the following error for the code below and I do have 3 replicates. Any one come through this before?

Error in exactTest(dge, pair = c("WT", "Mut")) : dispersion is NA

labels=c("WT_1", "WT_2", "WT_3", "Mut_1", "Mut_2", "Mut_3")
data <- readDGE(files)
group <- c(rep("WT", 3), rep("Mut", 3))
dge = DGEList(counts=data, group=group)
idfound <- dge$genes$RefSeqID %in% mappedRkeys(org.Hs.egREFSEQ)
dge <- dge[idfound,]
dim(dge)
egREFSEQ <- toTable(org.Hs.egREFSEQ)
head(egREFSEQ)
m <- match(dge$genes$RefSeqID, egREFSEQ$accession)
dge$genes$EntrezGene <- egREFSEQ$gene_id[m]
egSYMBOL <- toTable(org.Hs.egSYMBOL)
head(egSYMBOL)
m <- match(dge$genes$EntrezGene, egSYMBOL$gene_id)
dge$genes$Symbol <- egSYMBOL$symbol[m]
head(dge$genes)
dge <- calcNormFactors(dge)
dge <- estimateCommonDisp(dge)
dge <- estimateTagwiseDisp(dge)
et <- exactTest(dge, pair=c("WT", "Mut"))
etp <- topTags(et)
etp$table$logFC = -etp$table$logFC
RNA-Seq edgeR • 3.9k views
ADD COMMENT
0
Entering edit mode
7.1 years ago

Hey Sharon,

It looks like you have left out your model design formula from the two dispersion functions, for example:

design <- model.matrix(~group)
...
dge <- calcNormFactors(dge)
dge <- estimateCommonDisp(dge, design)
dge <- estimateTagwiseDisp(dge, design)

Kevin

ADD COMMENT
0
Entering edit mode

Hey Kevin Thank you. I think we can do that. It works without the design matrix. The dispersion error comes when I added the genes id addition part. If I removed this genes ID part, every thing works. like if I removed the section below every thing works correctly. But the genes ID are added correctly though. They just cause dispersion error.

idfound <- dge$genes$RefSeqID %in% mappedRkeys(org.Hs.egREFSEQ)
dge <- dge[idfound,]
dim(dge)
egREFSEQ <- toTable(org.Hs.egREFSEQ)
head(egREFSEQ)
m <- match(dge$genes$RefSeqID, egREFSEQ$accession)
dge$genes$EntrezGene <- egREFSEQ$gene_id[m]
egSYMBOL <- toTable(org.Hs.egSYMBOL)
head(egSYMBOL)
m <- match(dge$genes$EntrezGene, egSYMBOL$gene_id)
dge$genes$Symbol <- egSYMBOL$symbol[m]
head(dge$genes)
ADD REPLY
0
Entering edit mode

I see, Sharon. Yes, I actually just noticed your previous thread where you had originally posted this.

There must be some other 'names' attribute of the dge object that is causing the discrepancy.

What about something like names(dge) or rownames(dge), or even just output the result of str(dge) to see the structure of the object, which may help us to identify the issue.

ADD REPLY
0
Entering edit mode

Thanks Kevin. So, I think the confusion is I am starting from transcripts counts from Salmon and wanted to find the upregulated/downregulated genes in edgeR. So, the confusion comes from converting transcripts ID to genes ID, that introduces some nulls which causes dispersion.

ADD REPLY
0
Entering edit mode

Okay, yes, your m vectors will likely contain many NAs for names that don't match. match() does return NAs like that, whereas which() only returns the indices of the matches.

You could either remove these non-matches (not desirable) or convert them back to their original values with something like:

ifelse( is.na(egREFSEQ$gene_id[m]), dge$genes$RefSeqID, egREFSEQ$gene_id[m] )

That should work, but test it well. It will keep anything that has been successfully converted to egREFSEQ$gene_id, but, for non-matches, it will restore the original value (dge$genes$RefSeqID)

ADD REPLY

Login before adding your answer.

Traffic: 1632 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6