Question

dispersion is NA error message with edgeR

1

Entering edit mode

7.1 years ago

Sharon ▴ 610

I am getting the following error for the code below and I do have 3 replicates. Any one come through this before?

Error in exactTest(dge, pair = c("WT", "Mut")) : dispersion is NA

labels=c("WT_1", "WT_2", "WT_3", "Mut_1", "Mut_2", "Mut_3")
data <- readDGE(files)
group <- c(rep("WT", 3), rep("Mut", 3))
dge = DGEList(counts=data, group=group)
idfound <- dge$genes$RefSeqID %in% mappedRkeys(org.Hs.egREFSEQ)
dge <- dge[idfound,]
dim(dge)
egREFSEQ <- toTable(org.Hs.egREFSEQ)
head(egREFSEQ)
m <- match(dge$genes$RefSeqID, egREFSEQ$accession)
dge$genes$EntrezGene <- egREFSEQ$gene_id[m]
egSYMBOL <- toTable(org.Hs.egSYMBOL)
head(egSYMBOL)
m <- match(dge$genes$EntrezGene, egSYMBOL$gene_id)
dge$genes$Symbol <- egSYMBOL$symbol[m]
head(dge$genes)
dge <- calcNormFactors(dge)
dge <- estimateCommonDisp(dge)
dge <- estimateTagwiseDisp(dge)
et <- exactTest(dge, pair=c("WT", "Mut"))
etp <- topTags(et)
etp$table$logFC = -etp$table$logFC

RNA-Seq edgeR • 3.9k views

ADD COMMENT • link updated 7.1 years ago by Kevin Blighe 88k • written 7.1 years ago by Sharon ▴ 610

score 0 · Answer 1 · 2017-11-01

0

Entering edit mode

7.1 years ago

Kevin Blighe 88k

Hey Sharon,

It looks like you have left out your model design formula from the two dispersion functions, for example:

design <- model.matrix(~group)
...
dge <- calcNormFactors(dge)
dge <- estimateCommonDisp(dge, design)
dge <- estimateTagwiseDisp(dge, design)

Kevin

ADD COMMENT • link 7.1 years ago by Kevin Blighe 88k

0

Entering edit mode

Hey Kevin Thank you. I think we can do that. It works without the design matrix. The dispersion error comes when I added the genes id addition part. If I removed this genes ID part, every thing works. like if I removed the section below every thing works correctly. But the genes ID are added correctly though. They just cause dispersion error.

idfound <- dge$genes$RefSeqID %in% mappedRkeys(org.Hs.egREFSEQ)
dge <- dge[idfound,]
dim(dge)
egREFSEQ <- toTable(org.Hs.egREFSEQ)
head(egREFSEQ)
m <- match(dge$genes$RefSeqID, egREFSEQ$accession)
dge$genes$EntrezGene <- egREFSEQ$gene_id[m]
egSYMBOL <- toTable(org.Hs.egSYMBOL)
head(egSYMBOL)
m <- match(dge$genes$EntrezGene, egSYMBOL$gene_id)
dge$genes$Symbol <- egSYMBOL$symbol[m]
head(dge$genes)

ADD REPLY • link 7.1 years ago by Sharon ▴ 610

0

Entering edit mode

I see, Sharon. Yes, I actually just noticed your previous thread where you had originally posted this.

There must be some other 'names' attribute of the dge object that is causing the discrepancy.

What about something like names(dge) or rownames(dge), or even just output the result of str(dge) to see the structure of the object, which may help us to identify the issue.

ADD REPLY • link 7.1 years ago by Kevin Blighe 88k

0

Entering edit mode

Thanks Kevin. So, I think the confusion is I am starting from transcripts counts from Salmon and wanted to find the upregulated/downregulated genes in edgeR. So, the confusion comes from converting transcripts ID to genes ID, that introduces some nulls which causes dispersion.

ADD REPLY • link 7.1 years ago by Sharon ▴ 610

0

Entering edit mode

Okay, yes, your m vectors will likely contain many NAs for names that don't match. match() does return NAs like that, whereas which() only returns the indices of the matches.

You could either remove these non-matches (not desirable) or convert them back to their original values with something like:

ifelse( is.na(egREFSEQ$gene_id[m]), dge$genes$RefSeqID, egREFSEQ$gene_id[m] )

That should work, but test it well. It will keep anything that has been successfully converted to egREFSEQ$gene_id, but, for non-matches, it will restore the original value (dge$genes$RefSeqID)

ADD REPLY • link 7.1 years ago by Kevin Blighe 88k