edgeR Gene Ontology Analysis always returns the same list of pathways
0
1
Entering edit mode
8.1 years ago
moxu ▴ 510

edgeR returns a list of differentially expressed genes OK, at least they do not always return genes in the same order.

However, when I run edgeR to get the top Gene Ontology pathways like the following:

rst<-topTags(qlfTest, n=nrow(d));
drst <- data.frame("GENE_ID"=rownames(rst), rst);
write.table(drst, file="mytreatment.DEG.txt", sep="\t", row.names=F, quote=F);

## the above "drst" returns a list of differentially expressed genes with pvalues, FDR, etc., and they look alright.

# get the differentially expressed genes list, throw away genes with FDR >= 0.25
degs <- as.vector(subset(drst, FDR < 0.25)[[1]])

# do GO analysis
go <- goana(degs, species="Hs");
tgo <- topGO(go);

It always return the following list of pathways:

GO_ID   Term    Ont N   DE  P.DE
GO:0000002  mitochondrial genome maintenance    BP  20  0   1
GO:0000003  reproduction    BP  925 0   1
GO:0000009  alpha-1,6-mannosyltransferase activity  MF  2   0   1
GO:0000010  trans-hexaprenyltranstransferase activity   MF  2   0   1
GO:0000012  single strand break repair  BP  8   0   1
GO:0000014  single-stranded DNA endodeoxyribonuclease activity  MF  6   0   1
GO:0000015  phosphopyruvate hydratase complex   CC  4   0   1
GO:0000016  lactase activity    MF  1   0   1
GO:0000018  regulation of DNA recombination BP  51  0   1
GO:0000019  regulation of mitotic recombination BP  4   0   1
GO:0000022  mitotic spindle elongation  BP  2   0   1
GO:0000023  maltose metabolic process   BP  1   0   1
GO:0000026  alpha-1,2-mannosyltransferase activity  MF  3   0   1
GO:0000027  ribosomal large subunit assembly    BP  2   0   1
GO:0000028  ribosomal small subunit assembly    BP  8   0   1
GO:0000030  mannosyltransferase activity    MF  17  0   1
GO:0000033  alpha-1,3-mannosyltransferase activity  MF  3   0   1
GO:0000035  acyl binding    MF  1   0   1
GO:0000036  ACP phosphopantetheine attachment site binding involved in fatty acid biosynthetic process  MF  1   0   1
GO:0000038  very long-chain fatty acid metabolic process    BP  25  0   1

No matter what experiment datasets (can be totally unrelated experiments). I guess something is not done right.

Thanks!

software error RNA-Seq next-gen gene • 3.9k views
ADD COMMENT
0
Entering edit mode

Can you post head(degs) ?

ADD REPLY
0
Entering edit mode
> head(degs)
[1] "PPP1R10"  "C15orf52" "JPH2"     "PRMT6"    "SKA3"     "CREB1"

Guess they need to be converted to Entrez IDs from these (probably HGNC) gene symbols?

ADD REPLY
0
Entering edit mode

Yes. You need Entrez IDs as input to goana.

ADD REPLY
0
Entering edit mode

That might be the answer. However, I have not been able to do the conversion yet -- cannot find a tool to do so.

I tried

 library(org.Hs.eg.db)
 library(annotate)

in R, but it looks like it can only convert Entrez IDs to HGNC gene symbols but not the opposite.

Thanks!

ADD REPLY
0
Entering edit mode
library(Homo.sapiens)

de <- select(Homo.sapiens, keys = degs, keytype = "SYMBOL", columns="ENTREZID")
## remove duplicate
mat <- match(de$SYMBOL, degs)
de <- de[mat,]

By the way, I recommend reading this tutor as it explains how to perform pathway/gene set analysis with edgeR

ADD REPLY
0
Entering edit mode

Thanks, it worked for human!

Unable to do the same for mouse though -- cannot find the "Homo.sapiens" counterpart after installing org.Mm.eg.db.

ADD REPLY
0
Entering edit mode
biocLite("Mus.musculus")
library(Mus.musculus)

Actually it is contained in org.Mm.egGO object.

ADD REPLY

Login before adding your answer.

Traffic: 2156 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6