Hi everyone, I'm doing a GO analysis after finishing the statistical tests with edgeR.
Previously, I compared group1 vs group2, group1 vs group3, and group1 vs group4.
The problem appeared when I compared group1 vs group4: 1740 genes are significantly upregulated in group4.
However, when I used the code below:
enrich.go.BP <- enrichGO(gene = up_gene.4vs1$GeneID,
                         OrgDb = Acan.OrgDb,
                         keyType = "ENTREZID",
                         ont = "BP", pvalueCutoff = 0.01,
                         qvalueCutoff = 0.05, readable = TRUE)
There are no enriched terms in the result.
This code worked well when I compared the other groups to group1, so I don't think the code itself is the problem. Why do I get this result, and how can I fix it? Could it be that I have so many genes, spread across almost every category, that no term reaches statistical significance?
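To make that last question concrete, here is a minimal sketch of the hypergeometric over-representation test that, as far as I understand, underlies enrichGO. All numbers are hypothetical and chosen only for illustration (none come from my data), and `ora_pvalue` is a helper name I made up. It shows how a long gene list whose hit rate for a term is close to the background rate ends up with an unremarkable p-value.

```r
# Hypothetical numbers for illustration only; none come from my data.
N <- 10000   # assumed number of background genes with any GO annotation
M <- 200     # assumed number of background genes annotated to one GO term

# One-sided hypergeometric p-value: P(X >= k) when n genes are drawn
# from a background of N genes, M of which carry the term.
ora_pvalue <- function(k, M, N, n) {
  phyper(k - 1, M, N - M, n, lower.tail = FALSE)
}

# Short list: 100 genes, 10 annotated to the term (10% vs. 2% background)
p_small <- ora_pvalue(k = 10, M = M, N = N, n = 100)

# Long list: 1740 genes, 40 annotated to the term (about 2.3%, near background)
p_large <- ora_pvalue(k = 40, M = M, N = N, n = 1740)

print(c(small = p_small, large = p_large))
```

In this toy setup the short list gives a small p-value for the term while the long list does not, even though it contains more annotated genes in absolute terms.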
Thank you in advance.
Edited: 2020-06-11
More information on up_gene.4vs1 and Acan.OrgDb:
Acan.OrgDb is the one I loaded using AnnotationHub, because my target species, Acanthamoeba castellanii, is not a model organism.
library(AnnotationHub)  # attaches query(), which is otherwise not found
hub <- AnnotationHub()
amoeba <- query(hub, "Acanthamoeba castellanii")
# title
# AH73987 | Transcript information for Acanthamoeba castellanii str Neff
# AH73987 | Transcript information for Acanthamoeba castellanii str Neff
# AH74626 | Transcript information for Acanthamoeba castellanii str Neff
# AH81410 | org.Acanthamoeba_castellanii_Neff_strain.eg.sqlite
# AH81411 | org.Acanthamoeba_castellanii_str._Neff.eg.sqlite
# AH81412 | org.Acanthamoeba_castellanii_strain_Neff.eg.sqlite
Here I chose AH81410 because its Db type is OrgDb.
Acan.OrgDb <- hub[["AH81410"]]
> Acan.OrgDb
OrgDb object:
| DBSCHEMAVERSION: 2.1
| DBSCHEMA: NOSCHEMA_DB
| ORGANISM: Acanthamoeba castellanii_Neff_strain
| SPECIES: Acanthamoeba castellanii_Neff_strain
| CENTRALID: GID
| Taxonomy ID: 1257118
| Db type: OrgDb
| Supporting package: AnnotationDbi
From columns(Acan.OrgDb), we can see that it supports ENTREZID.
> columns(Acan.OrgDb)
[1] "ACCNUM" "ALIAS" "CHR" "ENTREZID" "EVIDENCE" "EVIDENCEALL" "GENENAME" "GID" "GO" "GOALL"
[11] "ONTOLOGY" "ONTOLOGYALL" "PMID" "REFSEQ" "SYMBOL"
Then I prepared my list of significant genes in ENTREZID format. The table was generated by combining the ORFID, locus_tag, and annotation from files downloaded from NCBI.
The GeneID column holds the IDs in ENTREZID format.
> up_gene.4vs1
  Locus_tag   ORFID    Name Accession      Start Stop  Strand GeneID   Locus Protein_product Length Protein_Name
1 ACA1_000790 gene5490 Un   NW_004457578.1 5136  5699  +      14921342 NA    XP_004343320.1  187    hypothetical protein ACA1_000790
2 ACA1_001250 gene2057 Un   NW_004457658.1 4004  11317 +      14924768 NA    XP_004353303.1  1925   hypothetical protein ACA1_001250
3 ACA1_001280 gene2060 Un   NW_004457658.1 17392 18733 -      14924773 NA    XP_004353305.1  258    hypothetical protein ACA1_001280
4 ACA1_001300 gene2062 Un   NW_004457658.1 20701 23681 -      14924770 NA    XP_004353306.1  599    fucose1-phosphate guanylyltransferase
You may also notice that there are hypothetical proteins, which could blur the enrichment analysis. Although 691 entries are hypothetical proteins, 1049 of the 1740 entries remain.
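One check that might help narrow this down is counting how many of my GeneIDs actually carry a BP annotation in the OrgDb. Below is a minimal sketch of that count. The helper `go_coverage` and the toy IDs and GO assignments are made up for illustration, and I am assuming the real mapping would come from AnnotationDbi::select() with the column names shown in the comment.

```r
# Count how many query IDs have at least one GO annotation in one ontology.
# `mapping` is assumed to be a data frame shaped like the result of
#   AnnotationDbi::select(Acan.OrgDb, keys = as.character(ids),
#                         keytype = "ENTREZID",
#                         columns = c("GO", "ONTOLOGY"))
go_coverage <- function(ids, mapping, ontology = "BP") {
  ids <- unique(as.character(ids))
  # Keep only rows that have a GO term in the requested ontology
  keep <- !is.na(mapping$GO) & mapping$ONTOLOGY == ontology
  annotated <- intersect(ids, as.character(mapping$ENTREZID[keep]))
  c(total = length(ids), annotated = length(annotated))
}

# Toy mapping with made-up IDs and GO assignments, only to show the shape.
toy_ids <- c("1001", "1002", "1003", "1004")
toy_map <- data.frame(
  ENTREZID = c("1001", "1001", "1002", "1003", "1004"),
  GO       = c("GO:0008150", "GO:0003674", NA, "GO:0005575", "GO:0008150"),
  ONTOLOGY = c("BP", "MF", NA, "CC", "BP"),
  stringsAsFactors = FALSE
)

coverage <- go_coverage(toy_ids, toy_map)
print(coverage)
```

With the real data, `ids` would be up_gene.4vs1$GeneID. If only a small fraction of the 1740 IDs turned out to be annotated, that alone might explain the empty result.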
So I'm a little confused that enrichGO reports no enriched GO terms.
Could you give me some advice? Thank you in advance.
Cross-posted: https://support.bioconductor.org/p/131653/