Question

Gage GO annotations vs Entrez

7.1 years ago

bsp017 ▴ 50

Hi all,

I'm using Gage v2.28.2 to run pathway enrichment analysis on RNA-seq data. Genes were initially annotated with Entrez IDs, which worked fine:

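library(gage)
# KEGG gene sets for organism code "eco"; the second call keys them by Entrez ID
kg.eco = kegg.gsets("eco")
kg.eco.eg = kegg.gsets("eco", id.type = "entrez")
head(kg.eco.eg$kg.sets, 2)
# ref.idx and samp.idx (defined earlier) are the reference and sample columns of p3
keggres = gage(p3, gsets = kg.eco.eg$kg.sets, ref = ref.idx, samp = samp.idx, same.dir = FALSE)
lapply(keggres, head)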
    $greater
                                                             p.geomean stat.mean
    eco02010 ABC transporters                             5.636397e-06  4.437748
    eco01100 Metabolic pathways                           4.192560e-07  3.560120
    eco02020 Two-component system                         2.976999e-03  2.635986
    eco02060 Phosphotransferase system (PTS)              1.029505e-02  2.316506
    eco02024 Quorum sensing                               8.269080e-03  2.290890
    eco01120 Microbial metabolism in diverse environments 8.855620e-04  2.186511
                                                                 p.val        q.val
    eco02010 ABC transporters                             1.255645e-09 6.654916e-08
    eco01100 Metabolic pathways                           4.372851e-07 1.158805e-05
    eco02020 Two-component system                         1.382499e-04 2.442415e-03
    eco02060 Phosphotransferase system (PTS)              8.903017e-04 1.053680e-02
    eco02024 Quorum sensing                               9.940376e-04 1.053680e-02
    eco01120 Microbial metabolism in diverse environments 1.307795e-03 1.076939e-02

Input data is the same except row names are changed to GO terms:

head(p3)
                                 Bg_NB25_v_BgNS26_2h_fc Bg_NB25_v_BgNP27_2h_fc
<NA>                                           1.559100               1.492170
GO:0000150|GO:0003677|GO:0006310               1.696600               1.251170
<NA>                                           0.688138               0.403168
GO:0003824                                     0.770600               0.744205
GO:0006355                                     1.185640               1.403170
GO:0008982|GO:0009401|GO:0016020               2.092530               0.818206
                                 Bg_NB31_v_BgNS32_2h_fc Bg_NB31_v_BgNP33_2h_fc
<NA>                                         -0.0207885               0.401330
GO:0000150|GO:0003677|GO:0006310              1.3511800               0.285852
<NA>                                          0.3511800              -0.299110

Then I ran the following commands:

data(go.sets.hs)
data(go.subs.hs)
keggres = gage(p3, gsets=go.sets.hs[go.subs.hs$BP], same.dir = F)

The results contain only NAs, and I'm not sure why:

lapply(keggres, head)
$greater
                                               p.geomean stat.mean p.val q.val
GO:0000002 mitochondrial genome maintenance           NA       NaN    NA    NA
GO:0000003 reproduction                               NA       NaN    NA    NA
GO:0000012 single strand break repair                 NA       NaN    NA    NA
GO:0000018 regulation of DNA recombination            NA       NaN    NA    NA
GO:0000019 regulation of mitotic recombination        NA       NaN    NA    NA
GO:0000022 mitotic spindle elongation                 NA       NaN    NA    NA
                                               set.size Bg_NB31_v_BgNS32_2h_fc
GO:0000002 mitochondrial genome maintenance           0                     NA
GO:0000003 reproduction                               0                     NA
GO:0000012 single strand break repair                 0                     NA
GO:0000018 regulation of DNA recombination            0                     NA
GO:0000019 regulation of mitotic recombination        0                     NA
GO:0000022 mitotic spindle elongation                 0                     NA

There are ~130,000 genes in total in the original mapping database, of which only ~32,000 have GO annotations. The vast majority of enriched genes should be bacterial.
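
One check that may help narrow this down: set.size is 0 for every set in the output above, which would be consistent with none of the row names of p3 matching the IDs stored inside the gene sets. gage matches row names against the members of each set, and as far as the gage documentation describes, go.sets.hs is named by GO term but holds human Entrez Gene IDs as members. The sketch below is only an illustration with toy stand-ins; toy_gsets, toy_rownames and the placeholder IDs are hypothetical and not taken from the real data.

```r
# Toy stand-in for a GO gene-set list (hypothetical): names are GO terms,
# members are gene IDs. The IDs are arbitrary placeholders.
toy_gsets <- list(
  "GO:0000002 mitochondrial genome maintenance" = c("291", "1763", "1890"),
  "GO:0000003 reproduction" = c("18", "49", "51")
)

# Row names as in the question: GO terms rather than gene IDs
toy_rownames <- c("GO:0000150|GO:0003677|GO:0006310", "GO:0003824", "GO:0006355")

# Count how many row names occur among the members of each gene set
overlap <- vapply(toy_gsets, function(ids) sum(toy_rownames %in% ids), integer(1))
print(overlap)
```

With the real objects, the same count would be computed from rownames(p3) and go.sets.hs[go.subs.hs$BP]. If every count is 0, the row names and the gene-set members are using different identifiers.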

Gage gene enrichment RNA-seq entrez gene ontology • 1.7k views

Remove the NA row names from the p3 object and try again.
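
A minimal sketch of that filtering step, assuming p3 is a numeric matrix whose missing row names are stored either as real NA values or as the literal string "<NA>". The object p3_toy and its values are hypothetical stand-ins, not the poster's data.

```r
# Toy stand-in for p3 (hypothetical values): a one-column matrix
# in which two of the four row names are missing
p3_toy <- matrix(
  c(1.56, 1.70, 0.69, 0.77),
  ncol = 1,
  dimnames = list(
    c(NA, "GO:0000150|GO:0003677|GO:0006310", NA, "GO:0003824"),
    "fc"
  )
)

# Keep rows whose name is neither a real NA nor the literal string "<NA>"
keep <- !is.na(rownames(p3_toy)) & rownames(p3_toy) != "<NA>"
p3_clean <- p3_toy[keep, , drop = FALSE]

print(p3_clean)
```

The drop = FALSE argument keeps the result a matrix even if only one row or column remains.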

7.1 years ago by h.mon 35k

Thanks for the reply. Same result with NA row names removed:

lapply(keggres, head)
$greater
                                                                       p.geomean
GO:0000009 alpha-1,6-mannosyltransferase activity                             NA
GO:0000010 trans-hexaprenyltranstransferase activity                          NA
GO:0000014 single-stranded DNA specific endodeoxyribonuclease activity        NA
GO:0000016 lactase activity                                                   NA
GO:0000026 alpha-1,2-mannosyltransferase activity                             NA
GO:0000030 mannosyltransferase activity                                       NA
                                                                       stat.mean
GO:0000009 alpha-1,6-mannosyltransferase activity                            NaN
GO:0000010 trans-hexaprenyltranstransferase activity                         NaN
GO:0000014 single-stranded DNA specific endodeoxyribonuclease activity       NaN
GO:0000016 lactase activity                                                  NaN
GO:0000026 alpha-1,2-mannosyltransferase activity                            NaN
GO:0000030 mannosyltransferase activity                                      NaN

7.1 years ago by bsp017 ▴ 50