Question

Annotation with KEGGrest

0

Entering edit mode

3.7 years ago

aranyak111 • 0

The first few lines of my file look like this

Gene    adj.P.Val
ENSDARG00000087345  1.39E-128
ENSDARG00000058371  5.68E-50
ENSDARG00000026403  1.77E-48
ENSDARG00000006427  2.36E-48
ENSDARG00000041381  8.10E-39
ENSDARG00000071601  1.81E-33
ENSDARG00000054191  2.73E-32
ENSDARG00000098488  5.67E-31
ENSDARG00000076129  2.31E-29.

I wanted to perform the KEGG enrichment test for selected genes in ZebraFIsh.

Code snippets used are

# Pull all pathways for AT  

pathways.list <- keggList("pathway", "dre")
head(pathways.list)
# Pull all genes for each pathway
pathway.codes <- sub("path:", "", names(pathways.list)) 
genes.by.pathway <- sapply(pathway.codes,
    function(pwid){
        pw <- keggGet(pwid)
        if (is.null(pw[[1]]$GENE)) return(NA)
        pw2 <- pw[[1]]$GENE[c(TRUE,FALSE)] # may need to modify this to c(FALSE, TRUE) for other organisms
        pw2 <- unlist(lapply(strsplit(pw2, split = ";", fixed = T), function(x)x[1]))
        return(pw2)
    }
)

head(genes.by.pathway)

My dataframe is read as DE.table <- read.delim(df, stringsAsFactors = F)

geneList <- DE.table$P.Value
names(geneList) <- DE.table$Gene
head(geneList)

# Wilcoxon test for each pathway
pVals.by.pathway <- t(sapply(names(genes.by.pathway),
    function(pathway) {
        pathway.genes <- genes.by.pathway[[pathway]]
        list.genes.in.pathway <- intersect(names(geneList), pathway.genes)
        list.genes.not.in.pathway <- setdiff(names(geneList), list.genes.in.pathway)
        scores.in.pathway <- geneList[list.genes.in.pathway]
        scores.not.in.pathway <- geneList[list.genes.not.in.pathway]
        if (length(scores.in.pathway) > 0){
            p.value <- wilcox.test(scores.in.pathway, scores.not.in.pathway, alternative = "less")$p.value
        } else{
            p.value <- NA
        }
        return(c(p.value = p.value, Annotated = length(list.genes.in.pathway)))
    }
))


 # Assemble output table

    outdat <- data.frame(pathway.code = rownames(pVals.by.pathway))
    outdat$pathway.name <- pathways.list[outdat$pathway.code]
    outdat$p.value <- pVals.by.pathway[,"p.value"]
    outdat$Annotated <- pVals.by.pathway[,"Annotated"]
    outdat <- outdat[order(outdat$p.value),]
    head(outdat)

The output is like this pathway.code pathway.name p.value Annotated

1     dre00010         <NA>      NA         0
2     dre00020         <NA>      NA         0
3     dre00030         <NA>      NA         0
4     dre00040         <NA>      NA         0

and what is expected is

pathway.code pathway.name      p.value Annotated
109     ath03010         <NA> 5.378589e-35       301
96      ath00966         <NA> 1.517953e-06        24
131     ath04141         <NA> 5.032367e-06       189
10      ath00071         <NA> 8.804424e-06        45
110     ath03013         <NA> 2.771177e-04       155
56      ath00592         <NA> 1.097775e-03        3

8

the expected data is built with respect to the Arabidopsis database and my database of interest is Zebrafish. Preferably pathway name inclusion will also be helpful.

RNA-Seq • 813 views

ADD COMMENT • link updated 3.7 years ago by Istvan Albert 102k • written 3.7 years ago by aranyak111 • 0