Annotation with KEGGrest
0
0
Entering edit mode
3.7 years ago
aranyak111 • 0

The first few lines of my file look like this

Gene    adj.P.Val
ENSDARG00000087345  1.39E-128
ENSDARG00000058371  5.68E-50
ENSDARG00000026403  1.77E-48
ENSDARG00000006427  2.36E-48
ENSDARG00000041381  8.10E-39
ENSDARG00000071601  1.81E-33
ENSDARG00000054191  2.73E-32
ENSDARG00000098488  5.67E-31
ENSDARG00000076129  2.31E-29.

I wanted to perform the KEGG enrichment test for selected genes in ZebraFIsh.

Code snippets used are

# Pull all pathways for AT  

pathways.list <- keggList("pathway", "dre")
head(pathways.list)
# Pull all genes for each pathway
pathway.codes <- sub("path:", "", names(pathways.list)) 
genes.by.pathway <- sapply(pathway.codes,
    function(pwid){
        pw <- keggGet(pwid)
        if (is.null(pw[[1]]$GENE)) return(NA)
        pw2 <- pw[[1]]$GENE[c(TRUE,FALSE)] # may need to modify this to c(FALSE, TRUE) for other organisms
        pw2 <- unlist(lapply(strsplit(pw2, split = ";", fixed = T), function(x)x[1]))
        return(pw2)
    }
)

head(genes.by.pathway)

My dataframe is read as DE.table <- read.delim(df, stringsAsFactors = F)

geneList <- DE.table$P.Value
names(geneList) <- DE.table$Gene
head(geneList)

# Wilcoxon test for each pathway
pVals.by.pathway <- t(sapply(names(genes.by.pathway),
    function(pathway) {
        pathway.genes <- genes.by.pathway[[pathway]]
        list.genes.in.pathway <- intersect(names(geneList), pathway.genes)
        list.genes.not.in.pathway <- setdiff(names(geneList), list.genes.in.pathway)
        scores.in.pathway <- geneList[list.genes.in.pathway]
        scores.not.in.pathway <- geneList[list.genes.not.in.pathway]
        if (length(scores.in.pathway) > 0){
            p.value <- wilcox.test(scores.in.pathway, scores.not.in.pathway, alternative = "less")$p.value
        } else{
            p.value <- NA
        }
        return(c(p.value = p.value, Annotated = length(list.genes.in.pathway)))
    }
))


 # Assemble output table

    outdat <- data.frame(pathway.code = rownames(pVals.by.pathway))
    outdat$pathway.name <- pathways.list[outdat$pathway.code]
    outdat$p.value <- pVals.by.pathway[,"p.value"]
    outdat$Annotated <- pVals.by.pathway[,"Annotated"]
    outdat <- outdat[order(outdat$p.value),]
    head(outdat)

The output is like this pathway.code pathway.name p.value Annotated

1     dre00010         <NA>      NA         0
2     dre00020         <NA>      NA         0
3     dre00030         <NA>      NA         0
4     dre00040         <NA>      NA         0

and what is expected is

pathway.code pathway.name      p.value Annotated
109     ath03010         <NA> 5.378589e-35       301
96      ath00966         <NA> 1.517953e-06        24
131     ath04141         <NA> 5.032367e-06       189
10      ath00071         <NA> 8.804424e-06        45
110     ath03013         <NA> 2.771177e-04       155
56      ath00592         <NA> 1.097775e-03        3

8

the expected data is built with respect to the Arabidopsis database and my database of interest is Zebrafish. Preferably pathway name inclusion will also be helpful.

RNA-Seq • 813 views
ADD COMMENT

Login before adding your answer.

Traffic: 1384 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6