I need help with goseq R code for gene ontology enrichment analysis of differentially expressed genes identified by DESeq2.
I have the differentially expressed genes, and I can also download a mapping file from BioMart for all the rice gene IDs, like the one below:
Gene stable ID   Transcript stable ID   GO term accession
BGIOSGA013239    BGIOSGA013239-TA       GO:0009098
BGIOSGA013239    BGIOSGA013239-TA       GO:0003862
BGIOSGA013239    BGIOSGA013239-TA       GO:0009082
BGIOSGA013239    BGIOSGA013239-TA       GO:0016616
BGIOSGA013239    BGIOSGA013239-TA       GO:0051287
.....................
Goseq code:
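# Read the DESeq2 results; gene IDs are in the first column
d <- read.csv("deseq2res.csv", header=TRUE, row.names=1)
all_genes <- row.names(d)
# DESeq2 can report padj as NA, so exclude NAs explicitly
DE_genes <- all_genes[!is.na(d$padj) & d$padj<0.05]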
I am not sure how to proceed after this. I do not understand how to get genes.vector and length.vector.names for the code below, or how to build GO_data.frame.
pwf <- nullp(genes.vector,bias.data=length.vector.names)
head(pwf)
# calculate GO enrichment using default method
GO.WALL <- goseq(pwf, gene2cat=GO_data.frame)
Many thanks, Bioinfonext
Not a goseq solution, but since you are working with Oryza sativa indica, you could use AgriGO v2.
The input file will be a list of gene_id values, i.e., the gene_id of each of your differentially expressed genes.
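If you would rather stay with goseq, the three inputs asked about in the question could be prepared roughly as follows. This is only a sketch under assumptions: the toy data frames stand in for deseq2res.csv and the BioMart export, the column names gene_id and go_id are made up for illustration, and gene_len is a hypothetical vector of gene lengths (for example, from a separate BioMart export), since the mapping file shown does not contain lengths.

```r
# Toy stand-ins for the real inputs (hypothetical values, for illustration only).
# d: DESeq2 results as read from deseq2res.csv (row names = gene IDs).
d <- data.frame(
  padj = c(0.01, 0.20, NA, 0.03),
  row.names = c("BGIOSGA013239", "geneB", "geneC", "geneD")
)
# go_map: BioMart export reduced to gene ID and GO term accession columns.
go_map <- data.frame(
  gene_id = c("BGIOSGA013239", "BGIOSGA013239", "geneB", "geneC"),
  go_id   = c("GO:0009098", "GO:0003862", "GO:0009098", ""),
  stringsAsFactors = FALSE
)
# gene_len: gene lengths in bp, named by gene ID (assumed to be available).
gene_len <- c(BGIOSGA013239 = 1500, geneB = 800, geneC = 2300, geneD = 1200)

all_genes <- row.names(d)

# genes.vector: named 0/1 vector over all tested genes (1 = differentially expressed).
# Genes with NA padj are counted as not differentially expressed.
genes.vector <- as.integer(!is.na(d$padj) & d$padj < 0.05)
names(genes.vector) <- all_genes

# length.vector.names: lengths in the same gene order as genes.vector.
length.vector.names <- gene_len[all_genes]

# GO_data.frame: two columns (gene ID, GO term); rows without a GO term are dropped.
has_go <- !is.na(go_map$go_id) & go_map$go_id != ""
GO_data.frame <- unique(go_map[has_go, c("gene_id", "go_id")])

# With real data and goseq installed, the calls from the question would then be:
# library(goseq)
# pwf <- nullp(genes.vector, bias.data = length.vector.names)
# GO.WALL <- goseq(pwf, gene2cat = GO_data.frame)
```

The goseq calls are left commented out because the toy data are too small for a meaningful fit; with the full gene set they should work as written in the question, provided every gene in genes.vector has a length.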