Question

GOSeq gene ontology enrichment analysis in rice

1

Entering edit mode

4.7 years ago

Bioinfonext ▴ 470

I need your help for Goseq R code for gene ontology enrichment analysis for differentially expressed gene identified by DESeq2.

I got the differentially expressed genes and I can also download the mapping file from biomart for all the rice gene ID like below:

Gene stableID   Transcript stable ID           GO term accession
BGIOSGA013239  BGIOSGA013239-TA                   GO:0009098
BGIOSGA013239  BGIOSGA013239-TA                   GO:0003862
BGIOSGA013239  BGIOSGA013239-TA                   GO:0009082
BGIOSGA013239  BGIOSGA013239-TA                   GO:0016616
BGIOSGA013239  BGIOSGA013239-TA                   GO:0051287

.....................

Goseq code:
d <- read.csv("deseq2res.csv", header=T, row.names=1)
all_genes <- row.names(d)
 DE_genes <- all_genes[d$padj<0.05]

I am not sure how should I proceed further after this? I am not able to understand how should I get the genes.vector and length.vector.names for the below code and then GO_data.frame?

pwf <- nullp(genes.vector,bias.data=length.vector.names)
head(pwf)

# calculate GO enrichment using default method
GO.WALL <- goseq(pwf, gene2cat=GO_data.frame)

Many thanks, Bioinfonext

goseq R • 2.0k views

ADD COMMENT • link updated 13 months ago by i.sudbery 21k • written 4.7 years ago by Bioinfonext ▴ 470

0

Entering edit mode

Not a GOSeq solution but since you are working with Oryza sativa indica you could use AgriGO v2.

The input file will be a list of gene_id i.e, the gene_id of your differentially expressed genes

ADD REPLY • link 4.7 years ago by andres.firrincieli 3.8k

score 0 · Answer 1 · 2024-01-23

For the genes vector you need a named vector with one entry for each gene in your experiment, and a 1 if it is DE and a 0 if it is not. The names should be the gene names.

From your question this could be generated using the vectors you have:

genes.vector <- as.numeric(d$padj < 0.05)
names(genes.vector) <- all_genes

For the GO categories, you want not a frame, but a list, with each entry named for a gene, and containing a vector of all the go categories associated with that gene.

Your can geneate that from the mapping file:

go_df <- read.delim("biomart_go_output.txt")
genes2cat <- split(go_df$GO.accession.term,go_df$Gene.stableID)

For the length.vector.names you need a named vector of the gene lengths, with the names being the geneID of the genes. I can't tell you where to get this for rice, but BioMart might be a good place to start - I suspect you can download a table of exonic lengths (perhaps it would be as the length of each exon), that can then be converted to the vector you need.