I'm currently conducting a transcriptomics study on Nicotiana benthamiana and have reached the point where I would like to run a GO enrichment analysis using GOseq or topGO. Neither of these supports N. benthamiana natively, so I would have to provide the GO mappings for N. benthamiana genes myself. Other studies using the same sequence and annotations have used GOseq and topGO, but their methods sections don't explain how they did this.
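For topGO specifically, one way around a GAF file is the plain-text mapping format that topGO's readMappings() reads by default: one gene per line, with the gene ID, a tab, and then that gene's GO IDs separated by commas. The list returned by readMappings() can then be passed to new("topGOdata", ...) with annot = annFUN.gene2GO and gene2GO set to that list. Below is a minimal Python sketch that writes such a file from a dictionary. The gene and GO IDs are hypothetical placeholders, and the file name gene2go.map is an arbitrary choice.

```python
import tempfile
from pathlib import Path


def write_topgo_mappings(gene2go: dict[str, list[str]], path: Path) -> None:
    """Write a topGO-style mapping file: gene ID, tab, comma-separated GO IDs."""
    with path.open("w") as handle:
        for gene_id, go_ids in sorted(gene2go.items()):
            # Drop duplicate GO IDs while keeping their first-seen order.
            unique_ids = list(dict.fromkeys(go_ids))
            # Genes without any GO ID are left out of the file.
            if unique_ids:
                handle.write(f"{gene_id}\t{', '.join(unique_ids)}\n")


if __name__ == "__main__":
    # Hypothetical placeholder IDs, not real Niben101 annotations.
    example = {
        "GENE_B": ["GO:0005515"],
        "GENE_A": ["GO:0008150", "GO:0003674", "GO:0008150"],
        "GENE_C": [],
    }
    out = Path(tempfile.mkdtemp()) / "gene2go.map"
    write_topgo_mappings(example, out)
    print(out.read_text(), end="")
```

Sorting by gene ID only makes the output reproducible. readMappings() does not need any particular order.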
Of course, I could simply feed a FASTA file of the DEGs into Blast2GO, but I don't actually have Blast2GO, and I feel like I'm missing something.
GOseq takes its gene-to-category associations as a named list of vectors, where each vector holds the terms associated with one gene. As such, a GAF file isn't necessarily the end of the solution anyway.
I'd create this object from the annotation txt file.
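Here is a minimal sketch of building that object from an annotation text file. It is written in Python as a dict from gene ID to a list of GO IDs, which is the analogue of R's named list of character vectors. The sketch assumes a tab-separated file with a header containing hypothetical columns named GeneID and GO. It uses a regular expression to pull out anything shaped like GO:0000000, so any descriptive text around the IDs is ignored. The real Niben101 column names and layout may differ.

```python
import csv
import io
import re

# Matches GO identifiers such as GO:0005515 anywhere inside a cell.
GO_ID = re.compile(r"GO:\d{7}")


def read_gene2cat(handle, id_col="GeneID", go_col="GO"):
    """Return {gene ID: [GO IDs]} from a tab-separated annotation table.

    id_col and go_col are assumed column names; change them to match the file.
    """
    gene2cat = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        go_ids = GO_ID.findall(row.get(go_col) or "")
        if not go_ids:
            continue  # unannotated genes are simply left out
        terms = gene2cat.setdefault(row[id_col], [])
        for go_id in go_ids:
            if go_id not in terms:
                terms.append(go_id)
    return gene2cat


if __name__ == "__main__":
    # Hypothetical two-gene table; GENE_2 has no GO annotation.
    sample = (
        "GeneID\tGO\n"
        "GENE_1\tGO:0005515 (protein binding); GO:0008150\n"
        "GENE_2\t\n"
    )
    print(read_gene2cat(io.StringIO(sample)))
    # {'GENE_1': ['GO:0005515', 'GO:0008150']}
```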
Thank you for the help; I'm going through the same problem. What I can't understand is what the gene2cat input should look like.
In my case, I have a table with the gene IDs and the GO terms divided by category (GO_Biological, Go_Cellular, etc.) (see picture at bottom). I tried using GO_Biological with your tip, and the output looks good (each gene with its set of GO terms). However, when I run goseq, I get this error:
Error in .testForValidKeys(x, keys, keytype, fks) :
None of the keys entered are valid keys for 'GOID'. Please use the keys method to see a listing of valid arguments.
I don't know how to prepare the gene2cat object properly. In addition, I would like to use all the GO categories at the same time.
Do you have any suggestions on how to proceed?
Thank you!!
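About the error itself: GO.db only accepts GOID keys that are exact GO identifiers it knows about (GO: followed by seven digits). So, as a guess, the error could be triggered by a cell that still holds several IDs in one string, or by an ID with text attached. The sketch below merges all category columns into one list per gene and reports any token that isn't a clean GO ID. The column names are modelled on the post but are assumptions (GO_Molecular in particular is invented). It also assumes that multiple IDs in a cell are separated by semicolons, commas, pipes or whitespace.

```python
import re

# Assumed column names; GO_Molecular is a guess for the third category.
CATEGORY_COLUMNS = ("GO_Biological", "GO_Cellular", "GO_Molecular")
VALID_GO = re.compile(r"GO:\d{7}")
# Assumed separators between several IDs packed into one cell.
SEPARATORS = re.compile(r"[;,|\s]+")


def combine_categories(rows, id_col="GeneID"):
    """Merge every category column per gene; return (gene2cat, bad_tokens)."""
    gene2cat, bad_tokens = {}, []
    for row in rows:
        terms = gene2cat.setdefault(row[id_col], [])
        for col in CATEGORY_COLUMNS:
            for token in SEPARATORS.split(row.get(col) or ""):
                if not token:
                    continue
                if VALID_GO.fullmatch(token):
                    if token not in terms:
                        terms.append(token)
                else:
                    # Anything else (e.g. an ID with text glued on) is reported.
                    bad_tokens.append((row[id_col], col, token))
    # Keep only genes that ended up with at least one GO ID.
    return {g: t for g, t in gene2cat.items() if t}, bad_tokens


if __name__ == "__main__":
    rows = [
        {"GeneID": "GENE_1", "GO_Biological": "GO:0008150; GO:0009987",
         "GO_Cellular": "GO:0005575", "GO_Molecular": ""},
        {"GeneID": "GENE_2", "GO_Biological": "GO:0006950(response)",
         "GO_Cellular": "", "GO_Molecular": "GO:0003674"},
    ]
    gene2cat, bad = combine_categories(rows)
    print(gene2cat)
    print(bad)
```

User-supplied categories are all tested together, so running BP alone would mean building the mapping from the GO_Biological column only.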
If you are providing your own categories, then the test.cats parameter doesn't apply. All your categories are BP, so you don't need to tell it to run only those categories.
Also, why annotation1000$Query_sequence rather than annotation1000$GeneID?
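A mix-up like Query_sequence vs GeneID is easy to catch before running goseq. The names of the DE vector have to be the same identifiers as the names in gene2cat; otherwise few or no genes end up with categories. Here is a small, hypothetical Python check of that overlap:

```python
def id_overlap(de_gene_names, gene2cat, show=5):
    """Summarise how many tested genes have an entry in gene2cat."""
    tested = set(de_gene_names)
    mapped = tested & set(gene2cat)
    return {
        "tested": len(tested),
        "with_entry": len(mapped),
        "fraction_with_entry": len(mapped) / len(tested) if tested else 0.0,
        # A few unmatched IDs usually reveal a naming mismatch at a glance.
        "unmatched_examples": sorted(tested - mapped)[:show],
    }


if __name__ == "__main__":
    # Hypothetical IDs: gene2cat keyed by sequence names instead of gene IDs.
    de_names = ["GENE_1", "GENE_2", "GENE_3"]
    gene2cat = {"GENE_1": ["GO:0008150"], "seq_17": ["GO:0005575"]}
    print(id_overlap(de_names, gene2cat))
```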
Thank you.
Finally, I had to modify the whole file with Bash and R, forcing it to look like this:
GENEID      ONTOLOGY  GOID      TERM
PITA_XXXXX  CC        GO:XXXXX  cellular_component
That is, one individual GO term per row, for all the genes.
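For anyone reproducing this, a few lines are enough to collapse the one-GO-term-per-row layout back into the per-gene lists GOseq expects. Below is a Python sketch that assumes the table is tab-separated with the GENEID, ONTOLOGY, GOID and TERM header shown above. The optional ontology filter (e.g. only BP) is an illustrative addition, not something from the post.

```python
import csv
import io
from collections import defaultdict


def long_to_gene2cat(handle, ontologies=None):
    """Collapse GENEID/ONTOLOGY/GOID/TERM rows into {GENEID: [GOID, ...]}.

    ontologies: optional set such as {"BP"}; None keeps BP, CC and MF together.
    """
    gene2cat = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        if ontologies is not None and row["ONTOLOGY"] not in ontologies:
            continue
        terms = gene2cat[row["GENEID"]]
        if row["GOID"] not in terms:
            terms.append(row["GOID"])
    return dict(gene2cat)


if __name__ == "__main__":
    # Hypothetical rows in the same four-column layout.
    sample = (
        "GENEID\tONTOLOGY\tGOID\tTERM\n"
        "GENE_1\tCC\tGO:0005575\tcellular_component\n"
        "GENE_1\tBP\tGO:0008150\tbiological_process\n"
        "GENE_2\tMF\tGO:0003674\tmolecular_function\n"
    )
    print(long_to_gene2cat(io.StringIO(sample)))
    # {'GENE_1': ['GO:0005575', 'GO:0008150'], 'GENE_2': ['GO:0003674']}
    print(long_to_gene2cat(io.StringIO(sample), ontologies={"BP"}))
    # {'GENE_1': ['GO:0008150']}
```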
By the way, annotation1000$Query_sequence was my mistake.
This seems to work almost perfectly. The function descriptions in brackets next to the GO terms in the functional annotation file are unhelpfully worded, so I will have to use a few lines of gsub to remove bits of leftover text.
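The gsub step would look roughly like the regular-expression substitution below, shown with Python's re.sub (R's gsub with perl = TRUE accepts a similar pattern). The input format here is an assumption about the functional annotation file, not its documented layout: GO IDs, each followed by a description in round or square brackets.

```python
import re

# Assumed layout: a description in (...) or [...] after each GO ID.
BRACKETED = re.compile(r"\s*[(\[][^)\]]*[)\]]")
# Normalises whatever spacing surrounds the semicolons left behind.
SEMICOLON = re.compile(r"\s*;\s*")


def strip_bracketed(text):
    """Drop bracketed descriptions and tidy leftover separators."""
    cleaned = BRACKETED.sub("", text)
    cleaned = SEMICOLON.sub("; ", cleaned)
    return cleaned.strip(" ;")


if __name__ == "__main__":
    raw = "GO:0005515 (protein binding); GO:0008150 [biological_process]"
    print(strip_bracketed(raw))
    # GO:0005515; GO:0008150
```

This pattern leaves a stray fragment when brackets are nested, such as "(response to (abiotic) stimulus)", so it is worth checking a sample of cleaned lines.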
I'm currently trying to do GO analysis on N. benthamiana with this Niben101 version too, and I'm so happy to have found your post.
May I ask how your experience with the .GAF file was, and whether you would be willing to share it with me?
I'm a total beginner and don't really know how to use the code you added to convert the .txt file to a .GAF file.
That would definitely help me so much!
Best wishes,
Trang
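For the .txt to .GAF question, here is a rough Python sketch. It is not the code referred to in the thread. It turns (gene ID, ontology, GO ID) records, such as the rows of the four-column table above, into GAF 2.1 lines with the 17 tab-separated columns that format defines. Several values are assumptions that should be checked before use:
- The database name, reference and assigned-by fields are hypothetical placeholders.
- IEA (inferred from electronic annotation) is assumed as the evidence code.
- taxon:4100 is assumed to be the NCBI Taxonomy ID for N. benthamiana.

As mentioned earlier in the thread, GOseq itself wants a named list rather than a GAF.

```python
import datetime

# GO ontology codes mapped to the GAF "Aspect" column.
ASPECT = {"BP": "P", "MF": "F", "CC": "C"}


def to_gaf_lines(records, db="HYPOTHETICAL_DB", reference="HYPOTHETICAL:REF",
                 taxon="taxon:4100", assigned_by="HYPOTHETICAL_DB", date=None):
    """Yield a GAF 2.1 header and one 17-column line per (gene, ontology, GO ID).

    db, reference and assigned_by are placeholders; taxon:4100 is assumed to be
    N. benthamiana and evidence is assumed to be IEA.
    """
    date = date or datetime.date.today().strftime("%Y%m%d")
    yield "!gaf-version: 2.1"
    for gene_id, ontology, go_id in records:
        columns = [
            db, gene_id, gene_id,          # 1-3: DB, object ID, object symbol
            "", go_id, reference,          # 4-6: qualifier, GO ID, reference
            "IEA", "", ASPECT[ontology],   # 7-9: evidence, with/from, aspect
            "", "", "gene",                # 10-12: name, synonym, object type
            taxon, date, assigned_by,      # 13-15: taxon, date, assigned by
            "", "",                        # 16-17: extension, gene product form
        ]
        yield "\t".join(columns)


if __name__ == "__main__":
    rows = [("GENE_1", "CC", "GO:0005575"), ("GENE_1", "BP", "GO:0008150")]
    for line in to_gaf_lines(rows, date="20240101"):
        print(line)
```

Whether a particular tool accepts the resulting file depends on that tool's own validation, so it is worth checking against the GAF specification before relying on it.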