Hi!
I have transcriptome data from a spider (Loxosceles spp.). We are trying to annotate the whole transcriptome and run an enrichment analysis to characterize the expression profile of this spider's venom gland. Since this spider has no reference genome in the NR database, we are having a tough time annotating the whole transcriptome: we could only annotate 13k out of 62k (we do not have Blast2GO).
The transcriptome was obtained by RNA-seq on an Illumina platform.
We used hmmer2go for the annotation, and now we need a program to perform the enrichment analysis. Do you have any suggestions, or know of anything else we can use?
Here is an example of our input:
contig_35271_14 GO:0003743 GO:0031369 GO:0006413
contig_9403_9 GO:0016491 GO:0055114 GO:0016491
contig_54663_2 GO:0007165
contig_8455_23 GO:0007218
contig_380_34 GO:0005515 GO:0005089 GO:0035023
contig_4108_58 GO:0005515 GO:0005515
contig_46052_10 GO:0005515 GO:0005515
contig_5172_8 GO:0003676 GO:0004523
Thanks in advance!
From the example input above, I do not think there is enough information to construct a GO association or hierarchy file (which, as far as I can see, Ontologizer requires). As I understand it, the task is simply to use the file above as the reference database and perform enrichment analysis on different sets of contig_xxxxx IDs.
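To make that idea concrete, here is a rough sketch in plain Python of a term-by-term over-representation test (one-sided hypergeometric) that uses the contig-to-GO file as the reference set. The function names (parse_mapping, hypergeom_sf, enrich) are my own, not from any package. It assumes the whitespace-separated layout shown in the question, does not propagate annotations to parent GO terms, and applies no multiple-testing correction, so treat it as an illustration rather than a replacement for a dedicated tool.

```python
from collections import defaultdict
from math import comb  # requires Python 3.8+


def parse_mapping(lines):
    """Parse 'contig_id GO:... GO:...' lines into {contig_id: set of GO IDs}."""
    mapping = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        mapping[fields[0]] = set(fields[1:])  # a set drops repeated GO IDs
    return mapping


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when n contigs are drawn from N, of which K carry the term."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def enrich(mapping, study_ids):
    """Return {go_id: (k, K, p_value)} for each term present in the study set.

    The population is every annotated contig in `mapping`; study IDs that
    are not in the mapping are ignored.
    """
    population = set(mapping)
    study = set(study_ids) & population
    term_to_contigs = defaultdict(set)
    for contig, terms in mapping.items():
        for term in terms:
            term_to_contigs[term].add(contig)
    N, n = len(population), len(study)
    results = {}
    for term, contigs in term_to_contigs.items():
        k = len(contigs & study)
        if k:
            results[term] = (k, len(contigs), hypergeom_sf(k, N, len(contigs), n))
    return results


if __name__ == "__main__":
    example = [
        "contig_380_34 GO:0005515 GO:0005089 GO:0035023",
        "contig_4108_58 GO:0005515 GO:0005515",
        "contig_5172_8 GO:0003676 GO:0004523",
    ]
    for term, stats in enrich(parse_mapping(example), ["contig_4108_58"]).items():
        print(term, stats)
```

Note that the population here is only the annotated contigs (13k in the question), not all 62k, and the resulting p-values would still need adjusting for the number of terms tested.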
It is possible, since a gene association file simply contains gene IDs with their corresponding GO categories (plus a bunch of columns that can be filled with, in fact, any data). Constructing a custom ontology is not required at all, because the ontology is generally universal. Take a look at a row from my custom gene association file for maize (it does not mirror the number of columns exactly, because some columns have no data):
ensembl AC148152.3_FG006 AC148152.3_FG006 GO:0003824 NA|protein_coding|protein_coding IEA F catalytic activity AC148152.3_FGT006||| protein taxon:4577 160112 B73|AGPv2
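For the spider data, a minimal sketch of turning the contig-to-GO lines from the question into tab-separated rows with the 17 columns of the GAF 2.1 layout could look like the code below. The function name mapping_to_gaf and its defaults are hypothetical. The "IEA" evidence code (as in the maize row), the "protein" object type, the empty reference column, and the empty aspect when no lookup is supplied are all assumptions on my part, and I have not checked whether Ontologizer accepts rows with an empty aspect. The taxon value must be supplied by you; the one in the example is a placeholder, not the real ID for Loxosceles.

```python
from datetime import date

GAF_HEADER = "!gaf-version: 2.1"  # first line of a GAF 2.1 file


def mapping_to_gaf(lines, taxon, db="custom", assigned_by="custom",
                   aspect_of=None, date_str=None):
    """Convert 'contig_id GO:... GO:...' lines into 17-column GAF-style rows.

    aspect_of: optional {go_id: "F" | "P" | "C"}; the aspect cannot be
    derived from the GO ID alone, so it is left empty when not supplied.
    """
    aspect_of = aspect_of or {}
    date_str = date_str or date.today().strftime("%Y%m%d")
    rows = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        contig, terms = fields[0], fields[1:]
        for go_id in dict.fromkeys(terms):  # drop repeated GO IDs, keep order
            rows.append("\t".join([
                db,                        # 1  DB
                contig,                    # 2  DB Object ID
                contig,                    # 3  DB Object Symbol
                "",                        # 4  Qualifier
                go_id,                     # 5  GO ID
                "",                        # 6  DB:Reference (none available here)
                "IEA",                     # 7  Evidence Code (assumed: electronic)
                "",                        # 8  With/From
                aspect_of.get(go_id, ""),  # 9  Aspect
                "",                        # 10 DB Object Name
                "",                        # 11 DB Object Synonym
                "protein",                 # 12 DB Object Type (assumed)
                taxon,                     # 13 Taxon
                date_str,                  # 14 Date (YYYYMMDD)
                assigned_by,               # 15 Assigned By
                "",                        # 16 Annotation Extension
                "",                        # 17 Gene Product Form ID
            ]))
    return rows


if __name__ == "__main__":
    example = ["contig_5172_8 GO:0003676 GO:0004523"]
    print(GAF_HEADER)
    # "taxon:0" is a placeholder; replace it with the real NCBI taxon ID.
    print("\n".join(mapping_to_gaf(example, taxon="taxon:0")))
```

Writing GAF_HEADER followed by the returned rows to a file gives something in the same shape as the maize row above, with one row per contig and GO term.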
This explains it perfectly. Sorry, I guess I was wrong.