I would like to check whether some groups of genes are over-represented among the pseudogenes of a bacterium. I would like to do this via GO terms. I have already extracted the GO numbers using InterProScan against the Pfam database.
How do I check whether any enrichment is present? I would like to do this for a Lactobacillus species.
If anyone has an idea for another approach, I have the following information:
1. Complete annotation of the genome
2. All CDS sequences as DNA/AA
3. Pseudogenes as DNA and the six-frame translations of these sequences (meaning that at least one of these translations gives a meaningful result when compared to a database)
Any ideas?
UPDATE:
Now I have all the GO numbers (in a format like GO:0016021) for the pseudogenes and also for all CDS (as the background).
Do you know of any tools that can take this as input and perform the enrichment analysis?
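To make concrete what I mean by enrichment, here is a minimal sketch of the kind of test I have in mind. It is only an illustration under my own assumptions, not an established tool: a one-sided Fisher's exact test per GO term (pseudogenes versus CDS, with or without the term), followed by a Benjamini-Hochberg correction. The function names and the input layout (a dict mapping each gene ID to its set of GO IDs) are hypothetical, and the gene IDs in the example are toy data, not my real results. Genes without any GO annotation are assumed to be left out of both dicts.

```python
from collections import Counter
from math import comb


def fisher_one_sided(k, n_pseudo, k_cds, n_cds):
    """P(X >= k) under the hypergeometric null.

    k        pseudogenes carrying the term
    n_pseudo annotated pseudogenes in total
    k_cds    CDS carrying the term
    n_cds    annotated CDS in total
    """
    N = n_pseudo + n_cds          # all genes in the 2x2 table
    K = k + k_cds                 # all genes carrying the term
    upper = min(K, n_pseudo)
    tail = sum(comb(K, i) * comb(N - K, n_pseudo - i)
               for i in range(k, upper + 1))
    return tail / comb(N, n_pseudo)


def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def go_enrichment(pseudo, cds):
    """pseudo / cds: dict of gene ID -> set of GO IDs (assumed layout)."""
    pseudo_counts = Counter(t for terms in pseudo.values() for t in terms)
    cds_counts = Counter(t for terms in cds.values() for t in terms)
    terms = sorted(pseudo_counts)
    pvals = [fisher_one_sided(pseudo_counts[t], len(pseudo),
                              cds_counts.get(t, 0), len(cds))
             for t in terms]
    qvals = benjamini_hochberg(pvals)
    rows = list(zip(terms, pvals, qvals))
    rows.sort(key=lambda row: row[1])   # most significant first
    return rows


# Toy data only: 4 pseudogenes, 10 CDS.
toy_pseudo = {
    "p1": {"GO:0016021"}, "p2": {"GO:0016021"},
    "p3": {"GO:0016021", "GO:0005524"}, "p4": {"GO:0005524"},
}
toy_cds = {f"c{i}": {"GO:0005524"} for i in range(1, 10)}
toy_cds["c10"] = {"GO:0016021"}

for term, p, q in go_enrichment(toy_pseudo, toy_cds):
    print(term, p, q)
```

Counting is done per gene rather than per domain hit, which is why each gene maps to a set of GO IDs. I do not know whether this is the right statistical treatment for pseudogenes, so a dedicated tool that handles the GO hierarchy would be preferable.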