I just ran a differential gene expression analysis with Cufflinks and imported the results into R with cummeRbund.
My reference genome and GFF annotation were from Ensembl, so the Ensembl gene IDs were carried through instead of the mnemonic gene symbols (see below).
Is there a way to get GO terms for each of my E. coli genes within R/Bioconductor? Can I do it with gene identifiers like "gene:b3281", or do I need to translate those to some other identifiers like "aroE"?
biolinux@biolinux-VirtualBox[biolinux] grep -C 1 aroE Escherichia_coli_str_k_12_substr_mg1655.GCA_000005845.2.22.gff3
Chromosome ensembl transcript 3429766 3430023 . - . ID=transcript:AAC76305;Parent=gene:b3280;biotype=protein_coding;external_name=yrdB-1;logic_name=ena
Chromosome ensembl gene 3430020 3430838 . - . ID=gene:b3281;biotype=protein_coding;description=dehydroshikimate reductase%2C NAD(P)-binding;external_name=aroE;logic_name=ena
Chromosome ensembl transcript 3430020 3430838 . - . ID=transcript:AAC76306;Parent=gene:b3281;biotype=protein_coding;external_name=aroE-1;logic_name=ena
Chromosome ensembl gene 3430843 3431415 . - . ID=gene:b3282;biotype=protein_coding;description=tRNA(ANN) t(6)A37 threonylcarbamoyladenosine modification protein%2C threonine-dependent ADP-forming ATPase;external_name=tsaC;logic_name=ena
biolinux@biolinux-VirtualBox[fromoffice] grep b3281 gene_exp.diff
gene:b3281 gene:b3281 - Chromosome:3429235-3430838 q1 q2 OK 24.697 33.6117 0.444627 0.672035 0.3566 0.504885 no
I have just parsed the "b numbers" and gene symbols out of the Ensembl GFF file. Then I can get symbol:GO term mappings from topGO.
Does this look even remotely reasonable?
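For concreteness, here is a minimal sketch of the parsing step described above, written in Python for illustration rather than R. It assumes the GFF3 file is tab-separated with the usual nine columns (the grep output above shows the tabs as spaces), that gene rows carry `ID=gene:<b number>` and `external_name=<symbol>` attributes as in the excerpt, and that percent-encoded characters such as `%2C` should be decoded. The function name `parse_gene_symbols` is hypothetical, not part of any library.

```python
from urllib.parse import unquote


def parse_gene_symbols(gff_lines):
    """Map b numbers (e.g. 'b3281') to gene symbols (e.g. 'aroE') from GFF3 lines."""
    mapping = {}
    for line in gff_lines:
        # Skip blank lines and GFF3 comment/directive lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Only 9-column rows whose feature type (column 3) is "gene" are of interest.
        if len(fields) != 9 or fields[2] != "gene":
            continue
        # Column 9 holds semicolon-separated key=value attributes.
        attrs = {}
        for pair in fields[8].split(";"):
            if "=" in pair:
                key, value = pair.split("=", 1)
                attrs[key] = unquote(value)  # decode %2C and similar escapes
        gene_id = attrs.get("ID", "")
        symbol = attrs.get("external_name")
        if gene_id.startswith("gene:") and symbol:
            # Drop the "gene:" prefix so the key is the bare b number.
            mapping[gene_id[len("gene:"):]] = symbol
    return mapping


# Two rows adapted from the excerpt above, with tabs restored between columns.
example = [
    "Chromosome\tensembl\tgene\t3430020\t3430838\t.\t-\t.\t"
    "ID=gene:b3281;biotype=protein_coding;"
    "description=dehydroshikimate reductase%2C NAD(P)-binding;"
    "external_name=aroE;logic_name=ena",
    "Chromosome\tensembl\ttranscript\t3430020\t3430838\t.\t-\t.\t"
    "ID=transcript:AAC76306;Parent=gene:b3281;biotype=protein_coding;"
    "external_name=aroE-1;logic_name=ena",
]
print(parse_gene_symbols(example))
```

The resulting table can be written out as two columns and read into R, where the "gene:" prefix on the Cufflinks gene IDs would need to be stripped the same way before joining.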