I'm just starting to learn R, so bear with me. We've conducted an RNA-seq experiment and are working on getting gene ontology information for the sequenced genes. I've installed the biomaRt R package and can successfully request GO terms for a list of Ensembl gene IDs. The issue is that I'd like the results written to a file with one row per gene: the gene ID in the first column, followed by all the GO terms that describe that gene. For example:
Gene ID:      GO term:
AGAP000002    Proteolysis, oxidation-reduction, protein-binding, etc
AGAP006543    chitin biosynthesis, intracellular, etc
When I just export the biomaRt results, each GO number is listed separately and the output doesn't show which gene it corresponds to. Is there an easy way to get this exported correctly? If I need to include any more information to help you answer, just let me know. Thanks.
I think this is on the right track but I can't quite get it to work. I used the biomaRt package to retrieve the GO terms:
This worked fine and it resulted in this format:
Since there are multiple GO terms for each gene ID, I'd like to group the rows by gene ID so that there's only one row per gene, with the corresponding GO terms placed in a single cell. So the above result would then be:
How should I go about doing this?
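One way to do the grouping described above is with base R's `aggregate()`, sketched here on a small hand-made table. The data frame `go_long`, its column names (`ensembl_gene_id`, `go_term`), and the output file name are assumptions made for illustration; they are not the actual biomaRt output from this thread, so adjust them to match your own result.

```r
# Toy stand-in for a biomaRt result: one row per (gene, GO term) pair.
# Column names and values are illustrative assumptions, not real query output.
go_long <- data.frame(
  ensembl_gene_id = c("AGAP000002", "AGAP000002", "AGAP000002",
                      "AGAP006543", "AGAP006543"),
  go_term = c("proteolysis", "oxidation-reduction", "protein binding",
              "chitin biosynthesis", "intracellular"),
  stringsAsFactors = FALSE
)

# Collapse all terms belonging to one gene into a single comma-separated string.
go_wide <- aggregate(
  go_term ~ ensembl_gene_id,
  data = go_long,
  FUN = function(x) paste(unique(x), collapse = ", ")
)

# Write a tab-delimited file: one row per gene, all terms in a single cell.
out_file <- file.path(tempdir(), "go_terms_by_gene.txt")
write.table(go_wide, out_file, sep = "\t", quote = FALSE, row.names = FALSE)
```

`unique()` drops repeated terms within a gene, and a tab separator keeps the commas inside the GO term cell from being read as column breaks when the file is opened in a spreadsheet.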
If you are using a mart that holds the GO names in the database as well as the accessions (e.g. the plant mart), and you additionally want to output the concatenated accession numbers (not part of the original question), you can skip the GO.db package and modify the second part (the aggregation) as follows:
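The aggregation code for this step is not shown here, so the following is only a minimal sketch of what such a step could look like with dplyr, not the original answer's code. It assumes the query result has the columns `ensembl_gene_id`, `go_id`, and `name_1006` (the GO term name); the toy data frame `go_long`, the gene labels, and the output column names are made up for illustration.

```r
library(dplyr)

# Hypothetical stand-in for a query result that already contains both the
# GO accession and the GO term name; values are illustrative only.
go_long <- data.frame(
  ensembl_gene_id = c("gene_A", "gene_A", "gene_B"),
  go_id = c("GO:0006508", "GO:0005515", "GO:0005622"),
  name_1006 = c("proteolysis", "protein binding", "intracellular"),
  stringsAsFactors = FALSE
)

# One row per gene, with accessions and names each collapsed into one string.
go_by_gene <- go_long %>%
  group_by(ensembl_gene_id) %>%
  summarise(
    go_ids = paste(go_id, collapse = "; "),
    go_names = paste(name_1006, collapse = "; ")
  ) %>%
  as.data.frame()  # plain data.frame so long strings are not truncated on print

# Export; the file holds the full concatenated strings.
out_file <- file.path(tempdir(), "go_by_gene.tsv")
write.table(go_by_gene, out_file, sep = "\t", quote = FALSE, row.names = FALSE)
```

Because the names come straight from the mart, no lookup against GO.db is needed in this variant.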
Note that dplyr returns its results as a modified data.frame object that, when printed, hides variables that don't fit on the screen (such as your long concatenated strings). If you export it to a text file, it will all be there. If you want to view all of it in the console, you can cast it to a regular data.frame object by adding an additional
%>% as.data.frame()
at the end of the pipe. I'll modify my answer above to include this solution.