Hi,
I have gene ontology data that looks like this (sample data). I downloaded it manually from the UniProt website.
I want to:
- Make a list of the unique GO terms in the column 'Gene Ontology (molecular function)'
- List the genes associated with each GO term
- Put the description of each GO term in the next column
- Count the number of genes associated with each GO term
The output data would look like this
Please note that a GO term never repeats within a single cell, but the same term can appear in other rows. Kindly help me do this in R.
I have made several unsuccessful attempts and would highly appreciate any help. Thank you so much.
df[!duplicated(df$`Gene Ontology (molecular function)`), ]
will work; it keeps only the first row for each duplicated value in that column.
Group by GO IDs, collapse the gene_id column, expand GO IDs, group by GO IDs, and count genes.
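The expand / group / collapse / count steps could be sketched in base R roughly as follows. This is only an illustration under assumptions: the sample data frame, the column names `gene` and `go_mf`, and the cell layout "description [GO:id]; description [GO:id]" are all made up here, so adjust them to match the actual export.

```r
# Hypothetical sample data; the column names and the
# "description [GO:id]; ..." cell layout are assumptions.
df <- data.frame(
  gene = c("geneA", "geneB", "geneC"),
  go_mf = c(
    "ATP binding [GO:0005524]; protein kinase activity [GO:0004672]",
    "ATP binding [GO:0005524]",
    "DNA binding [GO:0003677]; ATP binding [GO:0005524]"
  ),
  stringsAsFactors = FALSE
)

# Expand: split each cell on ";" so every GO term gets its own row.
terms <- strsplit(df$go_mf, ";[[:space:]]*")
long <- data.frame(
  gene = rep(df$gene, lengths(terms)),
  term = trimws(unlist(terms)),
  stringsAsFactors = FALSE
)

# Drop missing or empty terms (genes without any annotation).
long <- long[!is.na(long$term) & nzchar(long$term), ]

# Separate the GO ID from its description.
long$go_id <- sub("^.*\\[(GO:[0-9]+)\\]$", "\\1", long$term)
long$description <- trimws(sub("[[:space:]]*\\[GO:[0-9]+\\]$", "", long$term))

# Group by GO ID, then collapse and count the unique genes per term.
genes_by_go <- split(long$gene, long$go_id)
result <- data.frame(
  go_id = names(genes_by_go),
  description = long$description[match(names(genes_by_go), long$go_id)],
  genes = vapply(genes_by_go,
                 function(g) paste(unique(g), collapse = "; "),
                 character(1)),
  n_genes = vapply(genes_by_go,
                   function(g) length(unique(g)),
                   integer(1)),
  row.names = NULL,
  stringsAsFactors = FALSE
)

print(result)
```

`result` has one row per unique GO ID, with its description, the collapsed gene list, and the gene count. With the tidyverse, `tidyr::separate_rows()` followed by `dplyr::group_by()` and `dplyr::summarise()` would be another way to do the same thing.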
See this SO post as a start: