Help with R code to get a unique list
3.0 years ago
Space_Life ▴ 50

Hi,

I have Gene Ontology data that looks like this (sample data). I downloaded it manually from the UniProt website.

Data view

I want to:

  • Make a list of the unique GO terms in the column 'Gene Ontology (molecular function)'
  • List the genes associated with each GO term
  • Put the GO term descriptions in the next column
  • Count the number of genes associated with each GO term

The output data would look like this

Output data

Please note that a GO term never repeats within a single cell, but the same term can appear in several rows. Kindly help me to do this in R.

I have made several unsuccessful attempts and would highly appreciate any help. Thank you so much.

Gene-ontology R • 2.1k views

df[!duplicated(df$'Gene Ontology (molecular function)'),] will work; it keeps only the first row for each duplicated value.
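
To see what this does in practice, here is a minimal sketch on a toy data frame. The gene IDs and the short column name go are made up for illustration; only the two GO terms come from this thread. Note that duplicated() compares whole cell strings, so a GO term can still show up in more than one of the remaining rows.

```r
# Toy data: gene IDs and the column name `go` are invented for illustration.
df <- data.frame(
  gene_id = c("geneA", "geneB", "geneC"),
  go = c(
    "rRNA binding [GO:0019843]; structural constituent of ribosome [GO:0003735]",
    "rRNA binding [GO:0019843]; structural constituent of ribosome [GO:0003735]",
    "rRNA binding [GO:0019843]"
  ),
  stringsAsFactors = FALSE
)

# Keep the first row for each distinct cell value; later identical cells are dropped.
first_rows <- df[!duplicated(df$go), ]

# geneB is dropped because its cell is identical to geneA's,
# but GO:0019843 still occurs in both remaining rows.
print(first_rows)
```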

Group by GO IDs, collapse the gene_id column, expand the GO IDs, group by GO IDs again, and count the genes.

3.0 years ago

Tidyverse answer

library("tidyverse")

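df %>%
  # Keep one row per gene / GO-cell combination
  select(gene_id, `Gene ontology (molecular function)`) %>%
  distinct() %>%
  # Collapse and count the genes that share the same GO cell
  group_by(`Gene ontology (molecular function)`) %>%
  summarize(gene_id=str_c(gene_id, collapse=", "), `No. of Genes`=n(), .groups="drop") %>%
  # Split each cell into one row per GO term
  separate_rows(`Gene ontology (molecular function)`, sep="; ") %>%
  # Split "description [GO:xxxxxxx]" into description and ID columns
  separate(
    `Gene ontology (molecular function)`, sep="\\s(?=\\[GO)",
    into=c("Gene ontology (molecular function)", "Gene Ontology IDs"))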

I'd strongly suggest making gene_id a list-column of vectors though. Else this ain't tidy.
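
A minimal sketch of what a list-column version could look like, assuming the same column names as in the answer. The toy gene IDs are invented, and unlike the answer this sketch splits the GO cells before grouping. Each row then holds a character vector of genes instead of one comma-separated string.

```r
library(dplyr)
library(tidyr)

# Toy input: gene IDs are invented for illustration.
df <- tibble(
  gene_id = c("geneA", "geneB", "geneC"),
  `Gene ontology (molecular function)` = c(
    "rRNA binding [GO:0019843]; structural constituent of ribosome [GO:0003735]",
    "structural constituent of ribosome [GO:0003735]",
    "rRNA binding [GO:0019843]"
  )
)

go_genes <- df %>%
  # One row per gene / GO term pair
  separate_rows(`Gene ontology (molecular function)`, sep = "; ") %>%
  group_by(`Gene ontology (molecular function)`) %>%
  # Store the genes of each term as a vector inside a list-column
  summarize(gene_id = list(unique(gene_id)), .groups = "drop") %>%
  mutate(`No. of Genes` = lengths(gene_id))

# unnest() goes back to one row per gene / GO term pair when needed
go_long <- unnest(go_genes, gene_id)
```

A list-column keeps the individual gene IDs accessible for later joins or filtering, but it cannot be written to CSV directly, so it would need collapsing or unnesting before export.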

Thank you so much for this code. I tried it and it works wonderfully. However, some of the GO terms (GO:0003735, GO:0019843) are repeated, i.e. they are present in multiple rows of the output table. Also, can we add the corresponding UniProtKB IDs in one more column? Thank you again.
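
One possible way to address both points, offered as a sketch rather than the answerer's own fix: split the cells into one row per GO term first and only then group, so each GO ID ends up in exactly one row. The column name Entry for the UniProtKB accession, the output column names "GO term" and "GO ID", and all toy values are assumptions.

```r
library(dplyr)
library(tidyr)

# Toy input: gene IDs and accessions are invented; `Entry` is an assumed column name.
df <- tibble(
  gene_id = c("geneA", "geneB", "geneC"),
  Entry = c("ACC1", "ACC2", "ACC3"),
  `Gene ontology (molecular function)` = c(
    "rRNA binding [GO:0019843]; structural constituent of ribosome [GO:0003735]",
    "structural constituent of ribosome [GO:0003735]",
    "rRNA binding [GO:0019843]"
  )
)

go_summary <- df %>%
  select(gene_id, Entry, `Gene ontology (molecular function)`) %>%
  distinct() %>%
  # Split first: one row per gene / GO term pair
  separate_rows(`Gene ontology (molecular function)`, sep = "; ") %>%
  # Split "description [GO:xxxxxxx]" into description and ID
  separate(
    `Gene ontology (molecular function)`,
    into = c("GO term", "GO ID"),
    sep = "\\s(?=\\[GO)"
  ) %>%
  # Drop the square brackets around the ID
  mutate(`GO ID` = gsub("\\[|\\]", "", `GO ID`)) %>%
  group_by(`GO ID`, `GO term`) %>%
  summarize(
    # Count before gene_id is overwritten by the collapsed string
    `No. of Genes` = n_distinct(gene_id),
    gene_id = paste(unique(gene_id), collapse = ", "),
    Entry = paste(unique(Entry), collapse = ", "),
    .groups = "drop"
  )
```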

Also, I tried exporting it as CSV, but the exported file is the same as the original data. How do you export the updated dataset as CSV? Thank you again for your kind help.

You need to assign the result to a variable, for example by changing the first line to df2 <- df %>%; then you can write df2 to a CSV file.
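
A minimal sketch of that pattern: the pipe returns a new object and leaves df untouched, so the result has to be assigned before it can be exported. The toy data, the short column name go, and the file name go_summary.csv are placeholders.

```r
library(dplyr)
library(readr)

# Toy input with one duplicated row; values are invented for illustration.
df <- tibble(
  gene_id = c("geneA", "geneB", "geneA"),
  go = c("rRNA binding [GO:0019843]",
         "structural constituent of ribosome [GO:0003735]",
         "rRNA binding [GO:0019843]")
)

# Assign the pipeline result; df itself is not modified by the pipe.
df2 <- df %>%
  distinct()

# Write the processed table (not df) to disk; the path is a placeholder.
out_file <- file.path(tempdir(), "go_summary.csv")
write_csv(df2, out_file)
```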

Thank you so much. It worked.
