Question

Run a GO analysis for a non-model organism with an annotation file

6.2 years ago

madzayasodara • 0

I have a list of genes from my non-model organism on which I want to run a GO analysis:

RHOH
UCHL1
...

I have a GTF file that looks like this (there is not much information in it):

scaffold_0      maker   exon    9496    9623    .       +       .       transcript_id "RHOH"; transcript_id_full "RHOH"
scaffold_0      StringTie       exon    11728   11971   .       +       .       transcript_id "RHOH"; transcript_id_full "RHOH"
scaffold_0      maker   exon    12077   12144   .       +       .       transcript_id "RHOH"; transcript_id_full "RHOH"
scaffold_0      StringTie       exon    20708   23579   .       +       .       transcript_id "RHOH"; transcript_id_full "RHOH"
scaffold_0      maker   exon    39534   40131   .       -       .       transcript_id "gene17"; transcript_id_full "gene17"
scaffold_0      maker   exon    43071   43701   .       +       .       transcript_id "gene1"; transcript_id_full "gene1"

...
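A minimal Python sketch for pulling the gene IDs out of a GTF like the one above, e.g. to get the list of all annotated genes for use as a background later on. It assumes a standard tab-separated, 9-column GTF (the excerpt above appears to have lost its tabs) with the ID stored in the `transcript_id` attribute; `gene_ids_from_gtf` and `parse_attributes` are hypothetical helper names.

```python
import re

# Matches key "value" pairs in the GTF attribute column (column 9).
ATTR_RE = re.compile(r'([^\s;]+)\s+"([^"]*)"')


def parse_attributes(attr_field):
    """Return the key "value" pairs of a GTF attribute string as a dict."""
    return dict(ATTR_RE.findall(attr_field))


def gene_ids_from_gtf(lines, key="transcript_id"):
    """Unique IDs stored under `key`, in first-seen order.

    Assumes tab-separated, 9-column GTF lines; '#' lines are skipped.
    """
    seen = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a valid GTF record
        attrs = parse_attributes(fields[8])
        if key in attrs:
            seen.setdefault(attrs[key], None)  # dict keeps insertion order
    return list(seen)


# A few records from the excerpt above, with tabs restored (assumption).
example = [
    'scaffold_0\tmaker\texon\t9496\t9623\t.\t+\t.\ttranscript_id "RHOH"; transcript_id_full "RHOH"',
    'scaffold_0\tStringTie\texon\t11728\t11971\t.\t+\t.\ttranscript_id "RHOH"; transcript_id_full "RHOH"',
    'scaffold_0\tmaker\texon\t39534\t40131\t.\t-\t.\ttranscript_id "gene17"; transcript_id_full "gene17"',
    'scaffold_0\tmaker\texon\t43071\t43701\t.\t+\t.\ttranscript_id "gene1"; transcript_id_full "gene1"',
]
print(gene_ids_from_gtf(example))  # ['RHOH', 'gene17', 'gene1']
```

Duplicate exon records collapse to one ID per transcript, so the returned list can serve directly as the set of genes in the annotation.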

I also have the FASTA file with the gene sequences.

How should I go about running a GO analysis?

Tags: next-gen, genome

Posted 6.2 years ago by madzayasodara; last updated 5.4 years ago by rmash

Can we assume that you also have (access to) the sequences? For example, protein sequences corresponding to the GTF?

Reply by lieven.sterck, 6.2 years ago

Yes, I have the sequences. But do I have to start from there? Why can't I use the GO terms of a closely related species based on my gene IDs? Thanks

Reply by madzayasodara, 6.2 years ago

I have the HGNC symbols, so I am able to retrieve their GO terms. Is it wrong to take the GO terms from human and use them for the next steps? I'm just confused because the genes have already been annotated, so it's not as if I only have sequences with arbitrary IDs.

Reply by madzayasodara, 6.2 years ago

Please use the 'Add comment' button to reply to a specific answer or add information to your question.

Reply by Jean-Karim Heriche, 6.2 years ago

You can use the human genes annotations if you know that these genes are orthologs of the genes in your organism of interest. If the human genes were identified by simple sequence similarity (e.g. best BLAST hit), I wouldn't rely too much on them if the evolutionary distance between human and your species of interest is big. The best way is to place the proteins of your species in a phylogenetic tree with GO-annotated species so that you can infer orthology relations.
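To make the distinction concrete, here is a minimal Python sketch of the transfer step once orthology has been inferred (for example from gene trees). It assumes the inference step yields (your gene, annotated gene) pairs and, as one conservative choice, transfers terms only across one-to-one pairs; the function names, gene IDs and GO assignments are hypothetical.

```python
from collections import Counter


def one_to_one(pairs):
    """Keep (species_gene, annotated_gene) pairs where each gene appears once.

    `pairs` is assumed to come from an orthology inference step (e.g. a
    gene tree); exact duplicate pairs are collapsed first.
    """
    pairs = list(dict.fromkeys(pairs))
    q_count = Counter(q for q, _ in pairs)
    t_count = Counter(t for _, t in pairs)
    return [(q, t) for q, t in pairs if q_count[q] == 1 and t_count[t] == 1]


def transfer_go(pairs, go_annotations):
    """Copy GO terms from one-to-one orthologs onto the species' genes."""
    transferred = {}
    for q, t in one_to_one(pairs):
        terms = go_annotations.get(t)
        if terms:
            transferred[q] = set(terms)
    return transferred


# Hypothetical input: g2 and g3 share one human ortholog (not one-to-one),
# and g4's ortholog has no GO annotation, so only g1 receives terms.
pairs = [("g1", "HUMAN_A"), ("g2", "HUMAN_B"), ("g3", "HUMAN_B"), ("g4", "HUMAN_C")]
go_annotations = {"HUMAN_A": {"GO:0005515"}, "HUMAN_B": {"GO:0003824"}}
print(transfer_go(pairs, go_annotations))  # {'g1': {'GO:0005515'}}
```

Restricting to one-to-one pairs trades coverage for reliability; many-to-one cases (e.g. lineage-specific duplications) could be handled separately if you need them.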

Reply by Jean-Karim Heriche, 6.2 years ago

Hmm, I'll check with my friend who annotated the genome, but I think they are orthologs. 1) If they are, which tools do you think I should use to run the enrichment analysis? 2) If they aren't, I was thinking of just running AmiGO (http://amigo1.geneontology.org/cgi-bin/amigo/term_enrichment), because it doesn't seem to require knowing my GO terms beforehand or selecting a specific species as background. What do you think? Thanks

Reply by madzayasodara, 6.2 years ago

I am not sure I get the point about knowing the GO terms. Once you're satisfied with the mapping between your genes and the GO-annotated ones, you just use the GO-annotated genes as if they were from your species. As the second (i.e. background) list of genes, you probably don't want to use the whole genome of the other organism, but only the part that has orthologs that were interrogated in your experiment.
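A minimal Python sketch of that background, assuming you already have a gene-to-GO mapping transferred from orthologs and the set of genes measured in your experiment. It restricts the background to interrogated genes that carry transferred terms and builds the 2x2 table that Fisher's exact test (or a hypergeometric test) would use for one term; all names and annotations here are illustrative, not checked against any database.

```python
def build_universe(interrogated, gene_to_go):
    """Background: genes measured in the experiment that carry transferred GO terms."""
    return {g for g in interrogated if gene_to_go.get(g)}


def contingency(term, study, universe, gene_to_go):
    """2x2 counts (a, b, c, d) for one GO term against the restricted background.

    a: study genes with the term       b: study genes without it
    c: other background genes with it  d: other background genes without it
    """
    study = set(study) & universe  # study genes outside the background are ignored
    rest = universe - study
    a = sum(term in gene_to_go[g] for g in study)
    c = sum(term in gene_to_go[g] for g in rest)
    return a, len(study) - a, c, len(rest) - c


# Illustrative data only: gene17 has no transferred terms and gene99 has no
# ortholog, so neither enters the background.
gene_to_go = {"RHOH": {"GO:0003924"}, "UCHL1": {"GO:0004843"},
              "gene1": {"GO:0003924"}, "gene17": set()}
interrogated = {"RHOH", "UCHL1", "gene1", "gene17", "gene99"}
universe = build_universe(interrogated, gene_to_go)
print(sorted(universe))  # ['RHOH', 'UCHL1', 'gene1']
print(contingency("GO:0003924", ["RHOH", "UCHL1"], universe, gene_to_go))  # (1, 1, 1, 0)
```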

Reply by Jean-Karim Heriche, 6.2 years ago

This might be the wrong thread to ask, but how does one take eggNOG mapping results (which provide a column of GO terms) and import them into topGO or something similar?
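A minimal Python sketch for that conversion, assuming eggNOG-mapper's tab-separated `.emapper.annotations` output with a column-header line beginning `#query` and comma-separated GO IDs in a column named `GOs` (older versions may name columns differently, hence the `go_column` parameter). It writes lines in the gene-to-GO layout that topGO's `readMappings()` reads: gene ID, a tab, then comma-separated GO IDs. `emapper_to_gene2go` is a hypothetical name.

```python
def emapper_to_gene2go(lines, go_column="GOs"):
    """Convert eggNOG-mapper annotation lines into topGO gene2GO lines.

    Assumes a tab-separated file whose column-header line starts with '#query';
    other '#' lines are comments, and missing values are written as '-'.
    """
    header = None
    out = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#query"):
            header = line[1:].split("\t")
            continue
        if not line or line.startswith("#"):
            continue
        if header is None:
            raise ValueError("no '#query' header line found")
        row = dict(zip(header, line.split("\t")))
        gos = row.get(go_column, "-")
        if gos and gos != "-":
            out.append(f"{row[header[0]]}\t{gos}")  # gene ID <tab> GO:...,GO:...
    return out


# Made-up lines in the assumed layout (real files have many more columns).
example = [
    "## emapper version and run info",
    "#query\tseed_ortholog\tGOs",
    "RHOH\tseed1\tGO:0003924,GO:0005525",
    "gene17\tseed2\t-",
]
print(emapper_to_gene2go(example))  # ['RHOH\tGO:0003924,GO:0005525']
```

Written to a file (one line per gene), the result can be loaded in R with topGO's `readMappings()` and used with `annFUN.gene2GO`.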

Reply by rmash, 5.4 years ago

Correct. Post new questions as new threads. Please search Biostars first (or use Google externally) to see if your question has been addressed before.

Reply by GenoMax, 5.4 years ago

Yes, I believe this is the wrong place to ask. First, you posted an answer to the question without addressing that question (use comments to add to the discussion; answers are for answering the question). Second, you're asking a different question, so you should create your own thread.

EDIT: moved to a comment while I was typing.

Reply by Jean-Karim Heriche, 5.4 years ago

score 3 · Answer 1 · 2018-09-24

6.2 years ago

Jean-Karim Heriche 27k

The way to deal with GO analysis for organisms that have no direct GO annotations is to transfer GO annotations by orthology from organisms for which GO annotations are available.

score 1 · Answer 2 · 2018-09-24

6.2 years ago

EagleEye 7.6k

Check out this post C: Gene Set Clustering based on Functional annotation (GeneSCF)

score 1 · Answer 3 · 2018-09-25

6.2 years ago

lieven.sterck 15k

Since you have the protein sequences of your species of interest, I would simply run InterPro2GO or Blast2GO on them. This basically comes down to what other people have suggested here (e.g. Jean-Karim Heriche), but it avoids the tedious work of determining orthology relationships and will have higher specificity than grabbing GO terms from a closely related species. It is commonly accepted as a good (the best?) approach to assign GO terms to a new species.
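For the InterPro2GO route, InterProScan can report InterPro2GO-derived terms when run with `--goterms`. A minimal Python sketch for collecting them per protein from its TSV output, assuming the usual layout where column 14 holds `|`-separated GO IDs (newer releases may append a source tag in parentheses, which the regex ignores) and `-` means none; the helper name and rows are made up.

```python
import re
from collections import defaultdict

GO_ID = re.compile(r"GO:\d{7}")


def interproscan_go(lines):
    """Collect GO terms per protein from InterProScan TSV output.

    Assumes the run used --goterms, so column 14 holds '|'-separated GO IDs
    (possibly with a '(source)' tag on each); '-' means no terms.
    """
    gene_to_go = defaultdict(set)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 14:
            continue  # no GO column on this line
        gene_to_go[fields[0]].update(GO_ID.findall(fields[13]))
    return {gene: terms for gene, terms in gene_to_go.items() if terms}


# Made-up rows in the assumed 14-column layout.
rows = [
    ["RHOH", "md5", "191", "Pfam", "PF00071", "Ras family", "5", "180",
     "1e-30", "T", "01-01-2020", "IPR001806", "Small GTPase", "GO:0003924|GO:0005525"],
    ["gene17", "md5", "210", "Pfam", "PF00000", "desc", "1", "50",
     "1e-5", "T", "01-01-2020", "-", "-", "-"],
]
result = interproscan_go("\t".join(r) for r in rows)
print({g: sorted(t) for g, t in result.items()})  # {'RHOH': ['GO:0003924', 'GO:0005525']}
```

Because one protein usually has several matching signatures, terms are merged per protein with a set, so each GO ID is counted once per gene.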

I would like to point out that commonly accepted doesn't mean it's good, or even remotely close to the best (for another example, see this tweet). Best BLAST hits are used because they are quick and easy to obtain, not because they are the correct way of doing it. They may give a good approximation, but this depends on how closely related the species are and whether you care about the details of the assignments.

Reply by Jean-Karim Heriche, 6.2 years ago

Fully agree, but TopHat is no longer considered commonly accepted ;)

Anyway, I feel your comment is somewhat true for the Blast2GO approach (BLAST-based, but not strictly best hit) but not for InterPro2GO (not BLAST-based at all), although both are indeed similarity-based.

I would be interested to see a non-sequence-similarity based approach to assign (transfer) GO terms.

Reply by lieven.sterck, 6.2 years ago

But don't get me wrong: I absolutely do not disapprove of your proposed approach. It will work and will indeed provide (highly) reliable GO terms.

Reply by lieven.sterck, 6.2 years ago

score 1 · Answer 4 · 2018-09-25

6.2 years ago

Bioaln ▴ 360

What I would try:

1.) BLAST against a database of candidate homologs with known GO terms

2.) Take the top hits and transfer their GO terms

3.) Run Fisher's exact test (FET) for each term, plus a multiple-testing correction, e.g. Bonferroni.
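The three steps above can be sketched in plain Python. Assumptions: BLAST was run with `-outfmt 6` and its default 12 columns (bitscore last); "top hit" means highest bitscore; `go_db` is a hypothetical dict of GO terms for the database proteins; the background is every query that received terms; and Bonferroni is applied over the terms occurring in the study set. Fisher's exact test is written out with `math.comb` (Python 3.8+) to stay dependency-free; `scipy.stats.fisher_exact(..., alternative="greater")` is the usual library route. As discussed above, a top hit is not necessarily an ortholog.

```python
from math import comb


def top_hits(blast_lines):
    """Step 1: best subject per query from BLAST -outfmt 6 (12 columns), by bitscore."""
    best = {}
    for line in blast_lines:
        f = line.rstrip("\n").split("\t")
        if len(f) < 12:
            continue
        query, subject, bits = f[0], f[1], float(f[11])
        if query not in best or bits > best[query][1]:
            best[query] = (subject, bits)
    return {q: s for q, (s, _) in best.items()}


def transfer_by_top_hit(hits, go_db):
    """Step 2: copy the GO terms of each query's top hit (if it has any)."""
    return {q: set(go_db[s]) for q, s in hits.items() if go_db.get(s)}


def fisher_greater(a, b, c, d):
    """One-sided Fisher's exact test (over-representation) for [[a, b], [c, d]]."""
    n, k, K = a + b + c + d, a + b, a + c  # genes, study size, genes with term
    tail = sum(comb(K, x) * comb(n - K, k - x) for x in range(a, min(k, K) + 1))
    return tail / comb(n, k)


def enrichment(study, universe, gene_to_go):
    """Step 3: FET per GO term seen in the study set, Bonferroni-adjusted.

    Every gene in `universe` must be a key of `gene_to_go`.
    """
    study = set(study) & universe
    rest = universe - study
    terms = sorted(set().union(*(gene_to_go[g] for g in study)))
    results = []
    for term in terms:
        a = sum(term in gene_to_go[g] for g in study)
        c = sum(term in gene_to_go[g] for g in rest)
        p = fisher_greater(a, len(study) - a, c, len(rest) - c)
        results.append((term, p, min(1.0, p * len(terms))))
    return results


# Made-up BLAST rows and annotations, only to show the data flow.
blast = [
    "g1\tP1\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-50\t200",
    "g1\tP2\t80.0\t100\t20\t0\t1\t100\t1\t100\t1e-40\t150",
    "g2\tP1\t85.0\t100\t15\t0\t1\t100\t1\t100\t1e-45\t180",
    "g3\tP3\t70.0\t100\t30\t0\t1\t100\t1\t100\t1e-20\t120",
    "g4\tP3\t65.0\t100\t35\t0\t1\t100\t1\t100\t1e-18\t110",
]
go_db = {"P1": {"GO:0003924"}, "P3": {"GO:0005515"}}
gene_to_go = transfer_by_top_hit(top_hits(blast), go_db)
for term, p, p_bonf in enrichment({"g1", "g2"}, set(gene_to_go), gene_to_go):
    print(term, round(p, 4), round(p_bonf, 4))  # GO:0003924 0.1667 0.1667
```

With four genes, two of them in the study set and both carrying the term, the one-sided p-value is 1/C(4,2) = 1/6; with realistic gene numbers the same code applies unchanged.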

This answer is an extension of Jean-Karim's though.