I have a list of genes from my non-model organism on which I want to run a GO analysis:
RHOH
UCHL1
...
I have a GTF file that looks like this (it doesn't contain much information):
scaffold_0 maker exon 9496 9623 . + . transcript_id "RHOH"; transcript_id_full "RHOH"
scaffold_0 StringTie exon 11728 11971 . + . transcript_id "RHOH"; transcript_id_full "RHOH"
scaffold_0 maker exon 12077 12144 . + . transcript_id "RHOH"; transcript_id_full "RHOH"
scaffold_0 StringTie exon 20708 23579 . + . transcript_id "RHOH"; transcript_id_full "RHOH"
scaffold_0 maker exon 39534 40131 . - . transcript_id "gene17"; transcript_id_full "gene17"
scaffold_0 maker exon 43071 43701 . + . transcript_id "gene1"; transcript_id_full "gene1"
...
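As a minimal sketch (Python is an assumption; the thread uses no particular language), the gene IDs can be pulled out of a GTF like the one above by reading the `transcript_id` attribute. It assumes a standard GTF with the attribute string in the ninth column. The regex searches the whole line, so it does not depend on the column separator. The resulting list is one candidate for the background set. IDs such as `gene17`, which carry no gene symbol, will have nothing to map to later.

```python
import re
from typing import Iterable, List

# Matches `transcript_id "X"` but not `transcript_id_full "X"`,
# because the space and quote must follow "transcript_id" directly.
TRANSCRIPT_ID = re.compile(r'\btranscript_id "([^"]+)"')


def gene_ids_from_gtf(lines: Iterable[str]) -> List[str]:
    """Return unique transcript_id values in order of first appearance."""
    seen = set()
    ids: List[str] = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comment and blank lines
        match = TRANSCRIPT_ID.search(line)
        if match and match.group(1) not in seen:
            seen.add(match.group(1))
            ids.append(match.group(1))
    return ids


# Records copied from the example above (tab-separated here).
example = [
    'scaffold_0\tmaker\texon\t9496\t9623\t.\t+\t.\ttranscript_id "RHOH"; transcript_id_full "RHOH"',
    'scaffold_0\tStringTie\texon\t11728\t11971\t.\t+\t.\ttranscript_id "RHOH"; transcript_id_full "RHOH"',
    'scaffold_0\tmaker\texon\t39534\t40131\t.\t-\t.\ttranscript_id "gene17"; transcript_id_full "gene17"',
    'scaffold_0\tmaker\texon\t43071\t43701\t.\t+\t.\ttranscript_id "gene1"; transcript_id_full "gene1"',
]
print(gene_ids_from_gtf(example))  # ['RHOH', 'gene17', 'gene1']
```

On a real file, `with open("annotation.gtf") as fh: ids = gene_ids_from_gtf(fh)` works the same way. `annotation.gtf` is a hypothetical file name.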
I also have the fasta file with the gene sequences.
How should I go about running a GO analysis?
Can we assume that you also have (access to) the sequences? For example, the protein sequences corresponding to the GTF?
Yes, I have the sequences. But do I have to start from there? Why can't I use the GO terms from a closely related species based on my gene IDs? Thanks
I have the HGNC symbols, so I am able to retrieve their GO terms. Is it wrong to take the GO terms from human and use them for the next steps? I'm confused because the genes have already been annotated, so it's not as if I only have sequences with arbitrary IDs.
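If the symbols really do correspond to human orthologs, transferring annotations amounts to a lookup. Here is a minimal sketch, assuming you downloaded the human GO annotation file in GAF 2.x format. GAF is tab-separated: column 3 is the gene symbol, column 4 the qualifier, and column 5 the GO ID, and lines starting with `!` are headers. Annotations qualified with `NOT` are dropped because they state that the gene does *not* have that function. The function names are hypothetical, and the toy GAF lines use placeholder accessions and GO IDs. GAF annotations are usually direct annotations, and propagating them to parent terms is left to the enrichment tool.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def go_terms_by_symbol(gaf_lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Map gene symbol (GAF column 3) to its GO IDs (GAF column 5)."""
    mapping: Dict[str, Set[str]] = defaultdict(set)
    for line in gaf_lines:
        if line.startswith("!") or not line.strip():
            continue  # '!' marks GAF header/comment lines
        cols = line.rstrip("\n").split("\t")
        symbol, qualifier, go_id = cols[2], cols[3], cols[4]
        if "NOT" in qualifier.split("|"):
            continue  # negative annotations must not be transferred
        mapping[symbol].add(go_id)
    return dict(mapping)


def transfer_annotations(my_genes: Iterable[str],
                         human_go: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Give each of my genes the GO terms of the human gene with the same symbol."""
    return {g: sorted(human_go[g]) for g in my_genes if g in human_go}


# Toy GAF lines (truncated to 7 columns) with placeholder IDs.
toy_gaf = [
    "!gaf-version: 2.2",
    "UniProtKB\tTOY1\tRHOH\tenables\tGO:0000001\tPMID:0\tIDA",
    "UniProtKB\tTOY1\tRHOH\tNOT|enables\tGO:0000002\tPMID:0\tIDA",
    "UniProtKB\tTOY2\tUCHL1\tinvolved_in\tGO:0000003\tPMID:0\tIDA",
]
human_go = go_terms_by_symbol(toy_gaf)
print(transfer_annotations(["RHOH", "UCHL1", "gene17"], human_go))
# {'RHOH': ['GO:0000001'], 'UCHL1': ['GO:0000003']}
```

Genes without a matching symbol (here `gene17`) are simply left unannotated rather than guessed.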
Please use the 'Add comment' button to reply to a specific answer or add information to your question.
You can use the human gene annotations if you know that these genes are orthologs of the genes in your organism of interest. If the human genes were identified by simple sequence similarity (e.g. best BLAST hit), I wouldn't rely too much on them if the evolutionary distance between human and your species of interest is large. The best way is to place the proteins of your species in a phylogenetic tree with GO-annotated species so that you can infer orthology relations.
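The phylogenetic approach above is the more reliable one. If it is not feasible, a common but weaker shortcut is to require *reciprocal* best BLAST hits instead of a one-way best hit. This technique is swapped in here and is not what the answer recommends. Here is a minimal sketch, assuming BLAST was run in both directions with tabular `-outfmt 6` output, whose 12th column is the bit score. The function names and sequence IDs are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple


def best_hits(outfmt6_lines: Iterable[str]) -> Dict[str, str]:
    """Best subject per query by bit score (column 12 of BLAST -outfmt 6)."""
    best: Dict[str, Tuple[float, str]] = {}
    for line in outfmt6_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        query, subject, bitscore = cols[0], cols[1], float(cols[11])
        if query not in best or bitscore > best[query][0]:
            best[query] = (bitscore, subject)  # ties keep the first hit seen
    return {q: s for q, (_, s) in best.items()}


def reciprocal_best_hits(forward: Iterable[str],
                         reverse: Iterable[str]) -> Set[Tuple[str, str]]:
    """Pairs (mine, human) where each is the other's best hit."""
    fwd = best_hits(forward)  # my species -> human
    rev = best_hits(reverse)  # human -> my species
    return {(q, s) for q, s in fwd.items() if rev.get(s) == q}


def row(q: str, s: str, bits: float) -> str:
    # Toy outfmt 6 row; only columns 1, 2 and 12 matter here.
    return "\t".join([q, s, "90.0", "100", "10", "0", "1", "100",
                      "1", "100", "1e-50", str(bits)])


forward = [row("RHOH_my", "RHOH_hs", 200), row("RHOH_my", "RHOB_hs", 150),
           row("gene17", "RHOB_hs", 120)]
reverse = [row("RHOH_hs", "RHOH_my", 210), row("RHOB_hs", "RHOH_my", 160)]
print(reciprocal_best_hits(forward, reverse))  # {('RHOH_my', 'RHOH_hs')}
```

A pair that passes this filter is still not guaranteed to be a true ortholog, for example after lineage-specific duplications or losses. This is why tree-based inference is preferable when the species are distant.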
Hmm, I'll check with my friend who annotated the genome, but I think they are orthologs. 1 - If they are, which tools do you think I should use to run the enrichment analysis? 2 - If they aren't, I was thinking of just using AmiGO (http://amigo1.geneontology.org/cgi-bin/amigo/term_enrichment), because it doesn't seem to require knowing my GO terms beforehand or selecting a specific species as background. What do you think? Thanks
I am not sure I get the point about knowing the GO terms. Once you're satisfied with the mapping between your genes and the GO-annotated ones, you simply use the GO-annotated genes as if they were from your species. For the second (i.e. background) list of genes, you probably don't want to use the whole genome of the other organism, but only the genes that have orthologs among those interrogated in your experiment.
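To make the background point concrete: a typical over-representation test compares how often a GO term occurs in your study list with how often it occurs in the background list, using a one-sided hypergeometric test, followed by multiple-testing correction. The sketch below is a minimal illustration of that idea, not the algorithm of any particular tool. topGO, for instance, also offers methods that take the GO graph structure into account. It assumes `annotations` already maps each background gene to its GO terms *including* ancestor terms, because propagation up the ontology is not done here. Gene and term names are hypothetical.

```python
from math import comb
from typing import Dict, List, Set, Tuple


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N background, K annotated, n drawn)."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def go_enrichment(study: Set[str], background: Set[str],
                  annotations: Dict[str, Set[str]]
                  ) -> List[Tuple[str, int, int, float, float]]:
    """Return (term, study hits, background hits, p, BH-adjusted p) per term."""
    study = study & background  # study genes must be part of the background
    N, n = len(background), len(study)

    # Collect, for every term, the background genes annotated with it.
    term_genes: Dict[str, Set[str]] = {}
    for gene in background:
        for term in annotations.get(gene, set()):
            term_genes.setdefault(term, set()).add(gene)

    raw: List[Tuple[str, int, int, float]] = []
    for term, genes in term_genes.items():
        k = len(genes & study)
        raw.append((term, k, len(genes), hypergeom_sf(k, N, len(genes), n)))
    raw.sort(key=lambda r: r[3])  # ascending p-value

    # Benjamini-Hochberg: q_i = min over j >= i of p_j * m / j (1-based ranks).
    m = len(raw)
    adjusted = [0.0] * m
    running_min = 1.0
    for i in range(m - 1, -1, -1):
        running_min = min(running_min, raw[i][3] * m / (i + 1))
        adjusted[i] = running_min
    return [r + (q,) for r, q in zip(raw, adjusted)]


background = {f"g{i}" for i in range(1, 11)}
study = {"g1", "g2", "g3"}
annotations = {"g1": {"T1", "T2"}, "g2": {"T1"}, "g3": {"T1"}, "g4": {"T2"}}
for result in go_enrichment(study, background, annotations):
    print(result)
```

In this toy case, term `T1` is annotated only to the three study genes, so its p-value is 1/120. With annotations transferred from human, `background` would be the genes of your species that were measured and could be mapped to a GO-annotated ortholog.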
This might be the wrong thread to ask, but how does one take eggNOG-mapper results (which include a column with GO terms) and import them into topGO or something similar?
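One possible approach is to reshape the GO column of the eggNOG-mapper annotation table into the gene-to-GO text file that topGO's `readMappings()` reads. That file has one gene ID per line, followed by a tab and then comma-separated GO IDs. The `topGOdata` object can then be built with `annFUN.gene2GO`. The sketch below assumes the `.emapper.annotations` table has a `#`-prefixed header row that contains a column named `GOs`, with `-` meaning no GO terms. Column names have changed between eggNOG-mapper versions, so check your file. The function names are hypothetical.

```python
from typing import Dict, Iterable, List


def emapper_to_gene2go(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map query ID (first column) to its GO IDs from an eggNOG-mapper table."""
    go_col = None
    gene2go: Dict[str, List[str]] = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#"):
            if "GOs" in fields:  # assumed header row, e.g. "#query ... GOs ..."
                go_col = fields.index("GOs")
            continue  # other '#' lines are comments
        if go_col is None or len(fields) <= go_col:
            continue  # no header seen yet, or truncated row
        gos = fields[go_col].strip()
        if gos in ("", "-"):
            continue  # '-' means no GO terms were transferred
        gene2go[fields[0]] = [g for g in gos.split(",") if g]
    return gene2go


def to_topgo_lines(gene2go: Dict[str, List[str]]) -> List[str]:
    """Format as 'gene<TAB>GO:1,GO:2' lines for topGO's readMappings()."""
    return [f"{gene}\t{','.join(terms)}\n" for gene, terms in gene2go.items()]


# Toy table: real files have many more columns; GO IDs are placeholders.
toy = [
    "## emapper-2.x",
    "#query\tseed_ortholog\tGOs\tPFAMs",
    "RHOH\tx\tGO:0000001,GO:0000002\tRas",
    "gene17\tx\t-\t-",
]
print(to_topgo_lines(emapper_to_gene2go(toy)))
# ['RHOH\tGO:0000001,GO:0000002\n']
```

The lines can be written with `writelines()` to a file (e.g. a hypothetical `gene2go.txt`) and then loaded in R with `readMappings("gene2go.txt")`.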
Correct. Please post new questions as new threads, and search Biostars first (e.g. with Google) to see whether your question has already been addressed.
Yes, I believe this is the wrong place to ask. First, you posted this as an answer without addressing the original question (use comments to add to the discussion; answers are for answers). Second, you're asking a different question, so you should create your own thread.
EDIT: moved to a comment while I was typing.