Hi,
I have RNA-seq data from multiple samples, and I have the gene annotations as a GFF file. I want to find Gene Ontology terms for the genes. Can you please suggest a way?
Thank you!
It's unclear where you are in the processing pipeline.
First you need to align your reads and get feature counts (i.e., the number of reads per gene) from your RNA-seq data. Next you perform differential expression analysis using something like limma or DESeq2. Once you've done that, you can take the genes differentially expressed in your condition of interest and plug them into any of a number of web-based GO enrichment tools (I like pantherdb, but metascape and genecodis are also nice).
There's a box on the homepage where you enter IDs (usually one per line). Your best bet is something like ENSEMBL or Entrez IDs. It will accept gene symbols/names as well, but they might map to something you don't expect, so be careful.
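As a rough sketch of the step between differential expression and the web form, the snippet below filters a results table and prints one ID per line, ready to paste. The column names `log2FoldChange` and `padj` follow DESeq2's output; the `gene_id` column, the gene names, the values, and both cutoffs are assumptions made up for illustration.

```python
import csv
import io

# Hypothetical differential expression table (tab-separated).
# Gene IDs and numbers are placeholders, not real results.
DE_RESULTS = "\n".join([
    "gene_id\tlog2FoldChange\tpadj",
    "geneA\t2.5\t0.001",
    "geneB\t-1.8\t0.02",
    "geneC\t0.3\t0.0004",
    "geneD\t3.1\tNA",
    "geneE\t-2.2\t0.3",
])


def select_de_genes(table_text, padj_cutoff=0.05, lfc_cutoff=1.0):
    """Return IDs passing an adjusted p-value and |log2 fold change| cutoff.

    Both cutoffs are arbitrary example values; pick ones suited to your data.
    """
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    selected = []
    for row in reader:
        # Rows without an adjusted p-value (empty or "NA") are skipped.
        if row["padj"] in ("", "NA"):
            continue
        significant = float(row["padj"]) < padj_cutoff
        large_change = abs(float(row["log2FoldChange"])) >= lfc_cutoff
        if significant and large_change:
            selected.append(row["gene_id"])
    return selected


selected_ids = select_de_genes(DE_RESULTS)
id_list = "\n".join(selected_ids)  # one ID per line for the input box
print(id_list)
```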
Let's say your list is 100 genes. The site will then find GO terms that are overrepresented in your gene list. So, if (on average) 1 in 100 human genes relates to inflammation, you would expect about 1 such gene in a list of 100. If your gene list instead contains 20 genes related to inflammation, you have a 20-fold enrichment over the background (it's actually more complicated, but this is the general concept).
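To make the arithmetic concrete, here is a minimal sketch of the fold enrichment plus a one-sided hypergeometric test, which is one common way to score overrepresentation. It is not necessarily what any particular website runs. The background size of 20,000 genes is an assumption chosen only so that 1 in 100 genes carries the term.

```python
from math import comb


def fold_enrichment(k, n, K, N):
    """Observed hits in the list divided by the hits expected by chance.

    k: genes in the list with the term, n: list size,
    K: background genes with the term, N: background size.
    """
    expected = n * K / N
    return k / expected


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k) when n genes are drawn without replacement from N,
    of which K carry the term."""
    total = comb(N, n)
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


N = 20000  # assumed background size (illustrative)
K = 200    # 1 in 100 background genes relate to the term
n = 100    # size of the uploaded gene list
k = 20     # genes in the list that relate to the term

fe = fold_enrichment(k, n, K, N)
p = hypergeom_pvalue(k, n, K, N)
print(f"fold enrichment: {fe:.1f}")
print(f"one-sided p-value: {p:.3g}")
```

Real tools also correct for the fact that thousands of terms are tested at once, which this sketch leaves out.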
Thank you. I have a list of 1500 genes. These came from a different genome assembly of the same species whose reference genome is included in PANTHER, so I changed the gene IDs to match the available reference genome. I uploaded the gene list at http://www.pantherdb.org and chose the file type (ID list), the organism, and "Functional classification viewed in gene list". Then I got a list of results, but I do not see a GO term column in the result. Can you please help me interpret the result?
Thank you. I uploaded a plain text file with the gene list, selected the reference list available in the database, and chose "GO biological process complete". It gave a result and shows that 250 genes are not mapped. But the numbers of genes in the categories do not add up to the total number of genes in the list I uploaded. I also want to report Gene Ontology terms for individual genes. How can I do that?
You would not expect the total number of genes in each category to equal the total number of genes input. This is because genes can map to multiple GO terms (GO terms are highly redundant). You can explore all of the parent-child terms related to a GO term of interest at http://amigo.geneontology.org/amigo
A gene that is related to "defense response to virus" is also going to pop up in "immune system process" and "response to other biological organism" and many others.
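A toy sketch of why the category counts add up to more than the number of genes: each gene is counted under its direct terms and under every ancestor of those terms. The term names are the ones mentioned above, but the parent-child edges, the `biological_process` root placement, and the two genes are simplified assumptions, not the real ontology.

```python
from collections import Counter

# Simplified, hypothetical child -> parents edges (not the real GO graph).
PARENTS = {
    "defense response to virus": [
        "immune system process",
        "response to other biological organism",
    ],
    "immune system process": ["biological_process"],
    "response to other biological organism": ["biological_process"],
}

# Hypothetical direct annotations for two made-up genes.
DIRECT = {
    "geneA": ["defense response to virus"],
    "geneB": ["immune system process"],
}


def ancestors(term, parents):
    """Collect every term reachable by following parent edges upward."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def propagate(direct, parents):
    """Expand each gene's direct terms to include all ancestor terms."""
    full = {}
    for gene, terms in direct.items():
        expanded = set(terms)
        for term in terms:
            expanded |= ancestors(term, parents)
        full[gene] = expanded
    return full


full = propagate(DIRECT, PARENTS)
counts = Counter(term for terms in full.values() for term in terms)
for term, count in sorted(counts.items()):
    print(f"{count}\t{term}")
print(f"genes: {len(full)}, sum of category counts: {sum(counts.values())}")
```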
If you want to see all the GO terms associated with a particular gene, you can enter it into the "Search" box at the top left of the pantherdb website. Once you're on the gene page, there's a section ("Gene ontology database annotations") that you can expand to see the (many) linked GO terms.
Here's the page for Interferon Beta as an example: http://www.pantherdb.org/genes/gene.do?acc=HUMAN%7CHGNC%3D5434%7CUniProtKB%3DP01574
If you want to do this for a lot of genes your best bet is to learn how to use one of the R packages that interacts with a GO database.
I haven't used this one, but it looks promising.
If you don't know anything about R, it's not that difficult to pick up. Download R and RStudio, then start learning with the swirl package. It's very user friendly.
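If you would rather not start with R, the same bulk lookup can be sketched in Python against a GO annotation file in GAF format (tab-separated, comment lines starting with `!`, gene symbol in column 3, GO ID in column 5, aspect in column 9). The rows below are made-up placeholders built by a hypothetical `gaf_row` helper, not lines copied from a real annotation file. In practice you would read a downloaded GAF file for your species instead.

```python
from collections import defaultdict


def gaf_row(object_id, symbol, go_id, aspect):
    """Build one illustrative 17-column GAF line (placeholder values)."""
    return "\t".join([
        "UniProtKB", object_id, symbol, "", go_id, "REF:placeholder",
        "IEA", "", aspect, "", "", "protein", "taxon:9606",
        "20200101", "example", "", "",
    ])


# Illustrative annotation text; the rows are assumptions for the demo.
SAMPLE_GAF = "\n".join([
    "!gaf-version: 2.2",
    gaf_row("P01574", "IFNB1", "GO:0051607", "P"),
    gaf_row("P01574", "IFNB1", "GO:0005125", "F"),
    gaf_row("X00000", "GENE_X", "GO:0002376", "P"),
])


def go_terms_by_gene(gaf_text, wanted_symbols):
    """Map each requested gene symbol to its sorted (GO ID, aspect) pairs."""
    wanted = set(wanted_symbols)
    result = defaultdict(set)
    for line in gaf_text.splitlines():
        if not line or line.startswith("!"):
            continue  # skip blank lines and header comments
        cols = line.split("\t")
        symbol, go_id, aspect = cols[2], cols[4], cols[8]
        if symbol in wanted:
            result[symbol].add((go_id, aspect))
    return {gene: sorted(terms) for gene, terms in result.items()}


annotations = go_terms_by_gene(SAMPLE_GAF, ["IFNB1"])
for gene, terms in annotations.items():
    for go_id, aspect in terms:
        print(f"{gene}\t{go_id}\t{aspect}")
```

The aspect letters are P (biological process), F (molecular function), and C (cellular component). Matching on symbols carries the same mapping risk mentioned earlier, so a stable ID column is safer when your IDs line up with the file.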
It helps to do a Google search to find prior threads for common requests such as this (add site:biostars.org after your keywords to limit your search to Biostars, in this case: gene ontology analysis). You will find multiple results.