3.8 years ago
biosjm
Hi all, after DEG analysis with DESeq2 and GO enrichment analysis (fisher, classic) with topGO, I obtained a long list of enriched terms. They come from several levels of the ontology, and there are too many (500) to explain all of them. I'm not sure how I should proceed... Is there a slim function I can use after running topGO?
There are two ways of reducing candidates. The first is to use the lfcThreshold argument of DESeq2::results to test against a certain threshold, so that only genes with an effect size above it have the chance to be significant. With this you can exclude significant genes with tiny effect sizes. The second would be to use something more meaningful or specific than GO terms, such as KEGG or REACTOME pathways. GO terms are broad and unspecific. Something like "signaling" in a GO term is very unspecific, and for more precise results you could use tools like gprofiler2 to actually test for pathways rather than just terms.
This depends on the depth of the branch. Some GO terms can be very specific. When working with GO, one would ideally take the graph structure of the ontology into account. Pathways are easier to work with because they are defined as lists of genes. The downside is that pathways with the same name are defined differently in different resources (e.g. KEGG DNA damage has 124 genes, Reactome DNA damage has 314 proteins). Also, even Reactome pathways are connected in a graph, so you still need to decide at which level you want to operate.
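To make the point about graph structure concrete, here is a minimal sketch of one simple way to shorten a list of enriched terms: keep only the most specific ones, i.e. drop every enriched term that is an ancestor of another enriched term. The toy ontology and all term names below are made up for illustration, and the functions are not part of topGO or any other package. Note that topGO itself also offers the elim and weight01 algorithms, which take the graph into account during testing rather than afterwards.

```python
def ancestors(term, parents):
    """Return all ancestors of `term`, given a child -> list-of-parents mapping."""
    seen = set()
    stack = list(parents.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def most_specific(enriched, parents):
    """Drop every enriched term that is an ancestor of another enriched term."""
    enriched = set(enriched)
    covered = set()
    for term in enriched:
        covered |= ancestors(term, parents) & enriched
    return sorted(enriched - covered)


# Toy ontology (hypothetical term names, not real GO identifiers).
parents = {
    "signaling": ["biological_process"],
    "kinase_signaling": ["signaling"],
    "stress_kinase_signaling": ["kinase_signaling"],
    "metabolism": ["biological_process"],
    "lipid_metabolism": ["metabolism"],
}

enriched = ["signaling", "kinase_signaling", "stress_kinase_signaling", "metabolism"]
specific_terms = most_specific(enriched, parents)
print(specific_terms)
```

This is deliberately crude: a parent term can be enriched because of genes that are not in any enriched child, and that information is lost when the parent is dropped.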
Thanks, very helpful! Can you recommend a package I can use for KEGG enrichment analysis? I used topGO for GO enrichment analysis, but could not find something similar for KEGG. I already have a custom annotation table (de novo transcriptome assembly) which contains, among other information, KEGG/KO IDs for every gene.
Have a look at the clusterProfiler Bioconductor package but you can also easily compute statistics yourself (e.g. overrepresentation with the hypergeometric test or gene set enrichment analysis with the fgsea package).
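For the "compute it yourself" route, the overrepresentation test is just the upper tail of a hypergeometric distribution. A minimal sketch using only the Python standard library is below. The gene and pathway names are invented, the function names are my own, and you would still need to correct the resulting p-values for multiple testing.

```python
from math import comb


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) when n genes are drawn from N, of which K belong to the gene set."""
    total = comb(N, n)
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def overrepresentation(de_genes, universe, gene_sets):
    """Return (set name, overlap size, p-value) tuples, sorted by p-value."""
    universe = set(universe)
    de = set(de_genes) & universe
    results = []
    for name, members in gene_sets.items():
        members = set(members) & universe  # ignore genes that were never tested
        k = len(de & members)
        p = hypergeom_upper_tail(k, len(members), len(de), len(universe))
        results.append((name, k, p))
    return sorted(results, key=lambda r: r[2])


# Toy data: 20 tested genes, 5 of them differentially expressed.
universe = [f"g{i}" for i in range(1, 21)]
de_genes = ["g1", "g2", "g3", "g4", "g5"]
gene_sets = {
    "pathway_A": ["g1", "g2", "g3", "g4", "g10"],
    "pathway_B": ["g5", "g11", "g12", "g13", "g14", "g15"],
}

results = overrepresentation(de_genes, universe, gene_sets)
for name, overlap, p in results:
    print(name, overlap, round(p, 4))
```

The universe should be the set of genes that were actually tested and annotated, not the whole assembly, otherwise the p-values come out too small.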
EDIT: Here is a tutorial that you may find helpful.
Yes, but the problem with clusterProfiler is that you have to use search_kegg_organism, so you have to search for a single organism. Because I have a de novo assembly, my KEGG numbers come from several organisms.
I think you can use your own annotations with clusterProfiler. See chapter 3 of the clusterProfiler manual.
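As far as I know, the generic enrichment function in clusterProfiler (enricher) takes custom annotations as a two-column table, term first and gene second (its TERM2GENE argument). A minimal sketch of reshaping an annotation table into that long format is below. The column names gene_id and ko_ids, the ";" separator, and the placeholder IDs are assumptions about what such a table might look like, so adjust them to your own file.

```python
import csv
import io

# Hypothetical annotation table: one row per gene, KO IDs separated by ";".
annotation_tsv = (
    "gene_id\tko_ids\n"
    "gene_1\tKO_A;KO_B\n"
    "gene_2\t\n"          # gene without any annotation
    "gene_3\tKO_A\n"
)


def term_to_gene_pairs(handle, gene_col="gene_id", term_col="ko_ids", sep=";"):
    """Expand a one-row-per-gene table into (term, gene) pairs."""
    pairs = []
    for row in csv.DictReader(handle, delimiter="\t"):
        for term in row[term_col].split(sep):
            term = term.strip()
            if term:  # skip genes with an empty annotation field
                pairs.append((term, row[gene_col]))
    return pairs


pairs = term_to_gene_pairs(io.StringIO(annotation_tsv))

# Print as a two-column, tab-separated table: term, then gene.
print("term\tgene")
for term, gene in pairs:
    print(f"{term}\t{gene}")
```

Because the mapping is keyed on KO IDs rather than on an organism code, it does not matter that the annotations were transferred from several organisms.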
Thank you for your answer, ATpoint. Yes, I have already set the log2 fold change threshold to 1. I should add that I am working with a non-model organism and have a custom annotation table that also contains KEGG and KO IDs. In my experience, many tools/packages are only available for model organisms.
Please add comments via ADD REPLY/COMMENT.
Maybe you can use the homolog gene names from a closely-related model organism for the pathway analysis?
Some tools allow you to use custom annotation files if you have them (see for example the topGO documentation section on custom annotations). A common approach to generate custom annotations is to transfer annotations by orthology from related organisms.
Also, if you are specifically interested in GO terms and have too many, there are tools that reduce and summarize redundant lists of terms, such as REViGO.