I am working with several E. coli samples; for each of them I have a list of hundreds of differentially expressed genes from DESeq2. Besides log2 fold change and p-value, I also have each gene's expression in TPM and FPKM, as well as its annotated GO terms.
With all these data I would like to perform a gene set enrichment analysis, once for each sample. Hence, I need something that works from the command line and accepts E. coli gene names. I would much prefer it if the expression levels could be used to rank the genes.
The tools I know are:
- GSEA
- DAVID
- GREAT
- Enrichr
- GOrilla
- SetRank
I am basically a neophyte with all these tools. I know that some of them are made mostly for human or mouse datasets, and that the reliability of some others (e.g. DAVID) is debated.
All the posts I found on BioStars about this are either not centred on E. coli or quite old (4-5 years). Hence, how would you suggest I proceed?
Based on the tools you listed, it sounds like you are looking for a web-based tool that does not require any coding? You can check a few additional options here. Most of those support only specific species.
If you are comfortable with some coding, there are many R-based tools that are species-agnostic and will allow you to use arbitrary gene sets (pathways) as input. In that case, you actually have at least two separate questions: "where can I find E. coli gene sets" and "how do I input them into a particular tool".
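For the first of those two questions, the per-gene GO annotations mentioned in the question can already serve as gene sets once they are inverted from gene-to-terms into term-to-genes. Below is a minimal Python sketch of that step, which also renders the sets in GMT layout (set name, description, then the member genes, all tab-separated), a format many enrichment tools read. The function names `invert_annotations` and `to_gmt`, the `min_size` parameter, and the toy gene-to-term assignments are assumptions made up for illustration, not real E. coli annotations.

```python
from collections import defaultdict


def invert_annotations(gene_to_terms):
    """Turn {gene: [GO terms]} into {GO term: sorted list of genes}."""
    term_to_genes = defaultdict(set)
    for gene, terms in gene_to_terms.items():
        for term in terms:
            term_to_genes[term].add(gene)
    # Sort terms and genes so the output is reproducible.
    return {term: sorted(genes) for term, genes in sorted(term_to_genes.items())}


def to_gmt(term_to_genes, min_size=1):
    """Render gene sets as GMT text: name<TAB>description<TAB>gene1<TAB>gene2..."""
    lines = []
    for term, genes in term_to_genes.items():
        if len(genes) >= min_size:  # optionally skip very small sets
            lines.append("\t".join([term, "na", *genes]))
    return "\n".join(lines) + "\n"


# Toy annotation table (illustrative assignments only).
annotations = {
    "lacZ": ["GO:0005975", "GO:0005990"],
    "lacY": ["GO:0005990", "GO:0055085"],
    "araA": ["GO:0005975"],
}

gene_sets = invert_annotations(annotations)
gmt_text = to_gmt(gene_sets)
print(gmt_text, end="")
```

In practice `annotations` would be read from your own annotation file, and `gmt_text` written to disk for whichever tool you choose.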
Quite the opposite: I'm an experienced programmer in R/Python/Bash and I would like as much CLI as possible. In the second paragraph of my question I say that I need something that works from the command line, because I am not really a web server type of guy (and I have many samples to process) :)
Thanks for clarifying! If you are looking for an R-based tool, clusterProfiler is a good place to start. It does a few different types of analysis and includes many plotting options.
Isn't clusterProfiler mostly for mouse and human? That is what I remember from its manual.
The manual has human and mouse examples. However, it can use any gene sets as input.
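In clusterProfiler, the generic `enricher()` and `GSEA()` functions take a user-supplied term-to-gene table (`TERM2GENE`), which is what makes it usable for E. coli. To show what "any gene sets as input" amounts to, here is a small Python sketch of a plain over-representation test (one-sided hypergeometric) on custom gene sets. It is not clusterProfiler's code or API, only an illustration of the idea; the names `hypergeom_sf` and `enrich` and the toy genes are hypothetical.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when n genes are drawn without replacement from N genes,
    K of which belong to the gene set."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / total


def enrich(de_genes, universe, gene_sets):
    """Return (term, overlap size, set size, p-value) tuples, smallest p first."""
    universe = set(universe)
    de = set(de_genes) & universe          # ignore genes outside the background
    results = []
    for term, members in gene_sets.items():
        members = set(members) & universe
        if not members:
            continue
        overlap = de & members
        p = hypergeom_sf(len(overlap), len(universe), len(members), len(de))
        results.append((term, len(overlap), len(members), p))
    return sorted(results, key=lambda r: r[3])


# Toy data: 10 background genes, two custom sets, 4 differentially expressed genes.
universe = [f"g{i}" for i in range(1, 11)]
gene_sets = {
    "setA": ["g1", "g2", "g3", "g4"],
    "setB": ["g5", "g6", "g7", "g8", "g9", "g10"],
}
de_genes = ["g1", "g2", "g3", "g9"]

results = enrich(de_genes, universe, gene_sets)
for term, overlap, size, p in results:
    print(f"{term}\toverlap={overlap}/{size}\tp={p:.4f}")
```

The p-values here are uncorrected; with many GO terms you would still need a multiple-testing adjustment, and the background (`universe`) should be the genes that were actually tested.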