I'm a relative beginner in bioinformatics, but I have a fair amount of experience with downstream analysis of scRNA-seq data. I'm now trying to do Gene Ontology (GO) annotation and gene set enrichment analysis, and I'm getting lost in the forest of gene annotation and ontology tools and what they all mean. This is rarely covered in scRNA-seq analysis courses, and I've been struggling to find good introductory learning materials.
Basically, there are two tasks I'd like to do, and I'm looking for the right tools, preferably in Python:
- Given a list of Ensembl gene IDs, I'd like to obtain the gene symbols (the short gene names) and the Gene Ontology term descriptions. I know biomaRt can do this, and I also saw an implementation using AnnotationHub() in R, but I can't find a good introduction to either tool and I think I'm missing some background knowledge about how they work. Is there also a Python tool that can do this kind of annotation?
- Starting from a gene list annotated with Gene Ontology terms, I'd like to do gene set enrichment analysis. For this, gseapy looks useful and I think I can find my way with it. If you have other suggestions, please let me know.
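To make the two tasks concrete, here is a toy sketch of what I'm after. Everything in it is an assumption for illustration: `TOY_ANNOTATION` is a small hand-typed lookup table standing in for whatever a real database query would return (it is not a complete annotation of these genes), and `annotate` and `hypergeom_pvalue` are hypothetical helper names, not functions from any library. The enrichment part is a plain over-representation (hypergeometric) test, which is simpler than rank-based GSEA.

```python
from math import comb

# Hypothetical, hand-typed lookup table standing in for a real database query.
# It only lists a few illustrative GO terms per gene.
TOY_ANNOTATION = {
    "ENSG00000141510": {
        "symbol": "TP53",
        "go": {"GO:0006915": "apoptotic process", "GO:0006281": "DNA repair"},
    },
    "ENSG00000012048": {
        "symbol": "BRCA1",
        "go": {"GO:0006281": "DNA repair"},
    },
    "ENSG00000146648": {
        "symbol": "EGFR",
        "go": {"GO:0008283": "cell population proliferation"},
    },
}


def annotate(ensembl_ids):
    """Task 1: map each Ensembl gene ID to (id, symbol, sorted GO descriptions).

    IDs missing from the table get a symbol of None and an empty list.
    """
    rows = []
    for gene_id in ensembl_ids:
        entry = TOY_ANNOTATION.get(gene_id)
        if entry is None:
            rows.append((gene_id, None, []))
        else:
            rows.append((gene_id, entry["symbol"], sorted(entry["go"].values())))
    return rows


def hypergeom_pvalue(k, n, K, N):
    """Task 2 (simplified): over-representation p-value P(X >= k).

    N = genes in the background, K = background genes carrying the GO term,
    n = genes in my list, k = genes in my list carrying the GO term.
    """
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


if __name__ == "__main__":
    for row in annotate(["ENSG00000141510", "ENSG00000012048"]):
        print(row)
    # Both genes in a 2-gene list carry "DNA repair"; the background here is
    # an invented 4 genes, 2 of which carry the term.
    print(hypergeom_pvalue(k=2, n=2, K=2, N=4))
```

What I can't work out is which tool or database should replace the hand-typed table in the first step.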
So my problem is more on the database side. I don't understand the many different gene databases, how to set up a query against them, what the difference between tools like biomaRt and AnnotationHub is, or which one to use. Where can I find a good introduction to this?