Hi, I want to run GSEA in R on significantly differentially expressed genes from non-model species (five in total).
My research is based on cross-species comparative transcriptomics. This is what I have done so far:
- For each species I already have: de novo assemblies, annotations (across 7 different databases), quantification (read counts), CDS predictions...
- Next, I ran TransDecoder, selected the longest ORFs, and used these peptide sequences to detect single-copy orthologues across species with OrthoFinder.
- I assigned gene lengths and gene counts to create a gene expression matrix.
Now I am planning to do differential expression analysis on my orthologues (I am still learning what the best approach is, since I have to do some kind of normalization to account for the different species/transcriptomes). I guess this is another topic...
Let's say I have my DEG list and I want to do GSEA. I learned how to do that in R for human RNA-seq data, and one step is loading the human database (https://www.gsea-msigdb.org/gsea/msigdb). My question is: what should I do when I have non-model species? How do I make my own database?
Can I make it from my annotation list? If yes, how do I do that?
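To make the question concrete, here is a minimal sketch of what "making a database from annotations" could look like. It assumes the annotations can be flattened to pairs of (gene or orthogroup ID, term ID), groups them by term, and writes the result in the tab-separated GMT layout that MSigDB uses (set name, description, then the member genes). The IDs, the `min_size` cutoff, and the function names are hypothetical and only for illustration. It is written in Python with the standard library only; as far as I know, a GMT file like this can then be read in R with, for example, `clusterProfiler::read.gmt` or `fgsea::gmtPathways`.

```python
from collections import defaultdict


def build_gene_sets(annotations, min_size=2):
    """Group gene IDs by annotation term.

    annotations: iterable of (gene_id, term_id) pairs (assumed input shape).
    Terms with fewer than min_size distinct genes are dropped.
    """
    sets = defaultdict(set)
    for gene_id, term in annotations:
        sets[term].add(gene_id)
    return {
        term: sorted(genes)
        for term, genes in sorted(sets.items())
        if len(genes) >= min_size
    }


def to_gmt(gene_sets, description="custom"):
    """Serialize gene sets as GMT text: name<TAB>description<TAB>gene1<TAB>gene2..."""
    return "".join(
        "\t".join([term, description] + genes) + "\n"
        for term, genes in gene_sets.items()
    )


# Hypothetical annotation pairs: orthogroup IDs mapped to GO terms.
annotations = [
    ("OG0000001", "GO:0006412"),
    ("OG0000002", "GO:0006412"),
    ("OG0000002", "GO:0008152"),
    ("OG0000003", "GO:0008152"),
    ("OG0000003", "GO:0006950"),  # only one gene, so this term is filtered out
]

gene_sets = build_gene_sets(annotations)
gmt_text = to_gmt(gene_sets)
print(gmt_text, end="")
```

Using the orthogroup ID as the gene identifier is one possible choice here, since single-copy orthologues would then share the same gene sets across all five species; whether that is appropriate depends on how consistent the per-species annotations are.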
P.S. I am working on isopods with no reference genomes. :)
Thanks!
Lada
Yes, it is useful for sure! Thank you so much! It's important for me to know that I can a) do that for non-model species and b) have some starting steps to create my own database. I am still learning to work in Linux and R environments, so I might ask for help again if I get stuck making the annotation package.