I'm currently analyzing RNA-seq data from four species in one genus, and I would love a little help with deciding my next steps.
My eventual goal: To find genes for secreted proteins and secondary metabolites that are meaningfully expressed in culture across 4 species of fungus: either expressed in one species only, or expressed in all 4. (This is a discovery-based project, so there's no null hypothesis.)
Starting data: I started with RNA-seq reads, assembled genomes, a .gtf annotation for each genome, and functional annotation information (Swiss-Prot, SignalP, Pfam, etc.) for each genome. The functional annotation files hold protein_ids and their corresponding descriptions.
What I have done so far: I've aligned the reads from each species to that species' own genome with HISAT2 (supplying the .gtf annotation so that gene_ids stay consistent), then assembled transcripts and quantified expression with StringTie.
What I have now: One StringTie output per species, each listing gene_id, transcript_id, and FPKM/TPM values.
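To make the format concrete, here is a minimal sketch of how the per-transcript values can be pulled out of one of these files. It assumes the usual StringTie GTF layout, where transcript lines carry gene_id, transcript_id, FPKM and TPM as quoted attributes in the ninth column. The function name and the example line are hypothetical and not taken from my data.

```python
import re

# Matches key "value"; pairs in the GTF attribute column.
ATTR_RE = re.compile(r'(\S+) "([^"]*)";')


def parse_transcript_line(line):
    """Return gene_id, transcript_id, FPKM and TPM from one GTF transcript line.

    Returns None for comment lines, malformed lines, and non-transcript
    features (e.g. exon lines, which carry no FPKM/TPM attributes).
    """
    fields = line.rstrip("\n").split("\t")
    if line.startswith("#") or len(fields) != 9 or fields[2] != "transcript":
        return None
    attrs = dict(ATTR_RE.findall(fields[8]))
    return {
        "gene_id": attrs["gene_id"],
        "transcript_id": attrs["transcript_id"],
        "FPKM": float(attrs["FPKM"]),
        "TPM": float(attrs["TPM"]),
    }


# Hypothetical example line (identifiers and values are made up).
example = "\t".join([
    "scaffold_1", "StringTie", "transcript", "100", "900", "1000", "+", ".",
    'gene_id "gene_0001"; transcript_id "rna_0001"; cov "12.5"; FPKM "3.2"; TPM "5.1";',
])
record = parse_transcript_line(example)
print(record)
```

Running this over every line of a species' output and skipping the None results would give one table per species, keyed by gene_id and transcript_id.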
The advice I need: What should my next step be? Since I'm not looking for differential expression, I'm assuming that my next analyses should be run on each species individually. How can I associate my protein_ids with my gene_ids? How can I go from FPKM values to deciding whether or not a gene is meaningfully expressed in a species? Are FPKM values enough, or is there some further normalization that should still be done (e.g. a log transformation)? Should I look for gene clusters, and why would that matter? Once I find (for example) a gene that produces an interesting secondary metabolite, how would I find out whether there are analogs in the other species?
I'm feeling a little lost when it comes to what to do next.