I currently have three non-model fish species for which I need to run RNA-seq analysis, and I can't find a reference genome for any of them. I only have experience running RNA-seq for species that have a reference genome, where my workflow is FastQC, Trimmomatic, HISAT2, featureCounts, edgeR/DESeq2, and then DAVID for functional annotation.
I would also like to run differential gene expression and pathway analysis for these non-model species. Are there any resources I can look at, or a recommended workflow for analysing non-model species?
There are a couple of options I've used in similar situations.
Use an existing transcriptome of a well-annotated species. Zebrafish, for example, has a lot of resources available. However, the larger the evolutionary distance, the more spurious your results can become.
If you have good enough RNA-seq data, you can try using StringTie in de novo mode (some docs here) and then use the resulting annotation as your transcriptome. However, you will likely need to apply strict quality thresholds to the annotations, as I tend to find de novo annotation pipelines quite noisy.
Otherwise, the analysis would be the same as with other species. Downstream analyses like GO and KEGG are a little harder, but if you have the time you can annotate the transcriptome yourself. I've also seen orthologs from a better-annotated species used for these kinds of downstream analyses in the literature.
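To make the ortholog idea a bit more concrete, here is a minimal Python sketch of one way it could work. It assumes you already have a table assigning each assembled transcript to an ortholog in the better-annotated species (for example from a best-hit search) and it simply sums counts per ortholog. The function name, the transcript IDs, and the ortholog IDs are all hypothetical placeholders, and this is only one possible approach, not an established pipeline.

```python
def collapse_to_orthologs(transcript_counts, transcript_to_ortholog):
    """Sum per-sample counts of all transcripts assigned to the same ortholog.

    transcript_counts: dict of transcript ID -> list of counts (one per sample)
    transcript_to_ortholog: dict of transcript ID -> ortholog gene ID
    Transcripts with no assigned ortholog are dropped.
    """
    gene_counts = {}
    for transcript, counts in transcript_counts.items():
        gene = transcript_to_ortholog.get(transcript)
        if gene is None:
            continue  # no ortholog found, so it cannot be used for GO/KEGG
        previous = gene_counts.get(gene, [0] * len(counts))
        gene_counts[gene] = [a + b for a, b in zip(previous, counts)]
    return gene_counts


# Hypothetical example data: two samples, four assembled transcripts.
transcript_counts = {
    "transcript_1_isoform_1": [12, 30],
    "transcript_1_isoform_2": [5, 7],
    "transcript_2": [8, 2],
    "transcript_3": [40, 41],  # has no ortholog assignment below
}
transcript_to_ortholog = {
    "transcript_1_isoform_1": "zebrafish_gene_A",
    "transcript_1_isoform_2": "zebrafish_gene_A",
    "transcript_2": "zebrafish_gene_B",
}

gene_counts = collapse_to_orthologs(transcript_counts, transcript_to_ortholog)
print(gene_counts)
```

The resulting per-ortholog counts can then go into edgeR/DESeq2, and the ortholog IDs into the functional annotation tool, in the same way as for a species with a reference genome. Keep in mind that anything without an ortholog is silently lost in this approach.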
Thanks for answering! It is a really clear workflow and I will try to follow it.
One quick question: since I have multiple treatment groups and replicates for the same non-model species, do I need to run Trinity for each of them, or is there a way to pool them together since they come from the same species?
Have a look at the --samples_file option for Trinity: https://github.com/trinityrnaseq/trinityrnaseq/wiki/Running-Trinity#typical-trinity-command-line
For the other assemblers (if you choose to use them) you can typically just specify all reads for all conditions/replicates for the same species on the command line.
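As a rough illustration of the Trinity route, here is a small Python sketch that builds the tab-separated samples file (condition, replicate name, left reads, right reads) and a single Trinity command that assembles all replicates of one species together. The file names, the helper function names, and the memory/CPU values are placeholders I made up, so adjust them to your data and check the flags against the Trinity wiki linked above.

```python
import shlex


def format_samples_file(samples):
    """Return Trinity samples-file text: one tab-separated line per replicate
    in the order condition, replicate name, left reads, right reads."""
    return "\n".join("\t".join(row) for row in samples) + "\n"


def trinity_command(samples_file, max_memory="50G", cpu=8, output="trinity_out"):
    """Build one Trinity command that uses every replicate in the samples file.
    The default resource values here are arbitrary placeholders."""
    args = [
        "Trinity",
        "--seqType", "fq",
        "--samples_file", samples_file,
        "--max_memory", max_memory,
        "--CPU", str(cpu),
        "--output", output,
    ]
    return " ".join(shlex.quote(arg) for arg in args)


# Hypothetical paired-end files for one species, two conditions, two replicates.
samples = [
    ("control", "control_rep1", "control_rep1_R1.fq.gz", "control_rep1_R2.fq.gz"),
    ("control", "control_rep2", "control_rep2_R1.fq.gz", "control_rep2_R2.fq.gz"),
    ("treated", "treated_rep1", "treated_rep1_R1.fq.gz", "treated_rep1_R2.fq.gz"),
    ("treated", "treated_rep2", "treated_rep2_R1.fq.gz", "treated_rep2_R2.fq.gz"),
]

samples_text = format_samples_file(samples)
command = trinity_command("samples.txt")
print(samples_text, end="")
print(command)
```

Save the printed table as samples.txt, then run the printed command once per species. That gives you one assembly per species that all treatment groups and replicates can be quantified against.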
Thank you for the information!