Dear Biostars,
I am working on a project to compare gene expression between four tree species A, B, C and D. They belong to the same genus, but the phylogeny suggests B and C are from the same clade. A and D are from two different clades. For species A there is a ref genome (ca. 90,000 scaffolds), but de-novo transcriptome assembly is needed for B, C and D. For all species, there are 2 treatments (control, treatment) with 3 biological replicates for each treatment.
I used Trinity to assemble B, C and D individually, and the mapping rates with Salmon are good (>95%) for all species. I am wondering whether I should also try a genome-guided assembly using the genome of species A. Given that the species are from different clades, do you think that would be a problem?
I would also like to assess orthology between transcripts from the four species. One option is to use edgeR (exact test) to call differentially expressed genes (DEGs) for each species individually and then use OrthoFinder to find ortholog groups for a cross-species comparison of the DEGs. The other is to cluster all transcripts/genes (from all species) and bring them into a single DEG analysis, although I do not know how difficult clustering the 600,000+ transcripts generated by Trinity would be, or how complicated the analysis would be in R. At this point I am leaning toward the first option, but I am inexperienced in this kind of analysis. Could you give me some directions on how to implement this appropriately, please?
Thanks and looking forward to hearing your suggestions.
Thank you very much for your comment. :) Just to be sure: I should (1) run the DGE analysis for each species individually, (2) combine the DEGs from all species and run OrthoFinder on them, and (3) match those DEGs with the expression results from (1). Am I correct?
Also do you have any other suggestions on what I should be aware of when dealing with comparison of transcriptomes from multiple species?
Not quite, I think.
1) Yes, run the DGE analysis in each species separately.
2) Run OrthoFinder using all genes from all species, not only the DEGs (this will give you groups of genes that are homologous/orthologous/paralogous).
3) Use the OrthoFinder result to link genes from species A to B to C ...
4) Using the info from step 3, you can assign the DEGs to the different clusters.
The critical step here is to correctly assign genes from the different species as orthologs of each other. As the gene IDs will differ between species, you will need to link them based on sequence content (this is the OrthoFinder step).
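To make steps 3 and 4 a bit more concrete, here is a minimal Python sketch of the linking logic. It assumes the orthogroups come as a tab-separated table in the layout of OrthoFinder's Orthogroups.tsv (one row per orthogroup, one column per species, genes in a cell separated by commas) and that the DEGs per species have already been exported from edgeR as plain gene ID lists. All gene IDs, the toy table and the function names are made up for illustration; this is not a definitive implementation.

```python
import csv
import io
from collections import defaultdict

# Hypothetical toy table in the layout of OrthoFinder's Orthogroups.tsv.
# In practice you would read the real file instead of this string.
ORTHOGROUPS_TSV = (
    "Orthogroup\tA\tB\tC\tD\n"
    "OG0000001\tA_g1\tB_g1, B_g7\tC_g3\tD_g2\n"
    "OG0000002\tA_g2\tB_g4\t\tD_g9\n"
)

# Hypothetical DEG lists, one set of gene IDs per species (step 1).
DEGS = {"A": {"A_g1"}, "B": {"B_g7", "B_g4"}, "C": {"C_g3"}, "D": set()}


def gene_to_orthogroup(tsv_text):
    """Map every gene ID to (species, orthogroup) (step 3)."""
    mapping = {}
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        og = row["Orthogroup"]
        for species, cell in row.items():
            if species == "Orthogroup" or not cell:
                continue  # skip the ID column and empty species cells
            for gene in cell.split(","):
                gene = gene.strip()
                if gene:
                    mapping[gene] = (species, og)
    return mapping


def degs_per_orthogroup(mapping, degs):
    """Assign each DEG to its orthogroup, grouped by species (step 4)."""
    table = defaultdict(lambda: defaultdict(list))
    for species, genes in degs.items():
        for gene in sorted(genes):
            hit = mapping.get(gene)
            if hit is None:
                continue  # gene was not placed in any orthogroup
            table[hit[1]][species].append(gene)
    return {og: dict(by_species) for og, by_species in table.items()}


mapping = gene_to_orthogroup(ORTHOGROUPS_TSV)
summary = degs_per_orthogroup(mapping, DEGS)
for og in sorted(summary):
    print(og, summary[og])
```

Orthogroups with DEGs from several species are then the candidates for a shared response. One thing to watch: OrthoFinder is usually run on predicted protein sequences, so the IDs in its output may first need to be mapped back to the gene IDs used in the edgeR analysis.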
I think I get it now. Thanks so much .. :)
A small educational note: if an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted (and you can accept multiple answers if need be). Please go through your previous posts as well and resolve them adequately (if applicable). Thanks.