Hi, I am working on an RNA-seq analysis of a wild type and a gamma-irradiated mutant of a non-model organism, and the aim is to identify differentially expressed genes between them. So far I have generated two separate de novo assemblies, identified the sequences common to both (at least 90% identity) using cd-hit-est-2d, and used that shared set as the mapping reference for the DE analysis.
My question is whether my current workflow is sound enough to continue with, or whether there is a generally accepted workflow for a case like mine. What do you think? Thanks.
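For anyone reproducing the "common sequences" step, here is a minimal sketch of pulling cross-assembly matches out of the `.clstr` file that cd-hit-est-2d writes. It assumes the wild-type assembly was given as the first database (so its contigs appear as cluster representatives, marked `*`), that matching mutant contigs appear as members with an `at +/96.50%`-style identity, and that contig names carry the hypothetical prefixes `WT_` and `MUT_`. The exact line layout can differ between cd-hit versions, so check it against your own output.

```python
import re
from collections import defaultdict

# Hypothetical name prefixes used to tell the two assemblies apart.
WT_PREFIX = "WT_"
MUT_PREFIX = "MUT_"

# Matches e.g. ">WT_c1... *" (representative) or ">MUT_c7... at +/96.50%" (member).
MEMBER_RE = re.compile(r">(\S+?)\.\.\.\s+(\*|at (?:[+-]/)?([\d.]+)%)")


def parse_clstr(text):
    """Return {cluster_id: [(seq_name, identity or None for the representative)]}."""
    clusters = defaultdict(list)
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">Cluster"):
            current = int(line.split()[1])
            continue
        m = MEMBER_RE.search(line)
        if m and current is not None:
            identity = None if m.group(2) == "*" else float(m.group(3))
            clusters[current].append((m.group(1), identity))
    return dict(clusters)


def shared_pairs(clusters, min_identity=90.0):
    """List (wt_rep, mut_member, identity) pairs at or above min_identity."""
    pairs = []
    for members in clusters.values():
        rep = next((name for name, ident in members if ident is None), None)
        if rep is None or not rep.startswith(WT_PREFIX):
            continue
        for name, ident in members:
            if ident is not None and name.startswith(MUT_PREFIX) and ident >= min_identity:
                pairs.append((rep, name, ident))
    return pairs


# Small made-up example in the assumed .clstr layout.
SAMPLE = (
    ">Cluster 0\n"
    "0\t1500nt, >WT_c1... *\n"
    "1\t1450nt, >MUT_c7... at +/96.50%\n"
    ">Cluster 1\n"
    "0\t800nt, >WT_c2... *\n"
    ">Cluster 2\n"
    "0\t900nt, >WT_c3... *\n"
    "1\t880nt, >MUT_c9... at -/88.00%\n"
)

if __name__ == "__main__":
    print(shared_pairs(parse_clstr(SAMPLE)))  # [('WT_c1', 'MUT_c7', 96.5)]
```

Only the 96.5% match survives the 90% cut-off. Note that this just reproduces the filtering step; it does not address the problem raised below of mapping different samples to different assemblies.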
Comparing expression when you map different samples to different assemblies is just too hard, so I'd second making a single coassembly. But an alternative approach to assembling all the reads together would be:
That might reduce the number of spurious assembled sequences that are just due to polymorphisms between the samples.