I want to assemble a transcriptome with Trinity for a plant that lacks a reference genome, so that I can run an RNA-Seq experiment afterwards.
I have control plants and plants infected with a fungus for 2, 7 and 15 days. For each of these 4 conditions, I have plenty of good-quality Illumina paired-end reads, kept in separate files.
To get the transcriptome, I have two options:
- Assemble the control and the infected plants separately, obtaining a total of 4 transcriptomes.
- Concatenate all of the reads into a common file and get a single, common transcriptome.
Both options involve subtle considerations, and I would like to learn from your experience.
Do you have the genome sequence of the fungus (or fungi, if there are several) available? If you do, you could bin the fungal reads away from the plant reads and then do the assembly on the pool of plant reads.
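As a rough illustration of the binning idea, here is a minimal Python sketch. It assumes you have already aligned the reads to the fungal genome with an aligner of your choice and collected the names of the reads that mapped into a set; the function names (`read_fastq`, `read_id`, `drop_fungal_pairs`) are hypothetical helpers, not part of Trinity or any other tool. A pair is dropped if either mate was flagged as fungal.

```python
from typing import Iterable, Iterator, Set, Tuple

Record = Tuple[str, str, str, str]  # header, sequence, separator, qualities


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield 4-line FASTQ records from an iterable of lines."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # skip blank lines between records
        seq = next(it).rstrip("\n")
        plus = next(it).rstrip("\n")
        qual = next(it).rstrip("\n")
        yield header, seq, plus, qual


def read_id(header: str) -> str:
    """Strip the leading '@' and any /1 or /2 mate suffix from a header."""
    name = header[1:].split()[0]
    if name.endswith("/1") or name.endswith("/2"):
        name = name[:-2]
    return name


def drop_fungal_pairs(
    left: Iterable[str], right: Iterable[str], fungal_ids: Set[str]
) -> Iterator[Tuple[Record, Record]]:
    """Yield read pairs in which neither mate mapped to the fungal genome.

    Assumes the two files list the mates in the same order.
    """
    for r1, r2 in zip(read_fastq(left), read_fastq(right)):
        if read_id(r1[0]) in fungal_ids or read_id(r2[0]) in fungal_ids:
            continue
        yield r1, r2
```

In practice `left` and `right` would be open file handles for the two mate files, and the surviving pairs would be written back out as the input for the plant assembly.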
What is your end goal? Do you want to study the differentially expressed genes between your 4 conditions? If so, Trinity recommends building a consensus assembly, using all samples as inputs.
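To make the "all samples as inputs" idea concrete, here is a small Python sketch that only builds the command line for a single pooled assembly; it does not run anything. The file names, memory and CPU values are placeholders, and `build_trinity_command` is a hypothetical helper. Check the options against `Trinity --help` for your installed version before using them.

```python
from typing import Dict, List, Tuple


def build_trinity_command(
    samples: Dict[str, Tuple[str, str]],
    output_dir: str = "trinity_out",
    max_memory: str = "50G",
    cpu: int = 8,
) -> List[str]:
    """Build one Trinity command that pools the paired reads of all samples.

    `samples` maps a sample name to its (left, right) FASTQ paths.
    """
    names = sorted(samples)  # fixed order so left/right lists stay in sync
    left = ",".join(samples[name][0] for name in names)
    right = ",".join(samples[name][1] for name in names)
    return [
        "Trinity",
        "--seqType", "fq",
        "--left", left,
        "--right", right,
        "--max_memory", max_memory,
        "--CPU", str(cpu),
        "--output", output_dir,
    ]


if __name__ == "__main__":
    # Placeholder file names for the four conditions in the question.
    cmd = build_trinity_command({
        "control": ("control_1.fq", "control_2.fq"),
        "day2": ("day2_1.fq", "day2_2.fq"),
        "day7": ("day7_1.fq", "day7_2.fq"),
        "day15": ("day15_1.fq", "day15_2.fq"),
    })
    print(" ".join(cmd))
```

Passing comma-separated file lists this way means you do not need to physically concatenate the FASTQ files first.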
This fully answers my question
However, the more sequences you pool from different conditions, the harder it becomes for any transcriptome assembler to assemble them: you can mix different isoforms, complicate the assembly with more sequences, etc.
The DE pipeline for Trinity provides an RSEM Perl wrapper that aligns the raw reads from each sample to the assembly. From this you will get both isoform and gene count matrices, which tell you how many reads from each sample went into each transcript. You can use these matrices for further downstream DE analysis in the pipeline.
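If you want to inspect such a matrix yourself before running the DE step, a minimal Python sketch could look like the one below. It assumes a tab-delimited layout with one header row of sample names (first cell empty or a label) and one row per transcript or gene; the function names are hypothetical. Values are parsed as floats because RSEM expected counts can be fractional.

```python
import csv
from typing import Dict, Iterable, List, Tuple

Counts = Dict[str, Dict[str, float]]


def load_counts_matrix(handle: Iterable[str]) -> Tuple[List[str], Counts]:
    """Read a tab-delimited counts matrix into {feature: {sample: count}}."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]  # first cell is the (possibly empty) row-label column
    counts: Counts = {}
    for row in reader:
        if not row:
            continue  # ignore blank lines
        counts[row[0]] = {s: float(v) for s, v in zip(samples, row[1:])}
    return samples, counts


def reads_per_sample(counts: Counts, samples: List[str]) -> Dict[str, float]:
    """Sum the counts of every feature for each sample."""
    return {s: sum(row[s] for row in counts.values()) for s in samples}
```

Comparing the per-sample totals is a quick sanity check that every condition contributed a reasonable number of reads to the common assembly.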
Just a heads-up: if you are looking for paralogs, that becomes difficult if you mix the 4 samples. But for a more general analysis, pooling would give better assemblies.