I'm assembling a transcriptome de novo. I have RNA-seq data from treatment and control groups at two time points, with three replicates per group. For the assembly, should I pool all reads (from both control and treatment groups), or assemble each replicate separately? Is it OK to pool them? And if I assemble each replicate separately, what should I do to make the results comparable between groups and time points? Thank you.
Treat each replicate separately. Differential expression requires replicates. Never pool; it's cringe-worthy.
Yeah, I know the replicates should be kept separate for the differential expression analysis. But when I assemble the transcriptome, the treatment, the control, and the different replicates should all have similar genes or transcripts, right? So can I pool them for the assembly and then map each replicate back for differential expression?
Just to be clear, everything goes into one assembly; you should have just one assembly. For counts, use the reads from each individual replicate, then build a count matrix using the Trinity pipeline.
Got it, I really appreciate it. I'm new to assemblies; thanks for your patient explanation.
Not necessarily...
Your treatment and control will presumably differ when compared.
Within a replicate group, you may have one replicate that is an outlier. If the reads are pooled, how do you find the rotten egg in the basket?
You may also have one set of reads with contamination, etc.
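On spotting the rotten egg: one simple check (my own suggestion, not something prescribed above) is to correlate each replicate's log-transformed counts against the others and see whether one stands apart. A minimal sketch, assuming you already have one list of counts per replicate with the transcripts in the same order; the function names are hypothetical, and it needs at least two replicates with non-constant counts:

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def mean_correlation_per_replicate(counts):
    """Mean correlation of each replicate against all the others.

    `counts` maps a replicate name to its per-transcript counts
    (same transcript order in every list). Counts are compared on a
    log2(count + 1) scale so a few highly expressed transcripts do
    not dominate.
    """
    logged = {
        name: [math.log2(c + 1) for c in values]
        for name, values in counts.items()
    }
    result = {}
    for name, values in logged.items():
        others = [pearson(values, logged[o]) for o in logged if o != name]
        result[name] = sum(others) / len(others)
    return result
```

A replicate whose mean correlation is clearly lower than its siblings' is worth a closer look. This check is only possible because the replicates were kept apart.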
For the transcriptome assembly, concatenate your reads in order (keeping the same file order for both forward and reverse reads). Remember that the transcriptome is an assembly of everything, so feeding in one concatenated left file and one concatenated right file (presuming you have read pairs), made up of all reads, is fine. Check out the Trinity GitHub page for some help. It's great if you're new to assemblies, and it's very beginner friendly.
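The concatenation step can be sketched roughly as follows. This is just an illustration: the function name is hypothetical, and it assumes every input file ends with a newline and that all files are in the same format (all plain FASTQ, or all gzipped).

```python
import shutil


def concatenate_reads(pairs, out_left, out_right):
    """Concatenate paired FASTQ files, keeping left and right in step.

    `pairs` is a list of (left_path, right_path) tuples, one tuple per
    replicate. Files are appended byte for byte, in list order.
    """
    with open(out_left, "wb") as left_out, open(out_right, "wb") as right_out:
        for left_path, right_path in pairs:
            # Both mates of a replicate are appended in the same iteration,
            # so the n-th record in the left output still pairs with the
            # n-th record in the right output.
            with open(left_path, "rb") as src:
                shutil.copyfileobj(src, left_out)
            with open(right_path, "rb") as src:
                shutil.copyfileobj(src, right_out)
```

You would call it with your own file names, for example `concatenate_reads([("ctrl_rep1_1.fq", "ctrl_rep1_2.fq"), ("trt_rep1_1.fq", "trt_rep1_2.fq")], "all_1.fq", "all_2.fq")` (made-up names), and then give the two outputs to the assembler as the left and right reads.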
For abundance estimation, in RSEM for example, you provide each set of reads as an individual replicate. Pooling at this stage in RNA-Seq is a bit cringe-worthy and defeats the purpose. Having an idea of variability is also key. If you pool, you also limit your downstream analysis, because you are stuck with one replicate. Statistics works off replicates.
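To make the "one assembly, per-replicate counts" idea concrete, here is a rough sketch of how per-replicate results end up as one matrix. Trinity bundles its own script for this step (`abundance_estimates_to_matrix.pl`, if I remember the name right), so treat this only as an illustration. It assumes tab-delimited results files with `transcript_id` and `expected_count` columns, as in RSEM's isoform-level output; the function names are hypothetical.

```python
import csv


def read_expected_counts(results_path):
    """Return {transcript_id: expected_count} from one results file."""
    counts = {}
    with open(results_path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            counts[row["transcript_id"]] = float(row["expected_count"])
    return counts


def build_count_matrix(sample_files):
    """Combine per-replicate counts into one transcript-by-replicate matrix.

    `sample_files` maps a replicate name to the path of its results file.
    Returns (sample_names, matrix), where matrix maps each transcript ID
    to a list of counts in the same order as sample_names. A transcript
    missing from a replicate's file gets a count of 0.
    """
    sample_names = list(sample_files)
    per_sample = {
        name: read_expected_counts(path)
        for name, path in sample_files.items()
    }
    transcripts = sorted(set().union(*(c.keys() for c in per_sample.values())))
    matrix = {
        t: [per_sample[name].get(t, 0.0) for name in sample_names]
        for t in transcripts
    }
    return sample_names, matrix
```

Each column is one replicate quantified against the same pooled assembly, which is what a differential expression tool needs to estimate variability within groups.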