Hi all,
I have to analyze 24 transcriptomes (TRM) in order to compare gene expression across different conditions in an animal whose genome has not been annotated. I thought about multiple assemblies followed by a co-assembly in order to build the "main" TRM. After quality filtering, I was thinking of doing the following:
- Assemble the 24 libraries (they come from different treatments) using X different assemblers (e.g. Trinity, Velvet, multiple k-mer sizes, etc.). This will give me X x 24 assemblies.
- Merge the X x 24 assemblies with a co-assembly tool (e.g. CD-HIT-EST, Corset or CAP3). I will therefore end up with one main transcriptome (Main-TRM) representing the animal under study.
- Perform functional annotation of the Main-TRM against Swiss-Prot, KEGG, GO, etc. using blastx and Blast2GO.
- Take the unassembled, quality-filtered reads from the 24 libraries (as they were before step 1, in order to retain the condition variable) and BLAST them individually against the annotated Main-TRM, obtaining the expression information this way.
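To make the merging step (step 2) more concrete, here is a minimal Python sketch of the idea of pooling contigs from several assemblies and dropping the redundant ones. It is only an illustration under strong assumptions: it removes contigs that are contained exactly (on either strand) in a longer contig, which is much cruder than the similarity-based clustering that tools like CD-HIT-EST perform. The assembly labels and sequences are made up.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N)."""
    complement = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(complement)[::-1]


def merge_assemblies(assemblies):
    """Pool contigs from several assemblies and drop redundant ones.

    `assemblies` maps a (hypothetical) assembly label to a list of contigs.
    A contig counts as redundant if it, or its reverse complement, occurs
    exactly inside a longer contig that has already been kept.
    """
    pooled = [seq.upper() for contigs in assemblies.values() for seq in contigs]
    # Longest first, so shorter contigs are tested against longer ones.
    pooled.sort(key=len, reverse=True)
    kept = []
    for seq in pooled:
        rc = reverse_complement(seq)
        if any(seq in longer or rc in longer for longer in kept):
            continue  # already represented in the merged set
        kept.append(seq)
    return kept


# Toy input: two assemblies of the same library (made-up sequences).
example = {
    "trinity_lib01": ["ATGGCGTACGTTAGC", "GGGTTTCCC"],
    "velvet_lib01": ["GCGTACGTT", "GCTAACGTACGCCAT", "TTTTAAAACCCC"],
}
main_trm = merge_assemblies(example)
print(len(main_trm), "non-redundant contigs")
```

With real data the pooled FASTA files would go to the co-assembly tool itself; the sketch only shows why the merged set ends up smaller than the sum of the individual assemblies.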
What do you guys think about this approach? Is it theoretically sound? If not, what should I change?
Thanks a lot in advance,
~Giorgio
Thanks, Reema, for your suggestions. Yes, I was planning to diversify the Trinity output by using different parameters.
Also, this should have been a comment, not an "answer". I don't think I can move it for you, just saying.
Hi Madelaine,
Thanks for your input. Yes, for step 4 I was thinking of using Bowtie and then using the output (.bam) file to estimate transcript abundance for each library with RSEM (RSEM: accurate transcript quantification from RNA-Seq data). I just have to figure out the best way to combine the 24 results for statistical purposes.
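One possible way to combine the per-library results is to collect one column from each RSEM output table into a single transcript-by-library matrix. The sketch below assumes the usual tab-delimited RSEM results layout (feature ID in the first column, plus an `expected_count` column); the function name, library labels and numbers are made up for illustration.

```python
import csv


def merge_rsem_counts(sample_tables, column="expected_count"):
    """Combine per-library RSEM tables into one matrix.

    `sample_tables` maps a library label to an iterable of lines
    (an open file handle works) in RSEM results format.
    Returns (library_names, {feature_id: [value per library]}).
    Features missing from a library are left at 0.0.
    """
    samples = list(sample_tables)
    matrix = {}
    for col, sample in enumerate(samples):
        reader = csv.DictReader(sample_tables[sample], delimiter="\t")
        id_field = reader.fieldnames[0]  # transcript_id or gene_id
        for row in reader:
            values = matrix.setdefault(row[id_field], [0.0] * len(samples))
            values[col] = float(row[column])
    return samples, matrix


# Toy tables standing in for two of the 24 libraries (made-up values).
HEADER = "\t".join(["transcript_id", "gene_id", "length", "effective_length",
                    "expected_count", "TPM", "FPKM", "IsoPct"])
demo = {
    "lib01": [
        HEADER,
        "\t".join(["t1", "g1", "1000", "850.00", "10.00", "5.00", "4.00", "100.00"]),
        "\t".join(["t2", "g2", "500", "350.00", "0.00", "0.00", "0.00", "0.00"]),
    ],
    "lib02": [
        HEADER,
        "\t".join(["t1", "g1", "1000", "850.00", "3.50", "2.00", "1.50", "100.00"]),
        "\t".join(["t3", "g3", "700", "550.00", "7.00", "6.00", "5.00", "100.00"]),
    ],
}
samples, matrix = merge_rsem_counts(demo)
print("\t".join(["feature"] + samples))
for feature_id in sorted(matrix):
    print("\t".join([feature_id] + [str(v) for v in matrix[feature_id]]))
```

With real output, each entry would be an open file instead of a list of lines, e.g. `open("lib01.isoforms.results", newline="")` (hypothetical path). The resulting count matrix is the kind of input that differential-expression packages generally expect.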
Here, I would prefer Trinity for the assembly, as it is a de novo assembler. There is no need to use different assemblers, as that will cost you time and effort. But if you are still curious to compare the output of different assemblers, you can go ahead and compare the diagnostics among the different assemblies.
It's not really about comparison; it's more about having a co-assembly derived from multiple assemblies (e.g. using Trinity but with different k-mer sizes). Several papers suggest a co-assembly step after generating different assemblies with several tools, or with the same tool but different parameters (e.g. k-mer size). Hope that makes sense.
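As a rough illustration of why the k-mer size can change an assembly: short k-mers tend to repeat more often within a sequence (more ambiguity when the assembler builds its graph), while longer k-mers are more specific but need reads that overlap over a longer stretch. The toy sequence and function name below are made up; this is only a sketch of k-mer decomposition, not of any assembler's internals.

```python
from collections import Counter


def kmer_profile(seq, k):
    """Return (number of distinct k-mers, number of k-mers seen more than once)."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    repeated = sum(1 for n in counts.values() if n > 1)
    return len(counts), repeated


# Made-up toy transcript; real k values would be larger (e.g. 25).
toy_transcript = "ATGCATGCATGGTACCATGCATGA"
for k in (3, 5, 7):
    distinct, repeated = kmer_profile(toy_transcript, k)
    print(f"k={k}: {distinct} distinct k-mers, {repeated} repeated")
```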
I think Trinity has a fixed k-mer size (25), which is optimal across different transcriptomes according to the Trinity developers.
Hi, is there any update on this? Have you reached the annotation part? I have a similar kind of data and would like to know if you have a summary of this.
Hi, not yet. I'm waiting for other data and will update when I finish some of the initial steps.
Hi. I need to do something similar. Let me know how your analysis goes. In my case I have a draft genome and very few annotations.