Hello,
I have a large amount of RNA-Seq data (about 1 TB) from different technologies (454 single-end, and Illumina both single-end and paired-end), all from the same species, which has no annotated genome, and covering different tissues/conditions. So far, for the Illumina data, I have run several assemblies with Trinity: one for each distinct run, plus one huge assembly of all the Illumina reads merged together (the latter took about 6 TB of storage while the intermediate files were being written!).
All of these assemblies finished without problems.
I also have a smaller amount of 454 data that I would like to integrate with the Illumina data. What would you suggest?
I thought of:
- Merging all the 454 data into one .fasta file and running Trinity on it.
- Then merging all the Trinity.fasta files (from the 454 assembly, the individual Illumina runs, and the huge merged Illumina assembly) and running Trinity again on the result, perhaps with normalization.
- Running a downstream tool such as cd-hit-est or CAP3 to remove redundancy.
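To make the merging and redundancy-removal steps concrete, here is a rough sketch of what I have in mind. It only concatenates the assembly FASTA files, prefixing every header with a label so that contig IDs from different assemblies cannot collide, and builds (without running) a cd-hit-est command line. The function names, labels, file names, and the 0.95 identity threshold are placeholders I made up for illustration, not tested settings; `-i`, `-o` and `-c` are the cd-hit-est input, output and identity-threshold options.

```python
def merge_fasta(inputs, merged_path):
    """Concatenate FASTA files, prefixing each header with its label.

    inputs: dict mapping a label (hypothetical, e.g. "454") to a FASTA path.
    Returns the number of sequences written.
    """
    n_seqs = 0
    with open(merged_path, "w") as out:
        for label, path in inputs.items():
            with open(path) as fh:
                for line in fh:
                    if line.startswith(">"):
                        # Keep the original header, but make the ID unique
                        # across assemblies by prepending the label.
                        out.write(f">{label}|{line[1:].rstrip()}\n")
                        n_seqs += 1
                    elif line.strip():
                        # Copy sequence lines, skipping blank lines.
                        out.write(line.rstrip() + "\n")
    return n_seqs


def cd_hit_est_command(merged_path, out_path, identity=0.95):
    """Build a cd-hit-est argument list (identity value is an assumption)."""
    return ["cd-hit-est", "-i", str(merged_path),
            "-o", str(out_path), "-c", str(identity)]
```

For example, `merge_fasta({"454": "trinity_454.fasta", "illumina_all": "trinity_illumina_all.fasta"}, "merged.fasta")` followed by passing `cd_hit_est_command("merged.fasta", "merged_nr.fasta")` to `subprocess.run` would cover the last two steps; the file names here are made up.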
What is your suggestion?
Thanks in advance,
~Giorgio