Hello everyone,
I have 44 pairs of FASTQ files that I want to assemble using Trinity. I want to use the Trinity output FASTA file to map the reads from each FASTQ file back to the contigs to get expression profiles (using Bowtie), so I can do differential expression analysis. The reason I want to do a Trinity assembly is that we don't have a good reference genome.
However, I am facing a problem: assembling all 44 pairs of FASTQ files in one Trinity job may be very resource-intensive. My cluster doesn't have enough space for the temporary files generated during the run. So I am wondering whether there is an alternative approach I can take.
Could I assemble each pair first and then assemble the 44 Trinity output FASTA files together? Would the two results be identical? Please let me know. Thank you very much!
Assembling each pair first will cause problems in downstream DE analysis when you map your reads back with RSEM, which uses Bowtie. Try using the in silico read normalization parameter instead. It should cut down the number of reads you're using by normalizing to a cutoff for sequencing depth, and since you have 44 samples, you have more than enough coverage. Do you have multiple treatments? Why so many samples?
You could also run Trinity in steps: stop after each stage (there are parameters for this in the 'show full usage' output) and remove intermediate files before continuing.
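As a rough sketch of what those two options look like on the command line (flag names vary between Trinity releases, so verify everything against `--show_full_usage_info` for your version; file names and resource values below are placeholders):

```shell
# In silico read normalization: cap per-read coverage (e.g. at 50x).
# Note: in recent Trinity releases normalization is enabled by default;
# older versions used a --normalize_reads flag instead.
Trinity --seqType fq \
        --left  all_R1.fastq --right all_R2.fastq \
        --max_memory 100G --CPU 16 \
        --normalize_max_read_cov 50 \
        --output trinity_out

# Stage-wise execution (older Trinity versions): stop before a later stage,
# clean up intermediate files you no longer need, then resume by re-running
# the same command without the --no_run_* flag. Check your version's usage
# output for the exact stage-stopping flag names.
Trinity --seqType fq --left all_R1.fastq --right all_R2.fastq \
        --max_memory 100G --CPU 16 \
        --no_run_chrysalis \
        --output trinity_out
```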
Thank you! I understand there would be a problem mapping back with RSEM if I did a separate assembly for each pair, since the contigs would be labeled with different IDs.
The reason I have so many samples is that they come from different body sites, different time points, and different animals. So yes, something similar to multiple treatments.
I am not sure how to use the in silico normalization parameter. Could you please provide more info or a protocol so I can do some deeper reading? Thank you very much!
In your Trinity directory, type:
./Trinity --show_full_usage_info
It will give you the full set of parameters. Are you interested in DE genes between different animals, or between body sites/time points within each animal separately? If the former, combine all your read 1s into one file and all your read 2s into another, and run Trinity with the in silico normalization parameter; simply provide the flag, and it will reduce your total number of reads. If the latter, combine the read 1s and read 2s for each animal, and then run Trinity/RSEM/DE analysis on each assembly individually. In the end you'll get a heatmap per animal across the different body sites/time points.
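The combining step is plain concatenation. A minimal self-contained demo (the `sample*` file names are dummies; substitute your own 44 pairs, and keep the order of files identical between the read-1 and read-2 commands so mates stay paired):

```shell
# Create two tiny dummy FASTQ pairs purely for illustration:
printf '@s1.r1\nACGT\n+\nIIII\n' > sample01_R1.fastq
printf '@s1.r2\nTGCA\n+\nIIII\n' > sample01_R2.fastq
printf '@s2.r1\nGGCC\n+\nIIII\n' > sample02_R1.fastq
printf '@s2.r2\nCCGG\n+\nIIII\n' > sample02_R2.fastq

# Concatenate all read-1 files and all read-2 files in the SAME order;
# the results become Trinity's --left and --right inputs.
cat sample01_R1.fastq sample02_R1.fastq > all_R1.fastq
cat sample01_R2.fastq sample02_R2.fastq > all_R2.fastq

# Sanity check: both combined files must hold the same number of records
# (a FASTQ record is 4 lines).
echo "$(($(wc -l < all_R1.fastq) / 4)) read-1 records"
echo "$(($(wc -l < all_R2.fastq) / 4)) read-2 records"
```

For gzipped input, plain `cat` also works, since gzip streams concatenate cleanly (e.g. `cat sample*_R1.fastq.gz > all_R1.fastq.gz`).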