Hello all,
I'm currently trying to run SOAPdenovo-Trans and compare its transcriptome assemblies with those from Trinity. I don't have a reference genome, and my goal is to identify the peptide sequences of a protein family of interest in non-model organisms.
In Trinity, you have the option to list biological replicates and run them all in one assembly. In this way, you capture all sequences expressed across individuals and can compare expression levels between them against the common de novo transcriptome. I would like to do the same thing with SOAPdenovo-Trans. I have 150 bp paired-end (PE) stranded fastq files.
In the manual, it says to begin the config file with [LIB] and then include the reads as q1 and q2, but it doesn't give an example for including multiple replicates.
Should I try concatenating all of my fastq files from the replicates (R1.1, R1.2 and R1.3, etc.) or can I include multiple libraries by having something as follows:
> #maximal read length
max_rd_len=150
[LIB]
#maximal read length in this lib
rd_len_cutof=150
#average insert size
avg_ins=300
#if sequence needs to be reversed
reverse_seq=0
#in which part(s) the reads are used
asm_flags=3
#minimum aligned length to contigs for a reliable read location (at least 32 for short insert size)
map_len=32
#fastq file for read 1
**q1=Species1_rep1_R1.fastq**
#fastq file for read 2 always follows fastq file for read 1
**q2=Species1_rep1_R2.fastq**
[LIB]
#maximal read length in this lib
rd_len_cutof=150
#average insert size
avg_ins=300
#if sequence needs to be reversed
reverse_seq=0
#in which part(s) the reads are used
asm_flags=3
#minimum aligned length to contigs for a reliable read location (at least 32 for short insert size)
map_len=32
#fastq file for read 1
**q1=Species1_rep2_R1.fastq**
#fastq file for read 2 always follows fastq file for read 1
**q2=Species1_rep2_R2.fastq**
[LIB]
#maximal read length in this lib
rd_len_cutof=150
#average insert size
avg_ins=300
#if sequence needs to be reversed
reverse_seq=0
#in which part(s) the reads are used
asm_flags=3
#minimum aligned length to contigs for a reliable read location (at least 32 for short insert size)
map_len=32
#fastq file for read 1
**q1=Species1_rep3_R1.fastq**
#fastq file for read 2 always follows fastq file for read 1
**q2=Species1_rep3_R2.fastq**
I tried this, and I honestly cannot tell whether the third library simply overwrote the previous ones, because the command to run SOAPdenovo-Trans takes only one option for the output prefix (so there was a single scafseq file, "Species1.scafseq"). That file was larger than the one I got when I ran Rep1 alone, but that could just be because Rep3 on its own produces a larger output than Rep1.
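To rule out a copy-paste slip between the three [LIB] blocks, the config could also be generated instead of hand-written. This is only a minimal sketch: the helper name `build_config` is hypothetical, the file names follow the pattern I used above, and the parameter values are copied from my config. It does not settle whether all three libraries are actually read.

```python
# Sketch: emit one [LIB] block per replicate, mirroring the config above.
# build_config is a hypothetical helper, not part of SOAPdenovo-Trans.

def build_config(prefix, n_reps, max_rd_len=150, avg_ins=300):
    """Return config text with one [LIB] section per replicate."""
    lines = [f"max_rd_len={max_rd_len}"]
    for rep in range(1, n_reps + 1):
        lines += [
            "[LIB]",
            f"rd_len_cutof={max_rd_len}",
            f"avg_ins={avg_ins}",
            "reverse_seq=0",
            "asm_flags=3",
            "map_len=32",
            # q2 always follows q1 for the same replicate
            f"q1={prefix}_rep{rep}_R1.fastq",
            f"q2={prefix}_rep{rep}_R2.fastq",
        ]
    return "\n".join(lines) + "\n"


# Print the config for three replicates of Species1
print(build_config("Species1", 3), end="")
```

Redirecting the printed text to a file would give a config equivalent to the one I pasted, minus the comment lines.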
Alternatively, I could run each rep individually and combine the fasta files, but I would prefer that they be assembled together.
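For the concatenation option I mentioned earlier (pooling the replicate fastq files into a single library before assembly), this is roughly what I have in mind. It is a sketch that assumes uncompressed fastq files that each end with a newline; `concat_fastq` and the `Species1_all_*` output names are hypothetical. The R1 and R2 files have to be concatenated in the same replicate order so that the read pairs stay in sync.

```python
import shutil


def concat_fastq(inputs, output):
    """Append each input file to output, in the order given."""
    with open(output, "wb") as out:
        for path in inputs:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)


# Example usage (hypothetical output names), same replicate order for R1 and R2:
# reps = [1, 2, 3]
# concat_fastq([f"Species1_rep{r}_R1.fastq" for r in reps], "Species1_all_R1.fastq")
# concat_fastq([f"Species1_rep{r}_R2.fastq" for r in reps], "Species1_all_R2.fastq")
```

The pooled files would then go into a single [LIB] block as q1 and q2.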
Thanks!
Did you get the solution?