I am trying to use VelvetOptimiser for de novo assembly of a plant genome. I have 2 libraries: a 100 bp paired-end (PE) library with insert size ~400 bp, and a 150 bp mate-pair library with 12-18 kb insert size. I have several questions:
A) I tried an initial assembly with only 1 library (the 100 bp PE library with insert size 440 bp) like this:
VelvetOptimiser.pl -s 35 -e 47 -f '-fastq -shortPaired pe_merged -fastq -short s_se' -a -o '-unused_reads yes' -t 1
The results were good (Final graph has 731536 nodes and n50 of 5769, max 55415, total 149725954, using 51729078/69351035 reads). VelvetOptimiser identified K=41 and set the expected coverage to 10.
First question: VelvetOptimiser took 6 iterations to settle on an optimum coverage cutoff of 1.89, but it ran the final velvetg with -cov_cutoff 1.44! Can anyone explain this? Should I re-run velvetg with the optimum coverage cutoff?
Second question: The Paired Library insert stats gave me 2 lines:
Paired-end library 1 has length: 438, sample standard deviation: 19
Paired-end library 1 has length: 441, sample standard deviation: 36
Why is this coming out as two different libraries?
Third question: The final output has runs of N's. According to Quast, another tool for assessing assemblies, # N's per 100 kbp = 10271. I did not use the "-scaffolding" option, and I do not see VelvetOptimiser passing it to velvetg. Can anyone explain why velvet is generating these N's?
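To double-check the figure independently of Quast, the N content can be counted directly from the contigs file. This is only a minimal sketch: the function name `n_stats` is made up for illustration, and it assumes a plain FASTA file such as the contigs.fa written by velvetg. As far as I know Quast skips short contigs by default, so the numbers may not match exactly.

```python
import re


def n_stats(lines):
    """Count N's in FASTA lines (hypothetical helper, not part of Velvet or Quast).

    Returns (total_bases, n_bases, n_per_100kbp, n_run_lengths).
    """
    total = 0
    n_bases = 0
    runs = []
    seq_parts = []

    def flush():
        # Join the pieces of one record so runs of N's spanning
        # line breaks are counted as a single run.
        nonlocal total, n_bases
        seq = "".join(seq_parts).upper()
        seq_parts.clear()
        total += len(seq)
        n_bases += seq.count("N")
        runs.extend(len(m.group()) for m in re.finditer(r"N+", seq))

    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            flush()
        else:
            seq_parts.append(line)
    flush()

    per_100k = 100000 * n_bases / total if total else 0.0
    return total, n_bases, per_100k, runs
```

It accepts any iterable of lines, so an open file handle can be passed directly, e.g. `with open("contigs.fa") as fh: print(n_stats(fh))`. The list of run lengths shows whether the N's come in a few long gaps or in many short ones.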
B) Then I tried to combine the 2 libraries like this:
VelvetOptimiser.pl -s 31 -e 47 -f '-fastq -shortPaired pe_merged -fastq -shortPaired2 mp_merged -fastq -short s_se_combined' -a -o '-unused_reads yes' -t 1
The results came out VERY bad. The expected coverage even went down to 9. Here are the results of the final iteration:
Velvet hash value: 47
Roadmap file size: 7263105695
Total number of contigs: 387128
n50: 811
length of longest contig: 17283
Total bases in contigs: 158410389
Number of contigs > 1k: 36064
Total bases in contigs > 1k: 68344726
Paired Library insert stats:
Paired-end library 1 has length: 437, sample standard deviation: 21
Paired-end library 2 has length: 211, sample standard deviation: 105
Paired-end library 1 has length: 440, sample standard deviation: 57
Paired-end library 2 has length: 220, sample standard deviation: 138
Paired-end library 1 has length: 440, sample standard deviation: 62
Paired-end library 2 has length: 221, sample standard deviation: 149
So my fourth question is: how should I do the genome assembly using 2 libraries with different insert sizes? Is it better to finish the assembly from the first stage and use other software (SSPACE) to integrate the mate-pair sequences?
Thank you
Thank you @SES
I have some clarifications on my questions: