Hi all,
I have two Illumina MiSeq 2x75 paired-end read files, one forward and one reverse:
oenopla-reads1.fastq & oenopla-reads2.fastq
I then assembled the genome with Velvet and found that a k-mer of 67 gives the best N50 and maximum contig length.
Since I do not know the insert length, I set -ins_length to auto. Here are my commands:
velveth Velvet_67 67 -shortPaired -fastq -separate oenopla-reads1.fastq oenopla-reads2.fastq
velvetg Velvet_67 -ins_length auto -exp_cov auto -cov_cutoff auto
I wanted to see the estimated insert length (minimum and maximum), but when I checked the log file I only saw this:
Median coverage depth = 15.632530
Final graph has 3045 nodes and n50 of 165, max 5386, total 165527, using 769537/15947236 reads
I need the range to run another assembly program, Metassembler, which requires the maximum and minimum insert length. My question is: how can I find out the insert length from Velvet?
Your help is greatly appreciated.
Thanks!
1) I ran a Perl script from the Velvet developer to find the insert size: http://github.com/dzerbino/velvet/blob/master/contrib/observed-insert-length.pl/observed-insert-length.pl
Outcome:
2) I ran BBMap to map the short reads back to the Velvet contigs and calculate the insert size.
Outcome:
I got two different median insert sizes, 352 and 193. Which number should I pick, and how do I decide the maximum and minimum?
In this case the vast majority of your reads did not map as proper pairs, which probably indicates that your assembly is fairly discontiguous. Because only 9% of the reads mapped as proper pairs, the BBMap insert size distribution describes only that 9% and cannot be trusted to reflect the library as a whole. When an assembly has low contiguity, pairs with longer inserts are more likely to span two contigs and so fail to map as proper pairs, which means the unmeasured pairs are likely to be longer than the measured ones. I suggest using Velvet's estimate.
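To check how much of the library an insert size estimate really covers, and to turn a distribution into a min/max range, something like the following rough sketch may help. It assumes the mapper wrote a SAM file and relies only on the standard SAM FLAG bits and the TLEN column. The function names and the 5th/95th percentile cut-offs are my own illustrative choices, not anything prescribed by Velvet, BBMap or Metassembler, so adjust them to whatever the downstream tool expects.

```python
import math
import statistics


def nearest_rank_percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def summarize_insert_sizes(sam_lines, low_pct=5, high_pct=95):
    """Summarise proper-pair insert sizes from an iterable of SAM text lines.

    low_pct and high_pct are arbitrary illustrative cut-offs (an assumption).
    """
    paired_reads = 0
    proper_reads = 0
    sizes = []
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        # Skip secondary (0x100) and supplementary (0x800) records,
        # and reads that are not part of a pair (0x1 unset).
        if flag & 0x900 or not flag & 0x1:
            continue
        paired_reads += 1
        if flag & 0x2:  # mapped as a proper pair
            proper_reads += 1
            tlen = int(fields[8])
            if tlen > 0:  # positive TLEN on one mate only: count each pair once
                sizes.append(tlen)
    sizes.sort()
    fraction = proper_reads / paired_reads if paired_reads else 0.0
    summary = {
        "proper_pair_fraction": fraction,
        "pairs_measured": len(sizes),
        "median": None,
        "min_insert": None,
        "max_insert": None,
    }
    if sizes:
        summary["median"] = statistics.median(sizes)
        summary["min_insert"] = nearest_rank_percentile(sizes, low_pct)
        summary["max_insert"] = nearest_rank_percentile(sizes, high_pct)
    return summary
```

An open file handle can be passed directly, for example `summarize_insert_sizes(open("mapped.sam"))`, where `mapped.sam` is a placeholder file name. If `proper_pair_fraction` comes back very low, as in the 9% case above, the reported range describes only that small subset of pairs and should be treated with the same caution.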