Question

Velvet - ins_length auto

0

7.9 years ago

Kenny ▴ 30

Hi all,

I have two Illumina MiSeq 2x75 paired-end read files, one forward and one reverse:

oenopla-reads1.fastq & oenopla-reads2.fastq

I then assembled the genome with Velvet and found that a k-mer length of 67 gives the best N50 and maximum contig length.

Since I do not know the insert length, I set -ins_length to auto. Here are my commands:

velveth Velvet_67 67 -shortPaired -fastq -separate oenopla-reads1.fastq oenopla-reads2.fastq
velvetg Velvet_67 -ins_length auto -exp_cov auto -cov_cutoff auto

I wanted to see the estimated insert length (maximum and minimum), but when I checked the log file, I only saw this:

Median coverage depth = 15.632530
Final graph has 3045 nodes and n50 of 165, max 5386, total 165527, using 769537/15947236 reads

I need the range to run another assembly program called Metassembler, which requires the maximum and minimum insert lengths. My question is: how can I find out the insert length from Velvet?

Your help is greatly appreciated.

Assembly next-gen • 2.9k views

ADD COMMENT • link updated 3.8 years ago by bagdevi.mishra ▴ 110 • written 7.9 years ago by Kenny ▴ 30

score 1 · Answer 1 · 2017-09-28

1

7.9 years ago

Brian Bushnell 20k

You can map the reads to the assembly and get the insert size from that; you don't need to map all the reads. For example, with BBMap:

bbmap.sh in1=oenopla-reads1.fastq in2=oenopla-reads2.fastq ref=contigs.fasta reads=100k ihist=ihist.txt

...where ihist.txt will contain the insert size distribution.
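One way to turn that distribution into a minimum and maximum is to read two percentiles off the histogram. The sketch below is only an illustration under stated assumptions: it assumes ihist.txt has header lines starting with `#` followed by tab-separated rows of insert size and count in ascending order (check your own file), and the 5th/95th percentile cut-offs and the function name `insert_range` are hypothetical choices, not something BBMap or any downstream tool prescribes.

```shell
#!/usr/bin/env bash
# Print the insert sizes at two cumulative-count percentiles of a histogram.
# Usage: insert_range FILE [LOW_PCT] [HIGH_PCT]   (defaults 5 and 95 are assumptions)
insert_range() {
    local hist="$1" lo="${2:-5}" hi="${3:-95}"
    awk -F'\t' -v lo="$lo" -v hi="$hi" '
        # Skip header lines and anything without two tab-separated columns.
        /^#/ || NF < 2 { next }
        # Store each (insert size, count) row and keep a running total.
        { n++; size[n] = $1; count[n] = $2; total += $2 }
        END {
            if (total == 0) { print "no pairs in histogram" > "/dev/stderr"; exit 1 }
            for (i = 1; i <= n; i++) {
                cum += count[i]
                # First size whose cumulative count reaches the low percentile.
                if (!have_min && cum >= total * lo / 100) { min = size[i]; have_min = 1 }
                # First size whose cumulative count reaches the high percentile.
                if (cum >= total * hi / 100) { max = size[i]; break }
            }
            printf "min=%d max=%d\n", min, max
        }' "$hist"
}
```

Calling `insert_range ihist.txt` uses the default cut-offs, and `insert_range ihist.txt 1 99` gives a wider range. The result only describes the pairs that BBMap was able to place, so it is worth checking how many pairs that was before relying on it.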

ADD COMMENT • link 7.9 years ago by Brian Bushnell 20k

0

Thanks!

1) I ran a Perl script from the Velvet developer to find the insert size: http://github.com/dzerbino/velvet/blob/master/contrib/observed-insert-length.pl/observed-insert-length.pl

perl observed-insert-length.pl Velvet_67 > Velvet_ins_size.txt

Outcome:

Observed **median insert length: 352**
Observed mode of insert length: 347
Observed sample standard deviation: 868.138129940986
Suggested velvetg parameters: -ins_length 352 -ins_length_sd 868.138129940986

2) I ran BBMap to map the short reads to the Velvet contigs and calculate the insert size.

bbmap.sh in1=oenopla-reads1.fastq in2=oenopla-reads2.fastq ref=Velvet_67/contigs.fa reads=-1 ihist=ihist.txt

Outcome:

Pairing data:      pct pairs  num pairs  pct bases  num bases
mated pairs:       9.1591%    730308   9.1591%    109546200
bad pairs:         5.9061%    470928   5.9061%     70639200
insert size avg:   862.75
insert 25th %:      91.00
**insert median:     193.00**
insert 75th %:     374.00
insert std dev:    3468.98
insert mode:       78

I got two different median insert sizes, 352 and 193. Which one should I use? And how do I decide the maximum and minimum?

ADD REPLY • link 7.9 years ago by Kenny ▴ 30

1

In this case, the vast majority of your reads did not map as proper pairs, which probably means your assembly is fairly discontiguous. Because only 9% of the reads mapped as proper pairs, BBMap's insert size data covers only that 9% and cannot be trusted to reflect the library as a whole; when an assembly has low contiguity, the pairs that fail to map properly are likely to have longer inserts. I suggest using Velvet's estimate.
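A quick way to apply this check is to look at the "mated pairs" percentage before reading the insert statistics. The sketch below assumes the BBMap summary was saved to a text file and that it contains a line in the same form as the output quoted earlier in this thread ("mated pairs:  9.1591%  ..."). The function name `mated_pairs_ok` and the default 50% threshold are hypothetical; pick a threshold that suits your data.

```shell
#!/usr/bin/env bash
# Succeed (exit 0) if the "mated pairs" percentage in a saved BBMap summary
# is at least MIN_PCT; exit 1 if it is lower, 2 if the line is missing.
# Usage: mated_pairs_ok SUMMARY_FILE [MIN_PCT]   (default 50 is an assumption)
mated_pairs_ok() {
    local summary="$1" min_pct="${2:-50}"
    awk -v min="$min_pct" '
        # The third whitespace-separated field is the percentage, e.g. 9.1591%
        /^[ \t]*mated pairs:/ { pct = $3; sub(/%/, "", pct); found = 1 }
        END {
            if (!found) { print "no \"mated pairs\" line found" > "/dev/stderr"; exit 2 }
            printf "mated pairs: %s%%\n", pct
            exit ((pct + 0 >= min + 0) ? 0 : 1)
        }' "$summary"
}
```

For example, `mated_pairs_ok bbmap_stats.txt || echo "too few proper pairs to trust the insert sizes"`, where bbmap_stats.txt is a placeholder name for wherever you saved the summary.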

ADD REPLY • link 7.9 years ago by Brian Bushnell 20k

score 0 · Answer 2 · 2021-11-19

When I run velvetg without specifying the insert length, the standard output captured in the nohup.out file contains the following lines: ......

[1564.566048] Computing read to node mapping array sizes
[1572.478116] Computing read to node mappings
[1593.435209] Estimating library insert lengths...
[1627.668440] Paired-end library 1 has length: 570, sample standard deviation: 43
[1634.073744] Paired-end library 2 has length: 244, sample standard deviation: 57
[1639.688635] Paired-end library 3 has length: 238, sample standard deviation: 54

.......
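To pull just those estimates out of a long log, something like the following could work. It is a minimal sketch that assumes the log uses exactly the wording shown above ("Paired-end library N has length: L, sample standard deviation: S") with integer values; the function name `velvet_insert_estimates` is made up for illustration.

```shell
#!/usr/bin/env bash
# Print one tab-separated row per estimate: library number, insert length, SD.
# Usage: velvet_insert_estimates LOG_FILE
velvet_insert_estimates() {
    local log="$1"
    # grep -o emits each match on its own line, even if several share a log line.
    grep -oE 'Paired-end library [0-9]+ has length: [0-9]+, sample standard deviation: [0-9]+' "$log" |
        # Drop the punctuation, then pick the three numeric fields.
        awk '{ gsub(/[,:]/, ""); print $3 "\t" $6 "\t" $10 }'
}
```

With the log above saved as nohup.out, `velvet_insert_estimates nohup.out` would list one row per library. If velvetg was run in the foreground, its standard output can be redirected to a file first (for example with `> velvetg.out`) and passed to the function in the same way.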