Velvet - ins_length auto
7.2 years ago
Kenny ▴ 30

Hi all,

I have two Illumina MiSeq 2x75 paired-end read files, one forward and one reverse:

oenopla-reads1.fastq & oenopla-reads2.fastq

I then assembled the genome with Velvet and found that a k-mer size of 67 gives the best N50 and maximum contig length.

Since I do not know the insert length, I set -ins_length to auto. Here are my commands:

velveth Velvet_67 67 -shortPaired -fastq -separate oenopla-reads1.fastq oenopla-reads2.fastq
velvetg Velvet_67 -ins_length auto -exp_cov auto -cov_cutoff auto

I wanted to see the insert length (max and min), but when I checked the log file I only saw this:

Median coverage depth = 15.632530
Final graph has 3045 nodes and n50 of 165, max 5386, total 165527, using 769537/15947236 reads

I need the range to run another assembly program, Metassembler, which requires the maximum and minimum insert lengths. My question is: how can I find the insert length from Velvet?

Your help is greatly appreciated.

Assembly next-gen • 2.5k views
7.2 years ago

You can map the reads to the assembly and get the insert size from that; you don't need to map all the reads. For example, with BBMap:

bbmap.sh in1=oenopla-reads1.fastq in2=oenopla-reads2.fastq ref=contigs.fasta reads=100k ihist=ihist.txt

...where ihist.txt will contain the insert size distribution.
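One way to turn such a histogram into a min/max range is to read off two cumulative percentiles. The sketch below is only an illustration: `insert_range` is a hypothetical helper name, the 5th/95th percentile defaults are an arbitrary choice rather than anything Metassembler prescribes, and it assumes the histogram file consists of `#`-prefixed header lines followed by tab-separated "insert size, count" rows sorted by increasing size.

```shell
# Print the insert sizes at two cumulative percentiles of an insert-size
# histogram. Assumed input: '#' header lines, then "size<TAB>count" rows
# in ascending order of size.
insert_range() {
    # $1 = histogram file, $2 = lower percentile (default 5),
    # $3 = upper percentile (default 95)
    awk -v lo="${2:-5}" -v hi="${3:-95}" '
        BEGIN { n = 0 }
        /^#/ || NF < 2 { next }              # skip header and blank lines
        { size[n] = $1; count[n] = $2; total += $2; n++ }
        END {
            if (total == 0) exit 1           # nothing to summarise
            for (i = 0; i < n; i++) {
                cum += count[i]
                pct = 100 * cum / total
                if (!have_lo && pct >= lo) { min = size[i]; have_lo = 1 }
                if (!have_hi && pct >= hi) { max = size[i]; have_hi = 1 }
            }
            print "min=" min, "max=" max
        }' "$1"
}

# Usage (file name taken from the command above):
#   insert_range ihist.txt 5 95
```

The result is only as representative as the pairs that actually mapped, so it is worth checking what fraction of pairs BBMap reports as mated before relying on it.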


Thanks!

1) I ran a Perl script from the Velvet developer to find the insert size: http://github.com/dzerbino/velvet/blob/master/contrib/observed-insert-length.pl/observed-insert-length.pl

perl observed-insert-length.pl Velvet_67 > Velvet_ins_size.txt

Outcome:

Observed **median insert length: 352**
Observed mode of insert length: 347
Observed sample standard deviation: 868.138129940986
Suggested velvetg parameters: -ins_length 352 -ins_length_sd 868.138129940986

2) I ran BBMap to map the short reads to the Velvet contigs and calculate the insert size from the mapping.

bbmap.sh in1=oenopla-reads1.fastq in2=oenopla-reads2.fastq ref=Velvet_67/contigs.fa reads=-1 ihist=ihist.txt

Outcome:

Pairing data:      pct pairs  num pairs  pct bases   num bases
mated pairs:       9.1591%    730308   9.1591%    109546200
bad pairs:         5.9061%    470928   5.9061%     70639200
insert size avg:   862.75
insert 25th %:      91.00
**insert median:     193.00**
insert 75th %:     374.00
insert std dev:    3468.98
insert mode:       78

I got two different median insert sizes, 352 and 193. Which one should I pick, and how do I decide the max and min?


In this case the vast majority of your reads did not map as proper pairs, which probably indicates that your assembly is fairly discontiguous. Because only 9% of the reads mapped as proper pairs, BBMap's insert size data cannot be trusted to reflect the library as a whole: when an assembly has low contiguity, the pairs that fail to map properly are likely to be the ones with longer inserts. I suggest using Velvet's estimate.

3.0 years ago

When I run velvetg without specifying the insert length, the standard output captured in the nohup.out file contains the following lines: ......

[1564.566048] Computing read to node mapping array sizes

[1572.478116] Computing read to node mappings

[1593.435209] Estimating library insert lengths...

[1627.668440] Paired-end library 1 has length: 570, sample standard deviation: 43

[1634.073744] Paired-end library 2 has length: 244, sample standard deviation: 57

[1639.688635] Paired-end library 3 has length: 238, sample standard deviation: 54

.......
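To pull just those estimates out of a long log, something like the following could work. It is a minimal sketch: `velvet_insert_estimates` is a hypothetical helper name, and it assumes the log wording matches the "Paired-end library N has length: ..., sample standard deviation: ..." lines quoted in this post and that both values are non-negative integers.

```shell
# List the per-library insert length estimates found in a velvetg log.
# Assumes the log wording matches the lines quoted in this post.
velvet_insert_estimates() {
    # $1 = file holding velvetg's standard output (e.g. nohup.out)
    # grep -o prints each match on its own line, even if several
    # estimates ended up on one physical line of the log.
    grep -o 'Paired-end library [0-9]* has length: [0-9]*, sample standard deviation: [0-9]*' "$1" |
    awk '{
        lib = $3; len = $6; sd = $NF
        sub(/,/, "", len)                 # drop the comma after the length
        printf "library=%s length=%s sd=%s\n", lib, len, sd
    }'
}

# Usage:
#   velvet_insert_estimates nohup.out
```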
