Question

Velvet Assembly Quality

4

Entering edit mode

14.1 years ago

Lee Katz ★ 3.2k

I would like to make a fake quality file for a Velvet de novo assembly. What is the average quality of a velvet assembly?

I've been told that Velvet assemblies are basically infallible. If they are, then for an assembly I might say that there is less than one error in the entire genome,

phred=-10*log(1/G)

Where G is the size of the genome in nucleotides. If the size of my bacterial genome is 2 MB, then the quality of the assembly would be greater than -10log(1/(210^6)), or 63.

So, does anyone have a good idea what a Velvet quality file would look like? Do you agree with my "63" assessment?

assembly velvet quality • 6.6k views

ADD COMMENT • link updated 13.8 years ago by Falcatrua ▴ 10 • written 14.1 years ago by Lee Katz ★ 3.2k

0

Entering edit mode

It sounds like a strange claim to me that Velvet assemblies are "basically infallible". What is this statement based on?

ADD REPLY • link 14.1 years ago by Mikael Huss 4.8k

0

Entering edit mode

A colleague told me that he resequenced a genome using Illumina. He found 3 errors that even a Sanger whole genome sequence project missed, which he verified by looking at original chromatograms.

Also, in the Velvet website (http://www.ebi.ac.uk/~zerbino/velvet/):

Does Velvet take base caller confidence scores into account?

No it currently does not, although it would be easy to implement. The reason we have not done it yet is because a lot can be inferred from coverage alone.

ADD REPLY • link 14.1 years ago by Lee Katz ★ 3.2k

0

Entering edit mode

I'm spawning a new question out of this instead. Velvet: Retain Read Names In Afg File

ADD REPLY • link updated 5.2 years ago by Ram 44k • written 14.1 years ago by Lee Katz ★ 3.2k

score 3 · Answer 1 · 2010-10-21

3

Entering edit mode

14.1 years ago

Ketil 4.1k

Phred qualities are defined as:

E = 10^{-Q/10}

where E is the error rate (probailty of a correct call), and Q is phred quality. So, Q=10 means E=0.1, Q=20 gives E=0.01 (1% errors). Conversely

Q = -10*log_10(E)

Which is the formula you used, so your analysis and arithmetic looks good to me.

That aside, I'm not sure why you want to do this. The point of having quality associated with an assembly is to identify the variations, some areas are bound to have lower coverage or an accumulation of poor quality reads - which the quality score can help you to identify.

ADD COMMENT • link 14.1 years ago by Ketil 4.1k

1

Entering edit mode

My problem with Velvet is that it does not output quality scores. It might be a significant project to produce accurate cumulative quality scores in the final assembly. However, a fake quality file such as the one above would be all right. Or, if I could produce a quality file based on coverage it might be even better.

The biggest reason I want this quality file is so that I can merge its assembly more accurately with another assembly to produce a comprehensive final assembly.

ADD REPLY • link 14.1 years ago by Lee Katz ★ 3.2k

0

Entering edit mode

So you would say to look at the coverage and create a quality score based on how many reads cover a base? Has this been done before with Velvet output?

ADD REPLY • link 14.1 years ago by Lee Katz ★ 3.2k

0

Entering edit mode

I'm not familiar with Velvet, but most sequencing technologies produce reads with associated quality of some sort. The assembler should use this quality to derive a new quality for each position, so that if the bases of all (or most) reads agree, this should result in a higher quality base call in the assembly, than if more reads disagree.

ADD REPLY • link 14.1 years ago by Ketil 4.1k

Ram · Answer 2 · 2010-10-25

1

Entering edit mode

14.1 years ago

Lee Katz ★ 3.2k

It looks like the quality could actually be in the afg file of the Velvet output. Can anyone confirm? I had to use the latest version of BioPerl to get some of the methods.

#!/usr/bin/perl
use Bio::Assembly::IO;
use Data::Dumper;
system("amos2ace velvet_asm.afg")
$aceio=Bio::Assembly::IO->new(-file=>"velvet_asm.ace",-format=>"ace");
while(my $contig=$aceio->next_contig){
  if(ref($contig) eq 'Bio::Assembly::Contig'){
    print "contig: ";
    print Dumper $contig->get_consensus_quality;
    print "\n";
  }
  elsif(ref($contig) eq 'Bio::Assembly::Singlet'){
    print "singlet: ";
    print Dumper $contig->get_consensus_quality;
    print "\n";
  }
}

ADD COMMENT • link updated 5.2 years ago by Ram 44k • written 14.1 years ago by Lee Katz ★ 3.2k

0

Entering edit mode

This script of course has to read the afg file that is outputted by velvetg using the "-amos_file yes" option.

ADD REPLY • link 14.1 years ago by Lee Katz ★ 3.2k

0

Entering edit mode

I'm spawning a new question out of this instead. Velvet: Retain Read Names In Afg File

ADD REPLY • link updated 5.2 years ago by Ram 44k • written 14.1 years ago by Lee Katz ★ 3.2k

0

Entering edit mode

I have difficulties making sense out of this in conjunction with Falcatrua's answer. Are these real qualities, or not? Anyway, I just recalculate qualities by independently mapping reads back to the assembly.

ADD REPLY • link 13.7 years ago by Ketil 4.1k

score 1 · Answer 3 · 2010-10-27

1

Entering edit mode

14.1 years ago

Ketil 4.1k

Just noticed that DiGuistini et al. used Velvet to pre-assemble Illumina reads (as Forge apparently couldn't handle the size of the dataset), and they assigned a flat Phred 20 quality to the assemblies in further processing (midway down in the left column, second to last page of the paper)

ADD COMMENT • link 14.1 years ago by Ketil 4.1k

score 1 · Answer 4 · 2010-10-29

Basicaly, what Velvet does with quality values when converting .afg to .ace (amos2ace) for files without phred qualities is convert the sequence (bases) to ASCII characters.

You can check this by looking in a .ace file created by amos2ace for assembly files without phred values. Get a contig sequence in the ace and look at it`s quality values, same bases always have the same values.

Velvet does this thing because when there are no quality values in your reads submitted to assembly, the quality values in the .afg file is the sequence itself (.seq and .qlt are redundant) You can also check amos2ace code (in perl), and you will see that it gets this .qlt values (same thing as the sequence) and convert the sequence bases to ascII characters using ord (), that is a perl function to do that conversion.