Velvet: What's the relation between kmer coverage and normal coverage?
8.9 years ago
novice ★ 1.1k

I'm trying to convert the kmer coverage reported in the headers of my contigs into standard coverage. Velvet's manual says the relation between kmer coverage Ck and standard coverage C is Ck = C * (L - k + 1) / L, where L is the read length and k is the chosen kmer length.

However, I used this formula to calculate C from Ck for each contig, with my average read length (L = 240) and my chosen kmer length (k = 69), and then took the median C, i.e. the standard coverage, over all the assembled contigs. The result I got, 66, differs from the value Velvet reports in the Log file, 23. Do you know why this might be?
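For reference, the conversion in both directions can be sketched as follows. This is only an illustration of the formula quoted from the manual; the function names are made up for this example and are not part of Velvet. The values 240 and 69 are the read length and kmer length from this post.

```python
def kmer_to_nucleotide_coverage(ck, read_length, k):
    """Invert Ck = C * (L - k + 1) / L to recover C from Ck."""
    kmers_per_read = read_length - k + 1
    if kmers_per_read <= 0:
        raise ValueError("k must not exceed the read length")
    return ck * read_length / kmers_per_read


def nucleotide_to_kmer_coverage(c, read_length, k):
    """Apply Ck = C * (L - k + 1) / L directly."""
    kmers_per_read = read_length - k + 1
    if kmers_per_read <= 0:
        raise ValueError("k must not exceed the read length")
    return c * kmers_per_read / read_length


if __name__ == "__main__":
    L, k = 240, 69  # values from the post
    # Scale factor L / (L - k + 1) that turns Ck into C.
    print(f"scale factor: {kmer_to_nucleotide_coverage(1.0, L, k):.4f}")
    # The median C of 66 from the post, expressed back as kmer coverage.
    print(f"Ck for C = 66: {nucleotide_to_kmer_coverage(66, L, k):.1f}")
    # The Log file value of 23, if it were a kmer coverage, expressed as C.
    print(f"C for Ck = 23: {kmer_to_nucleotide_coverage(23, L, k):.1f}")
```

With these inputs the scale factor is 240 / 172, so a conversion alone cannot turn 23 into 66 or the reverse; the two numbers differ by more than the formula accounts for.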

coverage velvet assembly contigs

It's not normal coverage, it's nucleotide coverage (C). You need to rearrange the formula to find C based on all the other info.


That's what I did. The problem is that the median C I found differs from the C that Velvet reports in the Log file as "Median coverage depth."

8.9 years ago

I am really confused about what you have done.

You need to calculate the coverage C from the total number of reads, their length L and the genome size, not from the contigs.

Then you work out Ck using the formula.

And you need to calculate that before running the assembly with velvetg, since it is a parameter required by the program.
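The calculation described in this reply can be sketched as below. The read count and genome size are placeholder numbers chosen purely for illustration; they are not from this thread. The function names are also made up for the example. The parameter referred to is presumably velvetg's `-exp_cov` option.

```python
def expected_nucleotide_coverage(num_reads, read_length, genome_size):
    """C = (number of reads * read length) / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return num_reads * read_length / genome_size


def expected_kmer_coverage(num_reads, read_length, genome_size, k):
    """Ck = C * (L - k + 1) / L, with C estimated from the reads."""
    c = expected_nucleotide_coverage(num_reads, read_length, genome_size)
    return c * (read_length - k + 1) / read_length


if __name__ == "__main__":
    # Hypothetical inputs: 1,000,000 reads and a 5 Mb genome.
    # Only L = 240 and k = 69 come from the question.
    num_reads, genome_size = 1_000_000, 5_000_000
    L, k = 240, 69
    c = expected_nucleotide_coverage(num_reads, L, genome_size)
    ck = expected_kmer_coverage(num_reads, L, genome_size, k)
    print(f"C = {c:.1f}, Ck = {ck:.1f}")
```

This estimate depends only on the input reads and an assumed genome size, so it is independent of whatever the assembler reports afterwards and can serve as a sanity check on both numbers.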


Hi Antonio, I did not mean to confuse you. I'll try to explain again:

Velvet reports the coverage in two files: the Log file (Median Coverage Depth) and the contigs.fa file (in each contig's header, preceded by _cov_). Assuming both of these are kmer coverages, I supposed the median of the coverages in the contigs.fa file should be equal to the median coverage in the Log file, but it wasn't.

I then supposed that the median coverage in the Log file could be in terms of nucleotides, so I converted the coverages in the contigs.fa file into nucleotide coverages (by multiplying by (L / (L - k + 1))) and found their median. This median was again different than that reported in the Log file.
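The comparison described above can be sketched as follows, assuming headers of the form `>NODE_1_length_2000_cov_12.345678`. The example headers are invented, and the helper names are hypothetical. The sketch computes both a plain median over contigs and a length-weighted median; which of these (if either) matches the Log file value is not established in this thread, so the weighted variant is included only as something to compare against.

```python
import re
import statistics

# Assumed header layout: >NODE_<id>_length_<length>_cov_<coverage>
HEADER_RE = re.compile(r"^>NODE_(\d+)_length_(\d+)_cov_([0-9.]+)")


def read_contig_stats(lines):
    """Yield (length, kmer_coverage) for every matching FASTA header."""
    for line in lines:
        match = HEADER_RE.match(line)
        if match:
            yield int(match.group(2)), float(match.group(3))


def length_weighted_median(pairs):
    """Coverage at which half of the total contig length is reached."""
    ordered = sorted(pairs, key=lambda pair: pair[1])
    total = sum(length for length, _ in ordered)
    running = 0
    for length, coverage in ordered:
        running += length
        if running * 2 >= total:
            return coverage
    raise ValueError("no contigs found")


if __name__ == "__main__":
    # Invented headers for illustration; in practice pass open("contigs.fa").
    demo = [
        ">NODE_1_length_1000_cov_20.0",
        "ACGT",
        ">NODE_2_length_100_cov_60.0",
        ">NODE_3_length_100_cov_70.0",
    ]
    stats = list(read_contig_stats(demo))
    print("plain median Ck:", statistics.median(cov for _, cov in stats))
    print("length-weighted median Ck:", length_weighted_median(stats))
```

In the invented data one long, low-coverage contig dominates the weighted median while two short, high-coverage contigs pull the plain median up, which shows how the two summaries of the same contigs can disagree.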

This made me confused, as you are, as to what the coverages reported in contigs.fa and the median coverage reported in the Log file actually mean, so I asked the wise online bioinformatics community for enlightenment.
