Lower Assembly Quality Despite Higher Coverage?
5.9 years ago
AXR ▴ 10

Hi everyone,

I want to measure how coverage affects assembly quality and find the point of diminishing returns. I am using paired-end reads only (2 x 101 bp with a 520 bp insert size), with no mate pairs or long reads. Higher coverage is normally expected to increase NGA50, but in my case the contig NGA50 went down and the LGA50 went up at the higher coverage levels. I tested five coverage levels: 10X, 15X, 20X, 25X and 30X.

15X gives the highest contig NGA50 and the lowest LGA50, while 30X gives the lowest NGA50 and the highest LGA50. Ranked from highest to lowest NGA50: 15X (best), 10X, 20X, 25X, 30X (worst).

The assembly was run with SOAPdenovo2 using a k-mer size of 25, and low-frequency k-mers were not discarded.

I used QUAST to evaluate the assembly.

What explanation(s) could there be for these results?

Thank you.

Assembly next-gen Coverage alignment genome • 1.2k views
5.9 years ago
h.mon 35k

low-frequency k-mers were not discarded.

Low-frequency k-mers are errors most of the time. Increasing coverage also increases the absolute number of sequencing errors, and with it the number of erroneous k-mers, which may be why NGA50 gets worse at higher coverage. Did you perform error correction before assembly? You could try error-correcting the reads or removing the bad (low-frequency) k-mers.
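To make the point about errors concrete, here is a rough simulation sketch, not a model of your data. It assumes a random toy genome, uniform substitution errors at 1%, and single-strand reads with no pairing; the function names (`simulate_reads`, `error_kmer_count`) are made up for illustration. It counts how many distinct k-mers in the reads do not occur in the genome at each coverage level.

```python
import random
from collections import Counter


def simulate_reads(genome, coverage, read_len=101, error_rate=0.01, rng=None):
    """Sample reads uniformly from one strand and add substitution errors.

    Simplifying assumptions: no paired ends, no indels, uniform error rate.
    """
    rng = rng or random.Random(0)
    n_reads = coverage * len(genome) // read_len
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(genome) - read_len + 1)
        read = list(genome[start:start + read_len])
        for i in range(read_len):
            if rng.random() < error_rate:
                # Replace the base with one of the three other bases.
                read[i] = rng.choice([b for b in "ACGT" if b != read[i]])
        reads.append("".join(read))
    return reads


def count_kmers(reads, k=25):
    """Count every k-mer occurrence across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def error_kmer_count(genome, coverage, k=25, seed=0):
    """Number of distinct read k-mers that are absent from the genome."""
    true_kmers = {genome[i:i + k] for i in range(len(genome) - k + 1)}
    reads = simulate_reads(genome, coverage, rng=random.Random(seed))
    counts = count_kmers(reads, k)
    return sum(1 for kmer in counts if kmer not in true_kmers)


if __name__ == "__main__":
    rng = random.Random(42)
    genome = "".join(rng.choice("ACGT") for _ in range(5000))
    for cov in (10, 15, 20, 25, 30):
        print(cov, error_kmer_count(genome, cov))
```

The number of true k-mers is bounded by the genome size, while the erroneous ones keep accumulating as reads are added. If none of them are filtered out, the assembler has more spurious k-mers to deal with at 30X than at 10X.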

Even so, there is no simple answer to your question. Relying on only one or two metrics (NGA50 and LGA50) as an overall measure of assembly quality is not advisable. Your coverage range is also very narrow; many assemblers need 50x-100x coverage. Finally, the relationship between assembly quality and coverage depends on the assembler, genome complexity, and other factors.
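As a small illustration of why one or two numbers can hide a lot, the sketch below computes NG50 and LG50 from raw contig lengths. This is a simplification: the NGA50 and LGA50 that QUAST reports are computed on blocks aligned to the reference, with contigs broken at misassemblies, so this is not what QUAST does internally. The function name and the toy contig lengths are invented for the example.

```python
def ng50_lg50(contig_lengths, genome_size):
    """Return (NG50, LG50) for a list of contig lengths.

    NG50 is the length of the contig at which the running total of
    contigs, sorted longest first, reaches half the genome size.
    LG50 is how many contigs were needed to get there.
    Returns (None, None) if the contigs cover less than half the genome.
    """
    half = genome_size / 2
    running = 0
    ordered = sorted(contig_lengths, reverse=True)
    for rank, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, rank
    return None, None


if __name__ == "__main__":
    genome_size = 100
    tidy = [30, 30, 30, 10]               # 4 contigs
    fragmented = [30, 30] + [1] * 40      # 42 contigs, long tail of tiny ones
    print(ng50_lg50(tidy, genome_size))
    print(ng50_lg50(fragmented, genome_size))
```

Both toy assemblies get the same NG50 and LG50 even though one has ten times as many contigs, which is why it helps to look at the other columns of the QUAST report too (misassemblies, genome fraction, number of contigs, and so on).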
