Total Length Of Assembled Scaffolds Is Greater Than Genome Length
11.4 years ago
AW ▴ 350

Hi,

I would greatly appreciate some help with my problem.

I have just assembled a genome de novo from Illumina 100 bp paired-end reads, using SOAPdenovo2 followed by GapCloser.

My total scaffold length is 1,062,995,336 base pairs (from 207,528 scaffolds), and my haploid genome is approximately 1.2 Gb. From this I calculate a percentage coverage of 104%?

Have I calculated coverage incorrectly, or should I have filtered out short scaffolds? I am unsure why the coverage is greater than 100%.

Thanks very much for any help

Alison

coverage denovo genome assembly

How did you calculate 104%? From what you've said, your assembly is 1.06 Gb in size and you are expecting 1.2 Gb, so wouldn't your coverage be about 88% (1.06/1.2)?
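A minimal Python sketch of the arithmetic in this reply: the fraction of the expected genome spanned by the assembly is total scaffold length divided by genome size. It assumes "1.2 Gb" means 1,200,000,000 bp. The `scaffold_lengths` helper is a hypothetical illustration for summing lengths from a scaffold FASTA; it counts every character, so any N gaps left after GapCloser are included in the total.

```python
from typing import Dict, Iterable


def scaffold_lengths(fasta_lines: Iterable[str]) -> Dict[str, int]:
    """Return {scaffold_name: length} from FASTA-formatted lines.

    Every sequence character is counted, including N gap characters.
    """
    lengths: Dict[str, int] = {}
    name = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Header line: the name is the first word after '>'
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def assembly_fraction(total_bp: int, genome_bp: int) -> float:
    """Fraction of the expected genome size spanned by the assembly."""
    return total_bp / genome_bp


# Figures from the question: 1,062,995,336 bp of scaffolds, ~1.2 Gb haploid genome
fraction = assembly_fraction(1_062_995_336, 1_200_000_000)
print(f"{fraction:.1%}")  # 88.6%
```

With the numbers from the question this gives roughly 88.6%, not 104%, so the assembly is somewhat smaller than the expected genome size rather than larger.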

11.4 years ago
Gabriel R. ★ 2.9k

I would align the raw reads back to your scaffolds and then genotype from those alignments to compute your coverage.
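As one possible way to turn such an alignment into numbers (an assumption; the answer does not name any tools), the sketch below summarizes per-base depth in the tab-separated layout written by `samtools depth -a` (scaffold, 1-based position, depth). The `-a` flag matters: without it, zero-depth positions are omitted and breadth of coverage is overestimated. The function name `depth_summary` and the example rows are hypothetical.

```python
from typing import Iterable, Tuple


def depth_summary(depth_lines: Iterable[str]) -> Tuple[float, float]:
    """Return (mean depth, fraction of positions with depth > 0).

    Expects tab-separated lines of scaffold, position, depth,
    i.e. the layout written by `samtools depth -a`.
    """
    positions = 0
    total_depth = 0
    covered = 0
    for line in depth_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        depth = int(fields[2])
        positions += 1
        total_depth += depth
        if depth > 0:
            covered += 1
    if positions == 0:
        return 0.0, 0.0
    return total_depth / positions, covered / positions


# Hypothetical depth rows for two tiny scaffolds
example = ["s1\t1\t10", "s1\t2\t12", "s1\t3\t0", "s2\t1\t6"]
mean_depth, breadth = depth_summary(example)
print(mean_depth, breadth)  # 7.0 0.75
```

Mean depth tells you how deeply the scaffolds are sequenced; breadth tells you what fraction of scaffold positions have any reads at all.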

11.4 years ago
ugly.betty77 ★ 1.1k

For an assembly I am currently working on, I experienced a similar problem with SGA. Jared Simpson recommended that I remove anything shorter than 2x the read length, to avoid polymorphic or repetitive sequence being over-counted. After I removed those short scaffolds, the total assembly size came out close to what I got from other assemblers.
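A minimal Python sketch of that filter, assuming you already have a {scaffold_name: length} mapping (how it is built is left out here). With 100 bp reads and the 2x rule, scaffolds shorter than 200 bp are dropped before the total is recomputed. The function name `filter_short` and the example lengths are hypothetical.

```python
from typing import Dict


def filter_short(lengths: Dict[str, int], read_length: int,
                 factor: int = 2) -> Dict[str, int]:
    """Keep only scaffolds at least factor * read_length bp long."""
    min_len = factor * read_length
    return {name: n for name, n in lengths.items() if n >= min_len}


# Hypothetical scaffold lengths; with 100 bp reads the cutoff is 200 bp
lengths = {"scaf1": 15_000, "scaf2": 180, "scaf3": 250, "scaf4": 95}
kept = filter_short(lengths, read_length=100)
print(sorted(kept), sum(kept.values()))  # ['scaf1', 'scaf3'] 15250
```

Comparing `sum(lengths.values())` before and after filtering shows how much of the total assembly size comes from the short scaffolds.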
