Question

Normalization of raw Illumina reads.

4.8 years ago

robert.murphy ▴ 110

I am looking for an optimal method for de novo assembly of fungal isolates and came across this nice Nature paper.

They use a combination of assembly workflows, and in two of them (workflows 2 and 3) they mention normalizing the data:

To overcome MDA-generated differences in coverage across the genome, the second workflow normalized raw reads to average 100X before assembling using SPADES

and again:

A third assembly was created using SPADES40 after combining raw reads from 24 nuclei followed by normalization to 100X

I am struggling to understand what they mean by this. Could anyone help explain to me what they are doing here?

sequencing Assembly

updated 4.8 years ago by Ram 45k • written 4.8 years ago by robert.murphy ▴ 110

Check the guide for bbnorm.sh, the program from the BBMap/BBTools package used for read normalization.

4.8 years ago by GenoMax 152k

score 1 · Answer 1 · 2020-10-09

In the Methods section, they explain how they used BBMap to normalize the reads:

Assembly workflow 2: Each set of reads was normalized using bbnorm of BBMap52 v. 38.08 with a target average depth of 100×. Normalized data were assembled individually into 24 assemblies using SPADES40, and a consensus assembly was generated with Lingon38, with the same sequence motifs as for assembly 1.
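To make "target average depth of 100×" concrete, here is a small Python illustration with made-up numbers (the region depths, read count, read length and genome size below are all hypothetical, not taken from the paper). Average depth is C = N × L / G for N reads of length L on a genome of size G. Randomly keeping a fixed fraction of reads scales every region down by the same factor, so the MDA-driven unevenness survives; normalization instead brings high-depth regions down to roughly the target and leaves low-depth regions alone.

```python
def mean_coverage(n_reads, read_len, genome_size):
    """Average depth C = N * L / G."""
    return n_reads * read_len / genome_size


def uniform_subsample(depths, fraction):
    """Expected per-region depth if the same fraction of all reads is kept."""
    return [d * fraction for d in depths]


def normalize(depths, target):
    """Expected per-region depth after normalization: regions above the target
    are downsampled to it, regions at or below it are left unchanged."""
    return [min(d, target) for d in depths]


if __name__ == "__main__":
    # Hypothetical depths of four regions in an MDA-amplified genome
    depths = [20, 100, 500, 2000]
    print(uniform_subsample(depths, 0.25))  # [5.0, 25.0, 125.0, 500.0]
    print(normalize(depths, 100))           # [20, 100, 100, 100]
    # Hypothetical run: 10 million 150 bp reads on a 30 Mb genome
    print(mean_coverage(10_000_000, 150, 30_000_000))  # 50.0
```

The 20× region stays at 20× after normalization: it cannot add reads that were never sequenced, which is one reason the result is described as a target *average* depth rather than exactly 100× everywhere.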

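Normalizing reads before de novo assembly is a common strategy when the qualitative content of the data (which sequences are present) matters more than the quantitative information in it (how many reads cover each region). Normalization discards reads from high-coverage regions while keeping reads from low-coverage regions, so coverage becomes more even and the data set gets much smaller, which cuts the memory and run time the assembler needs. For MDA data it also flattens the amplification bias the paper mentions, which would otherwise leave the assembler with very uneven coverage across the genome.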
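As for how a tool decides which reads to drop, below is a simplified two-pass sketch in Python. It is in the spirit of k-mer based normalization but is not bbnorm's actual implementation: it counts exact k-mers with a `Counter` instead of a memory-efficient approximate counter, ignores reverse complements (real tools count canonical k-mers so both strands pool together), and does no error correction or low-depth filtering. The function name `normalize_reads` and the defaults `k=21` and `seed=0` are arbitrary choices for this illustration.

```python
import random
from collections import Counter
from statistics import median


def kmers(seq, k):
    """All overlapping k-mers of a read (forward strand only)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def normalize_reads(reads, target, k=21, seed=0):
    """Downsample reads from high-depth regions toward `target` depth.

    Pass 1 counts every k-mer across all reads. Pass 2 estimates each read's
    depth as the median count of its k-mers and keeps the read with
    probability min(1, target / depth), so reads from regions at or below
    the target are always kept.
    """
    counts = Counter(km for read in reads for km in kmers(read, k))
    rng = random.Random(seed)  # seeded so the output is reproducible
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            kept.append(read)  # shorter than k: no depth estimate, keep it
            continue
        depth = median(counts[km] for km in read_kmers)
        if depth <= target or rng.random() < target / depth:
            kept.append(read)
    return kept


if __name__ == "__main__":
    high = ["ACGT" * 8] * 1000  # hypothetical over-amplified region
    low = ["TTGACCATGGCAAGTCCGATTACGGATCCAAG"] * 5  # hypothetical low-coverage region
    kept = normalize_reads(high + low, target=100)
    print(kept.count(high[0]), kept.count(low[0]))
```

In this toy input every k-mer of the over-amplified read has a count of 3000, so each copy is kept with probability 100/3000 and about 33 of the 1000 copies are expected to survive, while all 5 low-depth copies are kept. With bbnorm itself the corresponding run is along the lines of `bbnorm.sh in=reads.fq out=normalized.fq target=100`; see the BBNorm guide for the full option list.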