Hi, when working with highly uneven metagenomic datasets (e.g. soil), where coverage is extremely high for a few dominant organisms (which creates problems during assembly because sequencing errors add erroneous edges to the graph) and relatively low for everything else, do you use bbnorm to normalize read depth or bbcms for depth filtering? Or both?
This is a question that would really benefit from Brian's input if he were around to answer it. He says the following in the BBNorm guide:
Normalizes read depth based on kmer counts. Can also error-correct,
bin reads by kmer depth, and generate a kmer depth histogram.
Normalization is often useful if you have too much data (for example,
600x average coverage when you only want 100x) or uneven coverage
(amplified single-cell, RNA-seq, viruses, metagenomes, etc).
The bbcms.sh documentation, on the other hand, says:

Error corrects reads and/or filters by depth, storing kmer counts in a
count-min sketch (a Bloom filter variant).

I would say use bbnorm.sh before bbcms.sh, because the latter filters based on depth.
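For concrete commands, here is a rough sketch going by the usage example in the BBNorm guide (target=100 and min=5 are the guide's example values, not soil-specific recommendations); the bbcms.sh parameter is quoted from memory, so verify it against the script's built-in help:

# Normalize to ~100x target depth; reads with apparent depth under 5x
# are presumed to be errors and discarded (example from the BBNorm guide)
bbnorm.sh in=reads.fq out=normalized.fq target=100 min=5

# Alternative/optional: depth filtering using a count-min sketch
# (mincount discards reads below this kmer depth; parameter name assumed, check bbcms.sh help)
bbcms.sh in=reads.fq out=filtered.fq mincount=2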
I have a follow-up question to this. I am assembling contigs from my metagenome reads and then annotating the assembled contigs to determine the functional pool of my sample, rather than going into genome binning. I normalized my reads before error correction and assembly, which I know is the right choice for building MAGs... but if I am just interested in the functional pool, should I wait to normalize until after annotation, when I want to determine the depth of coverage for my functions of interest? Thanks!
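In case it helps make the question concrete, the coverage calculation I have in mind is mapping the original (un-normalized) reads back to the assembled contigs, so the per-contig depths reflect real abundance. A minimal sketch, assuming BBMap's bbmap.sh and its covstats output:

# Map the raw, pre-normalization reads to the assembly and write per-contig coverage stats
bbmap.sh in=raw_reads.fq ref=contigs.fa covstats=covstats.txt

# covstats.txt reports average fold coverage per contig, which can then be joined
# to the contig annotations to get depth per function of interest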