Hi friends,
We have WGS data for a bacterial sample: 75 bp paired-end reads at more than 200X coverage. After trimming, the reads range from 20 to 75 bp.
I am going to try de novo assembly using SPAdes 3.13.1 and SOAPdenovo.
What criteria should be used to select the k-mer sizes for assembly?
Hi h.mon, I am currently using SPAdes with default settings, where the k-mer size is set to auto. I have more than 430 million reads for a bacterial sample. Do you know of any tool to subset/downsample them to a lower coverage?
Don't subset; use digital normalization, which is a better technique for reducing coverage without losing information. Several packages perform digital normalization; I use BBNorm (from the BBTools package) when I need to.
If you really want to down-sample, you can use reformat.sh (from the same BBTools package). For example, to down-sample to 10% of the original reads:
Hi h.mon, I am aiming to reduce the coverage from 10000X to 1000X, so in my case I need to do digital normalization using BBNorm rather than down-sampling with reformat.sh.
bbnorm.sh in=reads.fq out=normalized.fq target=1000 min=30
What "min" is reasonable to get 1000X coverage?
I have asked a related set of questions in another post (Should I consider contigs.fa or scaffolds.fa from SPAdes output for downstream analyses?).
Hi h.mon,
Using BBNorm, can we downsample to a specific read coverage? I saw that the target option in BBNorm refers to k-mer coverage. What value should I use for target in order to get 100X read coverage?
As far as I can think of, one can't down-sample straight to a target read coverage without an assembled genome, so you have to content yourself with k-mer coverage. Use target=100 and, after assembly, map the reads and check whether you got the expected coverage, then adjust the target value as needed, but I expect it would be close enough. As reads may contain errors, I expect target=100 will end up with slightly higher read coverage.
However, why do you want to do this? De Bruijn assemblers measure coverage in k-mers, not reads.
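For a rough idea of how k-mer coverage and read coverage relate, here is a minimal sketch using the commonly cited idealized formula Ck = C * (L - k + 1) / L, where C is read coverage, L is read length and k is the k-mer size. It assumes error-free reads that are all 75 bp long, and k=31 (which I believe is the BBNorm default). The function names are made up for illustration and are not part of BBTools. Treat the numbers as a starting point only, and still check the real coverage by mapping after assembly.

```shell
#!/bin/sh
# Idealized conversion between k-mer coverage and read coverage.
# Formula: Ck = C * (L - k + 1) / L
# Assumptions (not from BBNorm itself): error-free reads of uniform length.

# usage: kmer_to_read_cov KMER_COV READ_LEN K
kmer_to_read_cov() {
    awk -v ck="$1" -v len="$2" -v k="$3" \
        'BEGIN { printf "%.1f\n", ck * len / (len - k + 1) }'
}

# usage: read_to_kmer_cov READ_COV READ_LEN K
read_to_kmer_cov() {
    awk -v c="$1" -v len="$2" -v k="$3" \
        'BEGIN { printf "%.1f\n", c * (len - k + 1) / len }'
}

# Read coverage implied by target=100 with 75 bp reads and k=31
kmer_to_read_cov 100 75 31   # prints 166.7

# k-mer coverage corresponding to 100X read coverage
read_to_kmer_cov 100 75 31   # prints 60.0
```

Under these assumptions, 100X read coverage corresponds to a k-mer coverage of about 60. Trimmed reads shorter than 75 bp contribute fewer k-mers (and reads shorter than k contribute none), so with 20-75 bp reads the real read coverage for a given target will likely be somewhat higher than this estimate.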