Question

How to use BBNorm to subset the data

8.5 years ago

wu.zhiqiang.1020 ▴ 50

I have pooled RNA-seq data from 5 to 10 individuals at different stages. I want to combine all of the data into one big FASTQ file and assemble it. I am using BBNorm to reduce redundant reads, but I have a question about how to set the following parameters:

target=100          (tgt) Target normalization depth.  NOTE:  All depth parameters control kmer depth, not read depth. For kmer depth Dk, read depth Dr, read length R, and kmer size K:  Dr=Dk*(R/(R-K+1))

maxdepth=-1         (max) Reads will not be downsampled when below this depth, even if they are above the target depth.

mindepth=5          (min) Kmers with depth below this number will not be included when calculating the depth of a read.

minkmers=15         (mgkpr) Reads must have at least this many kmers over min depth to be retained.  Aka 'mingoodkmersperread'.
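
To make the note on kmer depth versus read depth concrete, here is a minimal sketch of the conversion Dr=Dk*(R/(R-K+1)) quoted above. The read length of 100 and the kmer size of 31 are illustrative assumptions, not values taken from this dataset, and the function name is hypothetical.

```python
def read_depth_from_kmer_depth(kmer_depth: float, read_length: int, kmer_size: int) -> float:
    """Convert kmer depth Dk to read depth Dr using Dr = Dk * (R / (R - K + 1))."""
    kmers_per_read = read_length - kmer_size + 1  # number of kmers in one read
    if kmers_per_read <= 0:
        raise ValueError("read_length must be at least kmer_size")
    return kmer_depth * read_length / kmers_per_read


# Assumed example values: target=100, 100 bp reads, K=31.
depth = read_depth_from_kmer_depth(100, 100, 31)
print(f"{depth:.1f}")  # prints 142.9
```

Under those assumptions, target=100 corresponds to a read depth of roughly 143x, so the depth parameters look smaller in kmer terms than they are in read terms.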

Because my data is pooled, how should I set mindepth and minkmers for the downstream analysis? I need some suggestions on how to remove error-containing reads while still keeping the isoforms.

Thanks.

rna-seq Assembly bbmap bbnorm • 3.2k views

updated 8.5 years ago by GenoMax 152k • written 8.5 years ago by wu.zhiqiang.1020 ▴ 50

What exactly are you trying to do? It's very unusual to try to co-assemble multiple individuals, because that increases the heterogeneity (more SNPs and so forth make assembly harder), and it's very unusual to normalize reads when calculating expression.

That said, for normalization I normally leave everything at default except for target and mindepth. The target depends on the assembler, but 100 is usually good. Mindepth can usually be left at default as well, which reduces error-containing reads from high-depth regions. But if you want to assemble features from very low-depth areas (hopefully not many of those once you mix many different samples), you can reduce it to perhaps 2.

8.5 years ago by Brian Bushnell 20k
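
The advice above (defaults everywhere, target=100, and mindepth lowered to about 2 only when low-depth features matter) can be sketched as a command line. This is a minimal illustration, not a recommended pipeline: the file names and the helper function are hypothetical, and it assumes bbnorm.sh is on the PATH and accepts the in=, out=, target= and mindepth= arguments in the form shown in the question. The snippet only builds and prints the command; it does not run it.

```python
import shlex


def bbnorm_command(in_path: str, out_path: str, target: int = 100, mindepth: int = 5) -> list:
    """Build a bbnorm.sh argument list, leaving all other options at their defaults."""
    return [
        "bbnorm.sh",
        f"in={in_path}",          # pooled input reads (hypothetical file name)
        f"out={out_path}",        # normalized output reads (hypothetical file name)
        f"target={target}",       # target kmer depth
        f"mindepth={mindepth}",   # kmers below this depth are ignored
    ]


# Lower mindepth to 2 only if low-depth features need to be assembled.
cmd = bbnorm_command("pooled.fq.gz", "normalized.fq.gz", target=100, mindepth=2)
print(shlex.join(cmd))
```

To actually run it, the list could be passed to `subprocess.run(cmd, check=True)`; comparing assemblies from mindepth=5 and mindepth=2 would show whether the lower setting recovers more low-depth isoforms or mostly lets in error-containing reads.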