Question

Normalization after Trimmomatic?

2.8 years ago

valentin.cledassou • 0

Hello everyone,

I am trying to build a bash pipeline for whole-genome sequencing data (in order to obtain the whole/core genome MLST). It is not metagenomic.

But I am a little confused about the data trimming steps... I have seen a lot of contradictory advice.

Originally my workflow was: FastQC > ConFindr > Kraken2 > Trimmomatic (adapter + quality trimming) > FastQC > SPAdes. But I have sometimes seen (mainly on Biostars) people normalizing their reads (not always), with bbnorm or Trinity for example.

So, first, I would like to know whether read normalization should always be applied? In my paired-end files I always have the same number of reads in R1 and R2. The same holds after Trimmomatic when I keep the paired R1 and R2 files after trimming.

If yes, should it come before or after the Trimmomatic step (adapter removal + quality trimming)?

And finally, if it is necessary, can bbnorm be used in a bash script?

Thank you in advance for your answers,

Valentin

bbnorm trimmomatic • 1.1k views

updated 2.8 years ago by GenoMax 150k • written 2.8 years ago by valentin.cledassou • 0
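
The question mentions that R1 and R2 always hold the same number of reads. That is a pairing check rather than coverage normalization (bbnorm works on read depth, not on R1/R2 counts), but it is easy to verify inside a bash pipeline. A minimal sketch, assuming standard 4-line FASTQ records; `count_reads` and `check_pairs` are hypothetical helper names, not part of any tool in the workflow:

```shell
#!/usr/bin/env bash
# Count reads in a FASTQ file (plain or gzip-compressed).
# Assumes 4 lines per record, i.e. no wrapped sequence lines.
count_reads() {
    local file="$1" lines
    case "$file" in
        *.gz) lines=$(gzip -dc "$file" | wc -l) ;;
        *)    lines=$(wc -l < "$file") ;;
    esac
    echo $((lines / 4))
}

# Print both counts; succeed only if R1 and R2 hold the same number of reads.
check_pairs() {
    local n1 n2
    n1=$(count_reads "$1")
    n2=$(count_reads "$2")
    echo "R1=$n1 R2=$n2"
    [ "$n1" -eq "$n2" ]
}

# Example call (file names are placeholders):
#   check_pairs sample_R1.fastq.gz sample_R2.fastq.gz || echo "pair count mismatch" >&2
```

Equal counts only show that the files have the same length; they do not prove that the mates are in the same order.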

Why do you want to use bbnorm? SPAdes has an error-correction parameter; you can normalize your reads using that parameter. If you mean unequal numbers of paired-end reads, you can use makepair to equalize them. Also, Trinity is a de novo assembly tool for RNA-seq data. I do not think it is used for normalization of WGS reads.

2.8 years ago by young_bioinformatician ▴ 240

Thank you for your reply! For SPAdes I just use the --careful option (to reduce mismatches) with the R1 and R2 read files as input. By error correction, do you mean the --isolate option? In the SPAdes tutorial the author says that it is not compatible with the --careful option. So I don't know whether it would be better to use it in place of --careful.

2.8 years ago by valentin.cledassou • 0
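
Since `--careful` and `--isolate` cannot be combined, a pipeline has to pick one per run. The sketch below is one way to make that choice explicit in bash. `run_spades` and the `DRY_RUN` variable are hypothetical names introduced for illustration; `spades.py`, `--careful`, `--isolate`, `-1`, `-2` and `-o` are the real SPAdes command-line options. As far as I know, read error correction is already part of a default SPAdes run and is a separate thing from coverage normalization, so neither mode replaces bbnorm.

```shell
#!/usr/bin/env bash
# Run SPAdes in exactly one of two mutually exclusive modes.
# Set DRY_RUN=1 to print the command instead of executing it.
run_spades() {
    local mode="$1" r1="$2" r2="$3" outdir="$4"
    case "$mode" in
        careful|isolate) ;;
        *) echo "unknown mode: $mode (use careful or isolate)" >&2
           return 1 ;;
    esac
    # Build the argument list once so the printed and executed commands match.
    set -- spades.py "--$mode" -1 "$r1" -2 "$r2" -o "$outdir"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Example call (file names are placeholders):
#   run_spades careful trimmed_R1.fastq.gz trimmed_R2.fastq.gz spades_out
```

Which mode gives the better assembly for a given dataset is something to check on your own data, for example by comparing assembly statistics from both runs.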

Have you seen the bbnorm guide? https://jgi.doe.gov/data-and-tools/software-tools/bbtools/bb-tools-user-guide/bbnorm-guide/

If you are sequencing the entire genome then you are not doing MLST.

2.8 years ago by GenoMax 150k
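
On the last point of the question: `bbnorm.sh` is an ordinary command-line program, so it can be called from a bash script like any other step. A minimal sketch, assuming it is run on the paired files that Trimmomatic produced. `normalize_pairs`, `DRY_RUN` and the output file names are hypothetical; `in=`, `in2=`, `out=`, `out2=`, `target=` and `min=` are BBNorm parameters, and the values shown are for illustration only, so check them against the guide linked above.

```shell
#!/usr/bin/env bash
# Normalize a read pair with bbnorm.sh.
# Set DRY_RUN=1 to print the command instead of executing it.
normalize_pairs() {
    local r1="$1" r2="$2" prefix="$3"
    local target="${4:-100}"   # desired depth after normalization (illustrative default)
    set -- bbnorm.sh "in=$r1" "in2=$r2" \
        "out=${prefix}_norm_R1.fastq.gz" "out2=${prefix}_norm_R2.fastq.gz" \
        "target=$target" "min=5"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Example call (file names are placeholders):
#   normalize_pairs trimmed_R1.fastq.gz trimmed_R2.fastq.gz sample 100
```

Whether normalization is worth adding at all for isolate WGS data is a separate question; the sketch only shows that the call fits into a bash pipeline.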