Trinity - in silico normalization before/after Trimmomatic?
7.9 years ago
wasphunter • 0

I have 100+ million paired-end reads from which adapter sequences have been removed, but quality trimming has not yet been done. From a memory-utilization standpoint, it seems most efficient to run in silico normalization and Trimmomatic FIRST, and only once, and then to use the normalized, trimmed data for all subsequent Trinity assembly trials. If so, which should come first: normalization or Trimmomatic? Thanks!

rna-seq • 3.0k views
7.9 years ago

The optimal order is:

1) Adapter-trimming

2) Quality-trimming (if necessary)

3) Normalization (if necessary)

From a memory perspective, if you use BBNorm for normalization, the order does not matter, since its memory use does not depend on the size of the input data. However, normalization quality is better on trimmed data because there are fewer erroneous k-mers.

If you download the BBMap package, there is a file at /bbmap/docs/guides/PreprocessingGuide.txt which outlines my recommendations for the order of preprocessing steps.
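As a rough illustration of that order (quality-trim first, then normalize), here is a minimal Python sketch that only builds the two BBTools command lines and does not run them. The file names, the function names, and the parameter values (trimq=10, minlen=50, target=100, min=5) are placeholders I picked for the example, not recommendations from the guide; check them against the BBDuk and BBNorm documentation for your data.

```python
from typing import List


def quality_trim_cmd(r1: str, r2: str, out1: str, out2: str,
                     trimq: int = 10, minlen: int = 50) -> List[str]:
    """Build a BBDuk command that quality-trims both ends of each read.

    trimq and minlen are illustrative values, not tuned settings.
    """
    return ["bbduk.sh", f"in={r1}", f"in2={r2}", f"out={out1}", f"out2={out2}",
            "qtrim=rl", f"trimq={trimq}", f"minlen={minlen}"]


def normalize_cmd(r1: str, r2: str, out1: str, out2: str,
                  target: int = 100, min_depth: int = 5) -> List[str]:
    """Build a BBNorm command that normalizes coverage of paired reads.

    target and min_depth are illustrative values, not tuned settings.
    """
    return ["bbnorm.sh", f"in={r1}", f"in2={r2}", f"out={out1}", f"out2={out2}",
            f"target={target}", f"min={min_depth}"]


def preprocessing_plan(r1: str, r2: str) -> List[List[str]]:
    """Return the commands in order: quality-trim, then normalize."""
    # Hypothetical intermediate and final file names.
    trimmed = ("trimmed_R1.fq.gz", "trimmed_R2.fq.gz")
    normalized = ("normalized_R1.fq.gz", "normalized_R2.fq.gz")
    return [
        quality_trim_cmd(r1, r2, *trimmed),
        # Normalization reads the trimmed output, never the original files.
        normalize_cmd(*trimmed, *normalized),
    ]


if __name__ == "__main__":
    # Input names are placeholders for adapter-trimmed reads.
    for cmd in preprocessing_plan("adaptertrimmed_R1.fq.gz", "adaptertrimmed_R2.fq.gz"):
        print(" ".join(cmd))
```

Printing the commands rather than executing them makes it easy to inspect the plan before committing cluster time; each list could be passed to subprocess.run once the settings look right.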


Thanks for the advice and links, Brian. I've been considering BBNorm for all the reasons you give in its favor here and elsewhere.

7.9 years ago
Farbod ★ 3.4k

Hi wasphunter,

You can run Trimmomatic and normalization together in a single Trinity command.

If you instead quality-trim the reads with Trimmomatic yourself, remember to use the trimmed files for the next steps, NOT the original read files.

By the way, in silico normalization now happens by default in the most recent version of Trinity (v2.3.2).
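For illustration, a minimal Python sketch of what such a single Trinity run might look like. It only assembles the argument list. The read file names, memory, CPU count, and output directory are placeholders, and the function name is made up for the example. I am assuming the --trimmomatic flag and, for versions where normalization is not yet the default, the --normalize_reads flag; please confirm both against Trinity --show_full_usage_info for your installed version.

```python
from typing import List


def trinity_one_step_cmd(left: str, right: str,
                         output: str = "trinity_out",
                         max_memory: str = "50G",
                         cpu: int = 8,
                         explicit_normalize: bool = False) -> List[str]:
    """Build one Trinity command that trims and assembles in a single run.

    Set explicit_normalize=True for older Trinity versions in which
    normalization is not on by default (assumption: --normalize_reads).
    """
    cmd = [
        "Trinity",
        "--seqType", "fq",
        "--left", left,
        "--right", right,
        "--max_memory", max_memory,
        "--CPU", str(cpu),
        "--trimmomatic",          # let Trinity run Trimmomatic itself
        "--output", output,
    ]
    if explicit_normalize:
        cmd.append("--normalize_reads")
    return cmd


if __name__ == "__main__":
    # Placeholder read files.
    print(" ".join(trinity_one_step_cmd("reads_R1.fq.gz", "reads_R2.fq.gz")))
```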

~ Best


Thanks, Farbod. I was aware that all steps can be packaged into a single Trinity run. That is attractive but, like many, I have limited access to computing resources. It seemed prudent to me to create a high-quality, normalized data set once that I could then reuse for any downstream assemblies I wanted to undertake.
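To make the reuse idea concrete, here is a minimal Python sketch that builds several Trinity commands over the same preprocessed reads. Everything in it is illustrative: the file names, the function name, and the choice of --min_kmer_cov as the parameter that varies between trials are my own placeholders. I am assuming that --no_normalize_reads is the way to skip Trinity's built-in normalization in versions where it is on by default, so that already-normalized reads are not normalized again.

```python
from typing import Iterable, List


def assembly_trial_cmds(left: str, right: str,
                        min_kmer_covs: Iterable[int] = (1, 2),
                        max_memory: str = "50G",
                        cpu: int = 8) -> List[List[str]]:
    """Build one Trinity command per trial, all reading the same inputs.

    The inputs are expected to be trimmed and normalized already, so
    neither --trimmomatic nor Trinity's own normalization is requested.
    """
    cmds = []
    for cov in min_kmer_covs:
        cmds.append([
            "Trinity",
            "--seqType", "fq",
            "--left", left,
            "--right", right,
            "--max_memory", max_memory,
            "--CPU", str(cpu),
            "--no_normalize_reads",        # reads were normalized beforehand
            "--min_kmer_cov", str(cov),    # the setting varied per trial
            "--output", f"trinity_kmercov{cov}",
        ])
    return cmds


if __name__ == "__main__":
    # Placeholder names for the one-time preprocessed reads.
    for cmd in assembly_trial_cmds("normalized_R1.fq.gz", "normalized_R2.fq.gz"):
        print(" ".join(cmd))
```

Each trial writes to its own output directory, so the expensive preprocessing is paid once while the assemblies stay independent.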
