Question

Suggested Trimmomatic inputs for LEADING and TRAILING

0

3.8 years ago

dadrasarmin ▴ 20

Hi, I decided to use Trimmomatic to trim my raw reads. I saw that many researchers use the settings (LEADING:3 TRAILING:3) that the authors suggest on their webpage (https://www.usadellab.org/cms/?page=trimmomatic) for trimming RNA-Seq reads.
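To make the two settings concrete, here is a minimal Python sketch of what LEADING and TRAILING do as I understand them from the Trimmomatic documentation: bases are cut from the start (or end) of a read while their quality is below the threshold. This is not Trimmomatic's actual code. The function names `decode_phred33` and `trim_leading_trailing` and the example read are hypothetical, and Phred+33 encoding is assumed.

```python
def decode_phred33(quality_string):
    """Convert a FASTQ quality string to Phred scores (assumes Phred+33 encoding)."""
    return [ord(ch) - 33 for ch in quality_string]


def trim_leading_trailing(seq, quals, leading=3, trailing=3):
    """Cut bases from each end while their Phred score is below the threshold.

    Hypothetical helper that mimics the documented behaviour of LEADING and
    TRAILING; it is an illustration, not Trimmomatic's implementation.
    """
    start = 0
    while start < len(quals) and quals[start] < leading:
        start += 1
    end = len(quals)
    while end > start and quals[end - 1] < trailing:
        end -= 1
    return seq[start:end], quals[start:end]


# Made-up read: '#' is Q2, '!' is Q0, 'I' is Q40, '5' is Q20.
seq = "NACGTACGTN"
quals = decode_phred33("#!IIIIII5#")
trimmed_seq, trimmed_quals = trim_leading_trailing(seq, quals)
print(trimmed_seq, trimmed_quals)
```

With a threshold of 3, only bases with Q0 to Q2 at the very ends are removed; the Q20 base near the end of this read is kept.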

I thought these values were very low, so I looked into their exact meaning. A website (http://drive5.com/usearch/manual/quality_score.html) uses the figure below to explain the Phred quality score and states: "Note that a Q score of 3 means P (the error probability)=0.5, meaning that there is a 50% chance the base is wrong, and lower values represent even higher probabilities of error." Can anyone explain why the authors suggest such low thresholds for trimming reads? Best, Armin

Quality score and the probability of error
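The relationship in the figure follows from the standard Phred definition, Q = -10 * log10(P), so P = 10^(-Q/10). A small sketch to check the numbers yourself (the function name `phred_to_error_probability` is just for illustration):

```python
def phred_to_error_probability(q):
    """Error probability for a Phred quality score: P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


# Print the error probability for a few commonly discussed thresholds.
for q in (3, 10, 15, 20, 30):
    print(f"Q{q}: P(error) = {phred_to_error_probability(q):.4f}")
```

For Q3 this gives roughly 0.50, matching the quoted statement, while Q20 corresponds to a 1% chance of error.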

RNA-Seq sequencing quality control • 1.2k views

0

First of all, do you have a specific quality problem at the beginning or end of your reads? If you don't, then you don't strictly have to trim them. Poor quality bases (if they are adapter/contaminants) can be soft-clipped by aligners, so they don't strictly require trimming. That said, if you are planning to do any de novo assembly work, then trimming the data more stringently, at Q15 or higher, may be warranted.

3.8 years ago by GenoMax 147k

0

Hi, thank you for your reply. That is another problem I still have. Some scientists suggest removing the first few bases based on the "Per base sequence quality" and "Per base sequence content" sections of FastQC, while others suggest leaving these bases in. I uploaded the results for one of my reads ( and ). I want to use Kallisto, which does a pseudoalignment to the reference transcriptome. I decided to trim the first 12 bases. However, my question in this post is a different one: I want to know why the authors of Trimmomatic suggest such low thresholds on their website. Unfortunately, I was not able to contact the authors.

3.8 years ago by dadrasarmin ▴ 20

1

Please read these blog posts from the authors of FastQC, which should address your questions.

https://sequencing.qcfail.com/articles/libraries-can-contain-technical-duplication/
https://sequencing.qcfail.com/articles/positional-sequence-bias-in-random-primed-libraries/

3.8 years ago by GenoMax 147k