Question

Trimmomatic usage issue


6.9 years ago

Gene_MMP8 ▴ 240

I have two FASTQ read files (Read_1.fq, Read_2.fq). I ran a quality check and found that the per-base quality of both reads is very poor. All the other FastQC outputs look fine, so I decided to use Trimmomatic. There are some parts of the tool that I don't understand.
If I want to retain only bases with a minimum quality of 28, what command should I use? I don't understand what LEADING/TRAILING mean here. After trimming, will all my reads have the same length? Can someone please help?

Tag: next-gen sequencing

updated 6.9 years ago by chen ★ 2.5k • written 6.9 years ago by Gene_MMP8 ▴ 240


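Use SLIDINGWINDOW:<windowSize>:<requiredQuality>; for example, SLIDINGWINDOW:5:28. Note that this clips the read once the average quality within the window falls below the threshold, so individual bases below Q28 can still remain in the trimmed read.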
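To make the "average within a window" behaviour concrete, here is a minimal Python sketch of the idea. It is a simplification, not Trimmomatic's actual code: it scans from the 5' end and cuts at the start of the first window whose mean quality is below the threshold (Trimmomatic's exact cut point within that window may differ). The function name and the quality values are hypothetical.

```python
def sliding_window_keep_length(quals, window=5, required=28):
    """Simplified SLIDINGWINDOW-style trimming (hypothetical helper, not Trimmomatic's code).

    Scans 5' -> 3' and returns how many bases to keep: the read is cut at the
    start of the first window whose mean Phred quality is below `required`.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    for start in range(len(quals) - window + 1):
        if sum(quals[start:start + window]) / window < required:
            return start
    # No window failed (or the read is shorter than one window): keep everything.
    return len(quals)


# Hypothetical per-base Phred scores: one isolated Q20 base, then a low-quality 3' tail.
quals = [35, 35, 20, 35, 35, 35, 35, 10, 8, 6]
keep = sliding_window_keep_length(quals, window=5, required=28)
print(keep, quals[:keep])  # 4 [35, 35, 20, 35]
```

The Q20 base survives because every window containing it still averages at least 28; only the decaying tail triggers the cut. This is why a window threshold of 28 is not the same as "every kept base is at least Q28".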

If no bases were trimmed from any reads then you would still have reads of the original length. Otherwise you will have reads of variable length depending on how many bases are eliminated from each read.

LEADING and TRAILING remove bases from the start and the end of the read, respectively, as long as their quality is below the given value; trimming stops at the first base that meets the threshold.
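A minimal Python sketch of what LEADING/TRAILING do, written for illustration only (the helper name is hypothetical and the thresholds of 3 are just example values). It shows that only the ends are trimmed: a low-quality base in the middle of the read is left alone.

```python
def leading_trailing_bounds(quals, leading=3, trailing=3):
    """Sketch of LEADING/TRAILING-style trimming (hypothetical helper).

    Removes bases from the 5' end while quality < leading, and from the
    3' end while quality < trailing. Returns (start, end) of the kept slice.
    """
    start = 0
    while start < len(quals) and quals[start] < leading:
        start += 1
    end = len(quals)
    while end > start and quals[end - 1] < trailing:
        end -= 1
    return start, end


quals = [2, 2, 30, 31, 2, 35, 20, 2]  # hypothetical read
start, end = leading_trailing_bounds(quals, leading=3, trailing=3)
print(start, end, quals[start:end])  # 2 7 [30, 31, 2, 35, 20]
```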

All that said, I suggest using bbduk.sh from the BBMap suite instead; its options are easy to use and understand. Help documentation here.

6.9 years ago by GenoMax 149k


What do you intend to do downstream? Your quality cut-off is really stringent, and depending on the analysis you want to perform, you may throw away a lot of good data. Also, it is a good idea to post the FastQC quality plot here.

6.9 years ago by h.mon 35k


Thanks for your reply. I intend to find variants using Mutect2 and apply driver-detection algorithms downstream. One thing that baffles me is how poor-quality bases can be considered "good data". Am I really losing information by throwing out poor-quality bases?
This is the FastQC report image: [image not included]

updated 6.9 years ago by GenoMax 149k • written 6.9 years ago by Gene_MMP8 ▴ 240


You do have a significant quantity of poor-quality data (this may limit your success in calling variants). But Q28 seems to be a stringent cut-off, as @h.mon stated. When a reference genome is available, Q15 or Q20 may be a stringent enough cut-off.
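To put Q15, Q20 and Q28 in perspective: a Phred score Q corresponds to an estimated base-call error probability of 10^(-Q/10). The Python sketch below converts these thresholds and decodes a FASTQ quality string, assuming Phred+33 encoding (the usual encoding for current Illumina data); the quality string is a made-up example.

```python
def phred_to_error(q):
    """Phred score Q -> estimated probability that the base call is wrong: 10^(-Q/10)."""
    return 10 ** (-q / 10)


def decode_phred33(qual_string):
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(c) - 33 for c in qual_string]


for q in (15, 20, 28):
    print(f"Q{q}: error probability {phred_to_error(q):.4f}")
# Q15: error probability 0.0316
# Q20: error probability 0.0100
# Q28: error probability 0.0016

print(decode_phred33("I5?+"))  # [40, 20, 30, 10]
```

In other words, a Q20 base is estimated to be wrong about 1 time in 100 and a Q28 base about 1 time in 630, so Q20 bases are usually still informative, especially once reads are aligned to a reference.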

6.9 years ago by GenoMax 149k

score 0 · Answer 1 · 2018-04-07


6.9 years ago

chen ★ 2.5k

Your goal can be achieved with sliding-window trimming. You may try fastp with the following command:

fastp -i Read_1.fq -I Read_2.fq -o Read_1.out.fq -O Read_2.out.fq -5 -3 -M 28
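In that command, -5 (--cut_front) and -3 (--cut_tail) slide a window in from the 5' and 3' ends, and -M (--cut_mean_quality) sets the mean quality the window must reach; the window size is set with -W (default 4). The Python sketch below illustrates that idea only; it is a simplified, hypothetical helper, not fastp's actual implementation, and the quality values are made up.

```python
def cut_front_tail(quals, window=4, mean_q=20, front=True, tail=True):
    """Simplified both-ends window cutting, in the spirit of fastp's -5/-3 (not fastp's code).

    From each enabled end, drop the outermost base while the window at that end
    has mean quality below `mean_q`; stop at the first window that passes.
    Returns (start, end) of the kept slice.
    """
    start, end = 0, len(quals)
    if front:
        while end - start >= window and sum(quals[start:start + window]) / window < mean_q:
            start += 1
    if tail:
        while end - start >= window and sum(quals[end - window:end]) / window < mean_q:
            end -= 1
    return start, end


quals = [5, 8, 30, 32, 34, 35, 33, 30, 12, 6, 4]  # hypothetical read
start, end = cut_front_tail(quals, window=4, mean_q=28)
print(start, end, quals[start:end])  # 2 8 [30, 32, 34, 35, 33, 30]
```

Because each read loses a different number of bases, the output reads will have variable lengths, as noted earlier in the thread.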