Question

Trimming fastq with quality

3

10.3 years ago

RafaelMP ▴ 120

Hello everyone!

I'm new to RNA-Seq analysis.
I have paired-end read data (Sanger / Illumina 1.9 encoding) and would like to trim the reads by quality score.
We inspected the data with FastQC and tested the following tools: sickle, Spade, fastq quality filter (fastx) and AllPaths.
The best result was obtained with the fastq quality filter, but we lost a lot of reads in the process (it does not trim part of a read; it only discards entire reads).

Is there already a program that finds the first occurrence of a given quality symbol (e.g. '#') and cuts both the sequence and the quality string from that point?
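To make the request concrete, here is a minimal Python sketch of that behaviour. It is only an illustration: the function names `trim_at_symbol` and `trim_fastq_records` are made up for this example, and it assumes well-formed four-line FASTQ records with no line wrapping.

```python
def trim_at_symbol(seq, qual, symbol="#"):
    """Cut the read and its quality string at the first occurrence of `symbol`."""
    cut = qual.find(symbol)
    if cut == -1:  # symbol not present: keep the whole read
        return seq, qual
    return seq[:cut], qual[:cut]


def trim_fastq_records(lines, symbol="#"):
    """Yield trimmed (header, seq, plus, qual) tuples from 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        seq = next(it).rstrip("\n")
        plus = next(it).rstrip("\n")
        qual = next(it).rstrip("\n")
        seq, qual = trim_at_symbol(seq, qual, symbol)
        yield header.rstrip("\n"), seq, plus, qual


if __name__ == "__main__":
    record = ["@read1\n", "ACGTACGT\n", "+\n", "IIII#III\n"]
    for rec in trim_fastq_records(record):
        print("\n".join(rec))
```

Note that a read whose quality string starts with the symbol comes out empty, so a minimum-length filter would normally follow this step.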

RNA-Seq fastx fastq trim FastQC • 21k views

ADD COMMENT • link updated 2.9 years ago by Ram 44k • written 10.3 years ago by RafaelMP ▴ 120

1

Did you try fastq-Trimmer instead? You could also run Prinseq with -trim_qual_left and -trim_qual_right and a given quality threshold. You mentioned '#'. Looking quickly at the encoding chart on the FASTQ wiki page, the score of '#' is 2 (assuming the 1.8+ and 1.9+ encodings are the same). So try a threshold of 2, which should trim the low-quality regions from the left and right ends of each read.
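The score of a quality character can be checked directly. A small sketch, assuming the Phred+33 offset used by Sanger / Illumina 1.8+ encoding (the helper name `phred33_score` is made up for this example):

```python
def phred33_score(char):
    """Decode one FASTQ quality character, assuming a Phred+33 offset."""
    return ord(char) - 33


print(phred33_score("#"))  # 2
print(phred33_score("I"))  # 40
```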

ADD REPLY • link updated 2.9 years ago by Ram 44k • written 10.3 years ago by Prakki Rama ★ 2.7k

1

10.3 years ago

Ming Tommy Tang ★ 4.5k

If you are doing de novo transcriptome assembly, trim the FASTQ files at a Phred score of 5, not 20.

http://angus.readthedocs.org/en/2014/_static/ngs2014-trimming.pdf

0

10.3 years ago

Jeff Stafford ▴ 50

seqtk is one of the fastest tools available and is easy to use/install (it can process huge FASTQ files in seconds to minutes). Its trimfq command trims reads from both ends based on Phred quality and leaves the good parts intact. You can set the threshold with the "-q" option, which takes an error rate rather than a Phred score.

You can get it here https://github.com/lh3/seqtk/. Just 'make' it and you're good to go.

0

How do I trim bases below Phred score 20 using seqtk? What does the default of 0.05 mean in terms of Phred score? (-q FLOAT error rate threshold (disabled by -b/-e) [0.05])

ADD REPLY • link 6.9 years ago by deepti1rao ▴ 50
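For reference, a Phred score Q and an error probability p are related by Q = -10 * log10(p). A minimal sketch of the conversion in both directions (the function names are made up for this example):

```python
import math


def error_rate_to_phred(p):
    """Convert an error probability to a Phred quality score."""
    return -10 * math.log10(p)


def phred_to_error_rate(q):
    """Convert a Phred quality score to an error probability."""
    return 10 ** (-q / 10)


print(round(error_rate_to_phred(0.05), 2))  # 13.01
print(round(phred_to_error_rate(20), 4))    # 0.01
```

By this relation, the default of 0.05 corresponds to a Phred score of about 13, and a Phred 20 cutoff corresponds to an error rate of 0.01, so passing -q 0.01 to seqtk trimfq should be the equivalent setting.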

0

5.7 years ago

i.sudbery 20k

We use Trimmomatic for this. It lets you trim bases below a certain quality from both the 3' and 5' ends, and also trim based on the average quality within a sliding window.
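The sliding-window idea can be sketched in a few lines of Python. This is an illustration of the general technique only, not Trimmomatic's actual implementation; the function name, the default window size of 4, the default threshold of 15 and the Phred+33 offset are all assumptions made for the example.

```python
def sliding_window_trim(seq, qual, window=4, min_avg=15, offset=33):
    """Cut the read at the start of the first window whose mean quality is below min_avg."""
    scores = [ord(c) - offset for c in qual]
    if not scores:
        return seq, qual
    # Scan from the 5' end; reads shorter than the window are scored as one chunk.
    for start in range(max(len(scores) - window + 1, 1)):
        chunk = scores[start:start + window]
        if sum(chunk) / len(chunk) < min_avg:
            return seq[:start], qual[:start]
    return seq, qual


print(sliding_window_trim("ACGTACGTAC", "IIIIII####"))  # ('ACGTA', 'IIIII')
```

In the example the cut lands one base before the first '#', because the window starting there already averages below 15.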

0

5.7 years ago

GenoMax 147k

Since an old thread got reactivated, I will add that bbduk.sh from the BBMap suite can also do quality-based trimming, in addition to a host of other things. A guide is available here.

qtrim=f             Trim read ends to remove bases with quality below trimq.
                    Performed AFTER looking for kmers.  Values: 
                       rl (trim both ends), 
                       f (neither end), 
                       r (right end only), 
                       l (left end only),
                       w (sliding window).
trimq=6             Regions with average quality BELOW this will be trimmed,
                    if qtrim is set to something other than f.  Can be a 
                    floating-point number like 7.3.
minavgquality=0     (maq) Reads with average quality (after trimming) below 
                    this will be discarded.
maqb=0              If positive, calculate maq from this many initial bases.
minbasequality=0    (mbq) Reads with any base below this quality (after 
                    trimming) will be discarded.

Ram · Accepted Answer · 2014-08-13

When a large part of a read consists of low-quality bases, trimming shortens the read, and the shortened read can no longer be aligned to the reference genome with high confidence. So almost all trimming software discards reads whose length has been reduced below some cutoff (say n=30). If most of your reads suffer from this problem, then all trimming tools will behave the same way, i.e. discard the majority of the reads. Although this is not a solution, I would suggest trying Trimmomatic to see if it helps.
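The trim-then-discard behaviour described above can be sketched as follows. This is a simplified illustration, not the logic of any particular tool: the function names, the Phred 20 threshold, the Phred+33 offset and the minimum length of 30 are assumptions for the example, and paired-end bookkeeping is left out.

```python
def trim_ends(seq, qual, threshold=20, offset=33):
    """Remove bases below `threshold` from both ends of a read."""
    scores = [ord(c) - offset for c in qual]
    left = 0
    while left < len(scores) and scores[left] < threshold:
        left += 1
    right = len(scores)
    while right > left and scores[right - 1] < threshold:
        right -= 1
    return seq[left:right], qual[left:right]


def trim_and_filter(reads, threshold=20, min_len=30):
    """Trim each (seq, qual) pair and keep only reads of at least min_len bases."""
    kept = []
    for seq, qual in reads:
        t_seq, t_qual = trim_ends(seq, qual, threshold)
        if len(t_seq) >= min_len:
            kept.append((t_seq, t_qual))
    return kept


reads = [
    ("A" * 40, "#" * 5 + "I" * 35),   # 35 good bases remain: kept
    ("C" * 40, "I" * 10 + "#" * 30),  # only 10 good bases remain: discarded
]
print(len(trim_and_filter(reads)))  # 1
```

Lowering the quality threshold or the minimum length keeps more reads, at the cost of retaining lower-quality or shorter sequences.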