I have two FASTQ read files (Read_1.fq, Read_2.fq). I ran a quality check and found that the reads in both files are of very poor quality. All the other FastQC outputs are fine, so I decided to use Trimmomatic. There are some parts of the tool that I don't understand.
If I want to retain only bases with a minimum quality of 28, what command should I use? I don't understand what LEADING/TRAILING means here. After trimming, will I get reads of the same length? Can someone please help?
SLIDINGWINDOW:<windowSize>:<requiredQuality>
You could use SLIDINGWINDOW:5:28 (or something like this). If no bases are trimmed from any read, you will still have reads of the original length. Otherwise you will have reads of variable length, depending on how many bases are removed from each read.
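To make the sliding-window idea concrete, here is a minimal Python sketch. It is a simplified illustration, not Trimmomatic's actual implementation: the function name `sliding_window_trim` is hypothetical, the cut is assumed to fall at the start of the first failing window, and reads shorter than the window are simply left unchanged.

```python
def sliding_window_trim(quals, window_size=5, required_quality=28):
    """Return the number of bases to keep from the 5' end.

    Simplified model: scan windows from the start of the read and cut at
    the start of the first window whose mean quality is below the threshold.
    """
    if len(quals) < window_size:
        # Assumption for this sketch: leave very short reads untouched.
        return len(quals)
    for start in range(len(quals) - window_size + 1):
        window = quals[start:start + window_size]
        if sum(window) / window_size < required_quality:
            return start
    return len(quals)


# A read with uniformly high quality keeps its full length.
good_read = [35] * 10
# A read whose quality decays towards the 3' end gets shortened.
decaying_read = [35, 35, 35, 35, 35, 30, 20, 15, 10, 5]

print(sliding_window_trim(good_read))      # 10
print(sliding_window_trim(decaying_read))  # 3
```

The two example reads end up with different lengths, which is why trimmed output usually contains reads of variable length.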
LEADING and TRAILING cut bases off the beginning and the end of a read, respectively, if their quality is below a given threshold.
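As a rough illustration of that behaviour, the following Python sketch removes low-quality bases from both ends of a list of per-base quality scores. The function name `trim_leading_trailing` and the default thresholds of 3 are assumptions chosen for the example, not Trimmomatic's code.

```python
def trim_leading_trailing(quals, leading=3, trailing=3):
    """Return (start, end) indices of the bases kept after end trimming.

    Bases are dropped from the 5' end while their quality is below
    `leading`, and from the 3' end while it is below `trailing`.
    """
    start = 0
    while start < len(quals) and quals[start] < leading:
        start += 1
    end = len(quals)
    while end > start and quals[end - 1] < trailing:
        end -= 1
    return start, end


quals = [2, 2, 30, 30, 30, 2]
start, end = trim_leading_trailing(quals)
print(quals[start:end])  # [30, 30, 30]
```

Only the ends are affected here: a low-quality base in the middle of the read is left in place, which is what a sliding-window step is for.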
All that said, I suggest that you use bbduk.sh from the BBMap suite instead. Its options are easy to use and understand. Help documentation here.
What do you intend to do downstream? Your quality cut-off is really stringent and, depending on the analysis you want to perform, you may throw away a lot of good data. Also, it is a good idea to post the FastQC quality plot here.
Thanks for your reply. I intend to find variants using Mutect2 and apply driver detection algorithms downstream. One thing that baffles me: how can you consider poor-quality bases "good data"? Am I really losing information by throwing out poor-quality bases?
This is the FASTQC report image
You do have a significant quantity of poor-quality data (your success in calling variants may be limited by this). But Q28 seems to be a stringent cut-off, as @h.mon stated. When you have a reference genome available, Q15 or Q20 may be a stringent enough cut-off.
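For a sense of scale, a Phred quality score Q corresponds to an estimated base-call error probability of 10^(-Q/10). The short Python sketch below (the function name is just illustrative) compares the cut-offs mentioned in this thread.

```python
def phred_to_error_probability(q):
    """Convert a Phred quality score to the estimated base-call error probability."""
    return 10 ** (-q / 10)


for q in (15, 20, 28):
    p = phred_to_error_probability(q)
    print(f"Q{q}: error probability {p:.4f} (about 1 error in {1 / p:.0f} bases)")
```

Q20 means roughly 1 expected error in 100 bases and Q28 roughly 1 in 630, so bases between Q20 and Q28 are still called correctly about 99% of the time or better.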