I am trying to clean RNA-seq reads with Trimmomatic using the following parameters:
java -jar trimmomatic-0.36.jar PE -trimlog file_trim_log input_1.fastq.gz input_2.fastq.gz output_1P_clean.fq.gz output_1U_clean.fq.gz output_2P_clean.fq.gz output_2U_clean.fq.gz ILLUMINACLIP:TruSeq3-PE-2.fa:2:30:10 SLIDINGWINDOW:10:30 LEADING:28 TRAILING:28 MINLEN:75
ILLUMINACLIP:TruSeq3-PE-2.fa:2:30:10 : My reads contain TruSeq Universal adapters, as indicated by FastQC.
SLIDINGWINDOW:10:30 : As per the manual.
LEADING:28 : I want to remove bases at the start of the read whose quality falls below 28 in FastQC's per base sequence quality module.
TRAILING:28 : I want to remove bases at the end of the read whose quality falls below 28 in FastQC's per base sequence quality module.
MINLEN:75 : Reads should not be shorter than 75 bp after trimming.
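To see how these quality steps interact on a single read, here is a minimal Python sketch. It assumes the Phred scores are already decoded to integers, and it applies the steps in the order they appear in the command above (Trimmomatic runs steps in command-line order). The sliding-window rule used here (cut at the start of the first window whose mean quality is below the threshold) is a simplification that may differ from Trimmomatic's own implementation in edge cases; the function names and the example quality profile are hypothetical.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int]  # half-open [start, end) range of bases kept


def sliding_window(quals: List[int], span: Span, window: int, threshold: float) -> Span:
    """Simplified SLIDINGWINDOW: cut at the first window whose mean is below threshold."""
    start, end = span
    for i in range(start, end - window + 1):
        if sum(quals[i:i + window]) / window < threshold:
            return start, i
    return start, end


def leading(quals: List[int], span: Span, threshold: int) -> Span:
    """LEADING: drop bases from the 5' end while their quality is below threshold."""
    start, end = span
    while start < end and quals[start] < threshold:
        start += 1
    return start, end


def trailing(quals: List[int], span: Span, threshold: int) -> Span:
    """TRAILING: drop bases from the 3' end while their quality is below threshold."""
    start, end = span
    while end > start and quals[end - 1] < threshold:
        end -= 1
    return start, end


def trim_read(quals: List[int], window: int = 10, window_q: float = 30,
              lead_q: int = 28, trail_q: int = 28, min_len: int = 75) -> Optional[Span]:
    """Apply SLIDINGWINDOW, LEADING, TRAILING, then MINLEN, in that order.

    Defaults mirror SLIDINGWINDOW:10:30 LEADING:28 TRAILING:28 MINLEN:75.
    Returns the kept span, or None if the read is shorter than min_len.
    """
    span = (0, len(quals))
    span = sliding_window(quals, span, window, window_q)
    span = leading(quals, span, lead_q)
    span = trailing(quals, span, trail_q)
    start, end = span
    return span if end - start >= min_len else None


# Hypothetical 100 bp read: two weak 5' bases, a good middle, a decaying 3' tail.
read = [20, 25] + [35] * 90 + [30, 20, 10, 5, 2, 2, 2, 2]

print(trim_read(read))  # (2, 86): 84 bp kept
print(trim_read(read, window=4, window_q=15, lead_q=3, trail_q=3, min_len=36))  # (0, 93)
```

With the more lenient thresholds common in tutorials (applied in the same order), the same read keeps 93 bp instead of 84 bp; in this example most of the difference comes from the stricter 10-base, mean-30 window rather than from LEADING/TRAILING.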
Are the parameters reasonable or too strict?
For LEADING and TRAILING, several manuals use a quality score of 3. Isn't that too low?
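The two thresholds are far apart on the error scale. A Phred score Q corresponds to an estimated base-call error probability of 10^(-Q/10), so Q3 means roughly a 50% chance the call is wrong, while Q28 means roughly 0.16%. A small Python sketch of the conversion, assuming the usual Phred+33 quality encoding (Illumina 1.8 and later); the helper names are illustrative:

```python
from typing import List


def phred_to_error(q: int) -> float:
    """Estimated probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def decode_phred33(qual_line: str) -> List[int]:
    """Decode a FASTQ quality string in Phred+33 encoding into integer scores."""
    return [ord(c) - 33 for c in qual_line]


for q in (3, 15, 20, 28, 30):
    print(f"Q{q}: error probability {phred_to_error(q):.4f}")
# Q3 is about 0.5012, Q28 about 0.0016
```

So LEADING:3 and TRAILING:3 only remove bases that are close to useless, whereas a cutoff of 28 also removes bases that are still very likely to be correct.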
I am a bit concerned because only 70-80% of the read pairs survived as pairs. The rest fall into Forward Only Surviving, Reverse Only Surviving, or Dropped.
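In PE mode, when one mate is removed (for example by MINLEN) and the other survives, the survivor is written to the unpaired output (output_1U / output_2U above), so those reads are not lost outright. To compare runs with different thresholds, it can help to pull the counts out of the summary line Trimmomatic prints. A minimal Python sketch, assuming the summary has the form of the sample string below; the counts in it are made up, and the exact wording should be checked against your own log:

```python
import re
from typing import Dict

# Hypothetical PE summary line; counts are invented for illustration.
SUMMARY = ("Input Read Pairs: 1000000 Both Surviving: 750000 (75.00%) "
           "Forward Only Surviving: 150000 (15.00%) "
           "Reverse Only Surviving: 50000 (5.00%) Dropped: 50000 (5.00%)")

FIELDS = ["Input Read Pairs", "Both Surviving", "Forward Only Surviving",
          "Reverse Only Surviving", "Dropped"]


def parse_summary(line: str) -> Dict[str, int]:
    """Extract the read-pair count for each category from a summary line."""
    counts = {}
    for field in FIELDS:
        match = re.search(re.escape(field) + r":\s+(\d+)", line)
        if match is None:
            raise ValueError(f"field not found: {field}")
        counts[field] = int(match.group(1))
    return counts


def fractions(counts: Dict[str, int]) -> Dict[str, float]:
    """Express each outcome as a fraction of the input read pairs."""
    total = counts["Input Read Pairs"]
    return {k: v / total for k, v in counts.items() if k != "Input Read Pairs"}


counts = parse_summary(SUMMARY)
for name, frac in fractions(counts).items():
    print(f"{name}: {frac:.1%}")
```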
In RNA-seq data analysis, you have to strike a balance between acceptable quality and keeping the number of discarded reads low. Note that all adapter contamination should be trimmed. I recommend 123Fastq, which combines FastQC and Trimmomatic in a highly interactive graphical user interface. It also adds some improvements to FastQC's QC modules, a k-mer-based approach to adapter removal during trimming, and many other features. Try it yourself: https://sourceforge.net/projects/project-123ngs/
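To illustrate what adapter trimming removes, here is a deliberately simplified Python sketch: it truncates a read at an exact occurrence of the adapter, or at a prefix of the adapter running off the 3' end. AGATCGGAAGAGC is the 13-base sequence shared by the start of the TruSeq adapters. This is not how ILLUMINACLIP works (its 2:30:10 settings mean up to 2 mismatches in the seed, a palindrome clip threshold of 30, and a simple clip threshold of 10), nor 123Fastq's k-mer approach; the function name and the min_overlap parameter are hypothetical.

```python
ADAPTER = "AGATCGGAAGAGC"  # 13 bp sequence shared by the start of TruSeq adapters


def trim_adapter(read: str, adapter: str = ADAPTER, min_overlap: int = 3) -> str:
    """Cut the read at an exact adapter match or a partial adapter at the 3' end.

    Exact matching only: no mismatches and no quality-aware scoring.
    """
    # Full adapter anywhere in the read: keep only the bases before it.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Partial adapter at the 3' end: the read's suffix equals an adapter prefix.
    # Try the longest possible overlap first, down to min_overlap bases.
    for k in range(min(len(adapter) - 1, len(read)), min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read


insert = "ACGTACGTTTGACCA"  # hypothetical cDNA insert
print(trim_adapter(insert + ADAPTER + "CACACGTCTG"))  # ACGTACGTTTGACCA
print(trim_adapter(insert + "AGATCG"))                # ACGTACGTTTGACCA
```

Short overlaps such as 3 bases will also match by chance at the end of genuine sequence, which is one reason real trimmers score matches and use thresholds instead of exact matching.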