Hello everyone,
I am currently facing an unusual situation while performing adapter trimming using both Trim Galore and Trimmomatic. I would appreciate any insights or suggestions.
Here's a brief overview of my process and the issue I encountered:
Commands Used:
- Trim Galore:
trim_galore -j $cores --paired "$line1" "$line2" -o Trimmed_1 --length 36 --gzip
- Trimmomatic:
TrimmomaticPE -threads 16 "$line1" "$line2" -baseout "MB$counter.fq.gz" ILLUMINACLIP:TruSeq3-PE.fa:2:30:10
Problem Description:
- Trim Galore Results: When validating QC and running MultiQC, Trim Galore shows a variation in length distribution, which is what I expected.
- Trimmomatic Results: Surprisingly, Trimmomatic shows no variation in length distribution, presenting a perfect 150bp across all sequences.
Despite this discrepancy in sequence length, the adapter content is reduced in both Trim Galore and Trimmomatic, indicating successful adapter removal.
My Concerns:
- Why is there a discrepancy in length distribution between the two tools?
- How can Trimmomatic reduce adapter content without altering the length distribution?
Has anyone encountered a similar situation or can provide an explanation for this behavior? Any advice on troubleshooting or understanding these results would be greatly appreciated. My thoughts are that all the reads with adapters were categorized as unpaired and thus won't contribute to length variance, since the MultiQC analysis is performed only on the paired files. However, given that I have specified the --length parameter in Trim Galore to be the same as the default minimum length setting in Trimmomatic, shouldn't the outcomes be identical?
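One way to test the hypothesis that the trimmed reads ended up in the unpaired files is to tabulate read lengths for every Trimmomatic output, not only the paired ones. The following is a minimal sketch, assuming GNU gzip and awk are available; the function name `length_hist` and the file names in the usage comment (derived from `-baseout` with a hypothetical `counter=1`) are illustrative, not taken from the actual run.

```shell
# Print a "length<TAB>count" table for one FASTQ file (plain or gzipped).
length_hist() {
    local fq="$1"
    # gzip -cdf decompresses .gz input and passes plain text through unchanged.
    gzip -cdf "$fq" \
        | awk 'NR % 4 == 2 { n[length($0)]++ }
               END { for (l in n) print l "\t" n[l] }' \
        | sort -n
}

# Hypothetical usage, assuming -baseout "MB1.fq.gz" produced these four files:
#   for f in MB1_1P.fq.gz MB1_1U.fq.gz MB1_2P.fq.gz MB1_2U.fq.gz; do
#       echo "== $f"; length_hist "$f"
#   done
```

If the paired files contain only 150 bp reads while the unpaired files contain the shorter ones, that would support the explanation above.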
Thank you!
If starting reads were 150 bp then that appears to indicate that no trimming occurred. Are you using the right adapter sequences file?
What about the total read numbers that survive trimming in both cases? It sounds like Trimmomatic must be removing entire reads that it trimmed, if it is trimming the data.
I checked the adapters.
Here are some screenshots from the multiqc analysis before and after trimming with trimmomatic.
Also, in terms of read counts for Sample 1: unfiltered: 56268121 | Trimmomatic: 52060220 | Trim Galore: 56232356
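To put those counts in perspective, the surviving fraction can be computed directly. This is a small sketch with hypothetical helper names (`count_reads`, `pct_surviving`); it assumes the posted numbers are counted the same way for all three files and that each FASTQ record spans exactly four lines.

```shell
# Count reads in a FASTQ file (plain or gzipped): four lines per record.
count_reads() {
    gzip -cdf "$1" | awk 'END { print NR / 4 }'
}

# Percentage of input reads that survive trimming, to two decimals.
pct_surviving() {
    LC_ALL=C awk -v kept="$1" -v total="$2" \
        'BEGIN { printf "%.2f\n", 100 * kept / total }'
}

pct_surviving 52060220 56268121   # Trimmomatic, Sample 1 -> 92.52
pct_surviving 56232356 56268121   # Trim Galore, Sample 1 -> 99.94

# Hypothetical usage on real files: count_reads MB1_1P.fq.gz
```

So roughly 7.5% of the reads are missing from the Trimmomatic paired output, compared with well under 0.1% for Trim Galore.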
I get that Trimmomatic probably marks reads with adapters as unpaired, but do you know why it does that and why Trim Galore doesn't?
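A possible explanation, going by the Trimmomatic manual's description of `ILLUMINACLIP`: when palindrome mode detects adapter read-through, the reverse read is dropped by default because it carries the same information as the forward read, so the trimmed forward mate lands in the unpaired output. The optional sixth field, `keepBothReads`, changes that. Also, as far as I know, Trimmomatic applies a minimum length only when a `MINLEN` step is given. The sketch below only prints a candidate command (a dry run); the function name, the input file names, and the choice of `8` for the minimum adapter length (the documented default) are assumptions, and whether this resolves the discrepancy in this dataset is untested.

```shell
# Dry run: print a Trimmomatic command instead of executing it.
# r1, r2 and base stand in for "$line1", "$line2" and "MB$counter".
trimmomatic_cmd() {
    local r1="$1" r2="$2" base="$3"
    # ILLUMINACLIP fields: adapters:seedMismatches:palindromeThreshold:
    #   simpleThreshold:minAdapterLength:keepBothReads
    echo TrimmomaticPE -threads 16 "$r1" "$r2" -baseout "${base}.fq.gz" \
        "ILLUMINACLIP:TruSeq3-PE.fa:2:30:10:8:true" "MINLEN:36"
}

# Hypothetical inputs; inspect the printed command before running it.
trimmomatic_cmd sample_R1.fastq.gz sample_R2.fastq.gz MB1
```

With both mates retained, the paired outputs should show a length distribution closer to the Trim Galore one, if the read-through explanation is correct.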
Are you referring to reads that end up in "unpaired" files where one of the mates is completely eliminated?
Is the top plot after or before trimming? If the bottom plot is "after" it would not make sense since the adapter content is up.
If you are willing to try a couple of other options, I suggest using fastp or bbduk.sh (guide here: https://jgi.doe.gov/data-and-tools/software-tools/bbtools/bb-tools-user-guide/bbduk-guide/ ).
Yeah sorry, the labels are wrong.
Yes, exactly. In contrast, Trim Galore, which also validates pairs, doesn't eliminate them.
Sure, I will give them a try to check whether the results are closer to Trim Galore or Trimmomatic. If you are interested as well, I will keep you updated.
Thank you
Not good if it is leaving a singlet read behind. That would put R1/R2 files out of sync.
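Whether the R1/R2 files are still in sync can be checked by comparing read names in order. A minimal sketch, assuming standard Illumina headers where the read name is the first whitespace-delimited field (an optional trailing `/1` or `/2` is stripped); the function names are hypothetical.

```shell
# Print the read names of a FASTQ file (plain or gzipped), mate suffix removed.
read_names() {
    gzip -cdf "$1" \
        | awk 'NR % 4 == 1 { name = $1; sub(/\/[12]$/, "", name); print name }'
}

# Return 0 only if both files list the same read names in the same order.
pairs_in_sync() {
    local a b rc
    a=$(mktemp) && b=$(mktemp) || return 2
    read_names "$1" > "$a"
    read_names "$2" > "$b"
    if cmp -s "$a" "$b"; then rc=0; else rc=1; fi
    rm -f "$a" "$b"
    return "$rc"
}

# Hypothetical usage:
#   pairs_in_sync trimmed_R1.fq.gz trimmed_R2.fq.gz && echo "in sync" || echo "OUT OF SYNC"
```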
I also performed trimming with fastp, which produced results similar to Trim Galore's. My guess is that my Trimmomatic run is wrong.