How does Trimmomatic work?
8.5 years ago
EpiExplorer ▴ 90

Hi,

I am trying to trim my data with Trimmomatic. Does anyone know how it works? The manual is not very clear to me. I am also not sure which adapter file to use, how to set simple mode rather than palindromic mode, and what to do with the unpaired files.

I tried to run this command, but I am not sure what the tool is doing:

java -jar /home/devgen/bin/trimmomatic-0.36.jar PE -phred33  C95VLANXX-2046D-01-01-01_L003_R1.fastq C95VLANXX-2046D-01-01-01_L003_R2.fastq  N6_F_P.fastq N6_F_U.fastq  N6_R_P.fastq N6_R_U.fastq  ILLUMINACLIP:adapters/TruSeq3-PE.fa:2:30:10  MINLEN:25 AVGQUAL:20 -trimlog logs

Thanks for your help.

RNA-Seq • 7.9k views
ADD COMMENT
1
Entering edit mode

But I am still not sure about discarding 23% of the reads. I tried FAQCs with the same parameters and I get 99% of reads as paired and about 1% as unpaired.

Can anyone comment on Trimmomatic's output? I am getting about 70-76% of reads as paired and 24-30% as unpaired. Discarding about 30% of the reads is a huge risk. Any advice on this will be greatly appreciated.

thanks

ADD REPLY
1
Entering edit mode

please don't discuss prior answers by adding new answers. Use the comments (like this one) to discuss specific answers.

ADD REPLY
1
Entering edit mode

Will take care of this in future.

ADD REPLY
0
Entering edit mode

Thanks Phillipp. I will be using the RNA-Seq data for differential expression and differential splicing analysis. The reason I was asking whether I should keep the unpaired files is that I get about 23% of reads in the unpaired files.

ADD REPLY
2
Entering edit mode

Personally, I'd discard the single-end RNA-Seq reads when using an alignment-based method like TopHat/HISAT, since using single reads in the alignments can lead to imprecise alignments. I'd keep them when using a k-mer-based pseudo-alignment approach like Sailfish/kallisto, since there the alignments don't matter (AFAIK - I'd better check the manual to make sure my thinking is correct).

ADD REPLY
0
Entering edit mode

First let's get one thing straight. The real risk is including bad data with the good. Tossing data out is not a risk; it is an inconvenience.

You see, even after discarding 30% of the data most results will probably still hold. But just about any analysis would be invalid if you added 30% biased and bad data to the good. So I would look at it from that perspective.

The second important thing here is not to look at it from the perspective of: OMG the bad Trimmomatic is removing so much of my data!

Investigate and understand what happens. Why are these data being removed? Does it make sense, is it working correctly, do you fully understand what happens, which data is removed, and why? It is very much possible to run Trimmomatic incorrectly. It is an obtuse and wonky tool, and among the most user-unfriendly ones out there IMO, so it is easy to misuse.

If it is working correctly, then ask yourself why you would even want to keep data that is mostly made up of adapter contamination or is of extremely low quality. Those are the two reasons for which Trimmomatic removes data.
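One way to start investigating is to count how many reads end up in each Trimmomatic output file. The following is only a rough sketch: the function names are made up for illustration, it assumes plain 4-line FASTQ records, and the numbers in the demo are invented, not taken from this dataset.

```python
import io


def count_reads(handle):
    """Count records in an open FASTQ file, assuming exactly 4 lines per record."""
    return sum(1 for _ in handle) // 4


def survival_fractions(n_input_pairs, n_both, n_forward_only, n_reverse_only):
    """Fractions of input pairs per outcome (hypothetical helper, not part of Trimmomatic)."""
    if n_input_pairs <= 0:
        raise ValueError("n_input_pairs must be positive")
    n_dropped = n_input_pairs - n_both - n_forward_only - n_reverse_only
    counts = {
        "both_surviving": n_both,
        "forward_only": n_forward_only,
        "reverse_only": n_reverse_only,
        "dropped": n_dropped,
    }
    return {label: count / n_input_pairs for label, count in counts.items()}


# In-memory demo; with real files you would use e.g. open("N6_F_P.fastq").
demo = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nAC\n+\nII\n")
print(count_reads(demo))
print(survival_fractions(100, 76, 20, 3))
```

If almost all of the unpaired reads are forward reads, that points at something systematic (for example, adapter handling or poor R2 quality) rather than random loss.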

ADD REPLY
0
Entering edit mode

Also, it is not clear what you mean by FAQC reporting 99% of reads as paired. What is FAQC? Do you mean FASTQC? That tool does not do quality control at all; it is simply a visualizer (I agree that it is misnamed: despite the "quality control" in the name, it does not control quality, it merely plots it).

Also 99% paired does not make sense either. Either all your data is paired or none.

ADD REPLY
0
Entering edit mode

Thanks Istvan. I was talking about FAQCs, which is also a tool for adapter and quality trimming.

ADD REPLY
0
Entering edit mode

Hello, it looks like you're not taking Istvan's request to not add new answers seriously. Please read our how-to posts and please give a little more thought before creating new answers in the future.

ADD REPLY
9
Entering edit mode
8.5 years ago

Trimmomatic combines a bunch of regularly performed quality control steps in one go.

In your example command, you give it a pair of FASTQ files (paired-end mode, PE) with phred33 quality encoding. ILLUMINACLIP then clips the adapters listed in TruSeq3-PE.fa, allowing at most 2 mismatches in the seed match, with a score threshold of 30 for palindromic (read-pair) matches and 10 for simple (single-read) matches. Then all reads shorter than 25 bases are discarded (MINLEN:25), and all reads whose average quality is below 20 are dropped (AVGQUAL:20).
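To make the colon-separated fields easier to read, here is a small sketch that labels them. The helper and the field labels are my own naming for illustration (they follow how I read the manual), not anything Trimmomatic itself provides, and optional extra ILLUMINACLIP fields are simply ignored.

```python
# Hypothetical labels for the positional fields of a few Trimmomatic steps.
FIELD_NAMES = {
    "ILLUMINACLIP": [
        "adapter_fasta",
        "seed_mismatches",
        "palindrome_clip_threshold",
        "simple_clip_threshold",
    ],
    "SLIDINGWINDOW": ["window_size", "required_quality"],
    "MINLEN": ["min_length"],
    "AVGQUAL": ["min_average_quality"],
}


def describe_step(step):
    """Split a step such as 'MINLEN:25' into its name and labelled values."""
    name, *values = step.split(":")
    labels = FIELD_NAMES.get(name, [])
    # zip() stops at the shorter list, so unlabelled extra fields are dropped.
    return name, dict(zip(labels, values))


for step in ["ILLUMINACLIP:adapters/TruSeq3-PE.fa:2:30:10", "MINLEN:25", "AVGQUAL:20"]:
    print(describe_step(step))
```

Note that this naive split would break on adapter paths that themselves contain a colon.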

Personally, I prefer a sliding window approach. I often see reads where the quality scores worsen towards the end; it's rare to have a read where the average quality over all bases is below 20, and normally the first 2/3 are pretty good. Using a sliding window approach cuts off the bad ends.
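A toy illustration of the difference, as a simplified sketch and not Trimmomatic's actual implementation (the real trimmer handles edge cases differently). The quality values are invented: 20 good bases followed by 10 bad ones.

```python
def mean_quality(quals):
    """Average quality over the whole read (what an AVGQUAL-style filter looks at)."""
    return sum(quals) / len(quals)


def sliding_window_keep(quals, window=4, required=20):
    """Number of leading bases kept: cut at the first window whose mean is too low."""
    for start in range(len(quals) - window + 1):
        if sum(quals[start:start + window]) / window < required:
            return start
    return len(quals)


read = [38] * 20 + [10] * 10

# The whole-read average is about 28.7, so an average-quality filter keeps
# the read untouched, bad tail included.
print(mean_quality(read) >= 20)

# The sliding window cuts the read once the 4-base window mean drops below 20.
print(sliding_window_keep(read))
```

Here the average-quality check passes the full 30-base read, while the sliding window keeps only the first 19 bases.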

I'd use (after making sure the reads are actually PHRED33 quality encoded)

PE -phred33 C95VLANXX-2046D-01-01-01_L003_R1.fastq C95VLANXX-2046D-01-01-01_L003_R2.fastq N6_F_P.fastq N6_F_U.fastq N6_R_P.fastq N6_R_U.fastq ILLUMINACLIP:adapters/TruSeq3-PE.fa:2:30:10 SLIDINGWINDOW:4:20  MINLEN:25
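For the quality-encoding check mentioned above, a common heuristic is to look at the lowest quality character in the file. This is only a sketch under assumptions: the function name is hypothetical, it expects uncompressed 4-line FASTQ records, and a result of 64 is a guess rather than proof, since a Phred+33 sample containing only very high qualities would look the same.

```python
def guess_phred_offset(lines, max_reads=10000):
    """Guess the quality offset (33 or 64) from an iterable of FASTQ lines.

    Returns None when no quality line was seen or the range is ambiguous.
    """
    lowest = None
    for line_number, line in enumerate(lines):
        if line_number >= 4 * max_reads:
            break
        if line_number % 4 == 3:  # every 4th line holds the quality string
            codes = [ord(char) for char in line.rstrip("\r\n")]
            if codes:
                low = min(codes)
                lowest = low if lowest is None else min(lowest, low)
    if lowest is None:
        return None
    if lowest < 59:   # characters below ';' cannot occur in the +64 encodings
        return 33
    if lowest >= 64:  # consistent with Phred+64, but not conclusive
        return 64
    return None


print(guess_phred_offset(["@r1\n", "ACGT\n", "+\n", "II#I\n"]))
```

With a real file you could pass an open handle, e.g. `guess_phred_offset(open("reads_R1.fastq"))`, where the file name is a placeholder.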

The order of arguments is important: MINLEN:25 before SLIDINGWINDOW would only check the length of reads before trimming, whereas MINLEN:25 after SLIDINGWINDOW discards reads that are too short after trimming.
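The effect of step order can be shown with a toy pipeline. Again, this is a simplified sketch with made-up function names and invented quality values, not Trimmomatic's code; it only mimics the idea that steps are applied one after another in the order given.

```python
def sliding_window(quals, window=4, required=20):
    """Cut the read at the first window whose mean quality is below `required`."""
    for start in range(len(quals) - window + 1):
        if sum(quals[start:start + window]) / window < required:
            return quals[:start]
    return quals


def minlen(quals, minimum=25):
    """Drop the read (return None) if it is shorter than `minimum`."""
    return quals if len(quals) >= minimum else None


def run_steps(quals, steps):
    """Apply the steps in the given order; stop as soon as the read is dropped."""
    for step in steps:
        quals = step(quals)
        if quals is None:
            return None
    return quals


read = [38] * 20 + [10] * 10  # 30 bases, the last 10 are poor

# MINLEN first: the untrimmed 30-base read passes, then gets cut to 19 bases.
print(run_steps(read, [minlen, sliding_window]))

# SLIDINGWINDOW first: the read is cut to 19 bases and then dropped by MINLEN.
print(run_steps(read, [sliding_window, minlen]))
```

So with MINLEN first, a 19-base read slips through even though the minimum was meant to be 25.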

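If you don't know your adapters, you can run FASTQC on your library - overrepresented k-mers can be adapter sequences. I've also just combined all the adapters in Trimmomatic's adapter folder into one file and run it with that. There is no switch for choosing between palindromic mode and simple mode - palindromic mode is used for paired-end reads, simple mode for single-end reads. (As far as I understand the manual, which adapter entries are tried in palindromic mode is determined by their names in the adapter FASTA file: names starting with "Prefix" and ending in /1 and /2 are used as a palindromic pair, everything else is checked in simple mode.)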
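Combining the adapter files can be done with a few lines of Python. This is a minimal sketch: the function name and the example paths are placeholders, it assumes the adapter files end in `.fa`, and it does not remove duplicate entries, so it is worth checking the combined file by eye before using it.

```python
from pathlib import Path


def combine_adapter_files(adapter_dir, output_path):
    """Concatenate all *.fa files in `adapter_dir` into one FASTA file."""
    output_path = Path(output_path)
    with output_path.open("w") as out:
        for fasta in sorted(Path(adapter_dir).glob("*.fa")):
            if fasta.resolve() == output_path.resolve():
                continue  # do not copy the output file into itself
            text = fasta.read_text()
            if text and not text.endswith("\n"):
                text += "\n"  # keep the next file's header on its own line
            out.write(text)
    return output_path


# Example call (placeholder paths):
# combine_adapter_files("adapters", "all_adapters.fasta")
```

The combined file can then be given to ILLUMINACLIP in place of a single adapter file.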

You can use unpaired reads depending on your task - I'd use them for genome assembly since more data is better, but I wouldn't use them for SNP-calling as an alignment of a single read is less precise than an alignment of a read pair.

ADD COMMENT
0
Entering edit mode

Hi,

I am confused about one thing: why does Trimmomatic in palindrome mode remove the adapter part from the forward read and drop the reverse read? Why doesn't it trim the adapter from the reverse read as well and keep it, instead of removing it?

Thanks

ADD REPLY
