Trimmomatic removed the adapter contamination but made other FastQC features worse!
2.9 years ago
soheil • 0

Dear Biostars, I am new to bioinformatics and have just started to learn computational biology (whole-exome analysis). I have obtained paired-end (PE) FASTQ files for data analysis. When I open them in FastQC, all the check marks are green except Adapter Content, which shows Illumina Universal Adapter contamination in both files. I used the following Trimmomatic command to remove the adapters:

java -jar trimmomatic-0.39.jar PE -threads 4 -trimlog KG-99-WES-39-0087-TruSeq32.log KG-99-WES-39-0087-A_1.fastq.gz KG-99-WES-39-0087-A_2.fastq.gz KG-99-WES-39-0087-A_1_paired-OnlyTrueseq32.fastq.gz KG-99-WES-39-0087-A_1_unpaired-OnlyTrueseq32.fastq.gz KG-99-WES-39-0087-A_2_paired-OnlyTrueseq32.fastq.gz KG-99-WES-39-0087-A_2_unpaired-OnlyTrueseq32.fastq.gz ILLUMINACLIP:./adapters/TruSeq3-PE-2.fa:2:30:10

After trimming, the Adapter Content check turned green for both files. However, for the forward read, Per Tile Sequence Quality turned red and Sequence Length Distribution turned yellow. For the reverse read, Sequence Length Distribution also turned yellow. I wonder:

1. Why did this happen? How could trimming affect Per Tile Sequence Quality?
2. Is there any way to turn these checks green again?
3. Do the files even need trimming in the first place? What happens if I use the original (untrimmed) data for downstream analysis? Or are the trimmed files better for further analysis?

Sorry if my post is a little long.

Trimmomatic WES Contamination Adapter • 1.9k views

Adapter trimming with Trimmomatic removes adapter sequences from the ends of your reads, which leaves a heterogeneous mix of read lengths in your output. FastQC flags this as less than ideal, but it is generally not something to worry about.
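To see the mix of read lengths directly, a quick tally can help. The following is only a rough sketch: `fastq_length_hist` is a hypothetical helper name, and it assumes the usual four-lines-per-record FASTQ layout with no wrapped sequence lines.

```shell
# Print a "length<TAB>count" table for a FASTQ stream read from stdin.
# Assumption: each record is exactly 4 lines, so the sequence is line 2 of 4.
fastq_length_hist() {
  awk 'NR % 4 == 2 { count[length($0)]++ }
       END { for (len in count) printf "%d\t%d\n", len, count[len] }' |
    sort -n
}
```

For example, `gzip -dc KG-99-WES-39-0087-A_1_paired-OnlyTrueseq32.fastq.gz | fastq_length_hist` would tabulate the forward paired reads after trimming. Untrimmed data should show a single length, while adapter-clipped data should show a tail of shorter reads.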

That said, I usually also add a MINLEN step to the Trimmomatic command so that only reads of at least a certain length are kept.
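As a sketch of how that could look, the wrapper below appends a MINLEN step after ILLUMINACLIP (Trimmomatic applies steps in the order given). The function name `trim_pe`, the `DRY_RUN` switch, and the shortened output file names are assumptions made for illustration; the `-trimlog` option from the original command is left out for brevity, and the default of 36 is only a commonly used value, not a recommendation specific to this dataset.

```shell
# Hypothetical wrapper: adapter clipping plus MINLEN for one paired-end sample.
# $1 = sample prefix (e.g. KG-99-WES-39-0087-A), $2 = minimum length (default 36).
# Set DRY_RUN=1 to print the command instead of running it.
trim_pe() {
  prefix=$1
  minlen=${2:-36}
  # Reuse the positional parameters to hold the full command line.
  set -- java -jar trimmomatic-0.39.jar PE -threads 4 \
    "${prefix}_1.fastq.gz" "${prefix}_2.fastq.gz" \
    "${prefix}_1_paired.fastq.gz" "${prefix}_1_unpaired.fastq.gz" \
    "${prefix}_2_paired.fastq.gz" "${prefix}_2_unpaired.fastq.gz" \
    ILLUMINACLIP:./adapters/TruSeq3-PE-2.fa:2:30:10 \
    "MINLEN:${minlen}"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$@"
  else
    "$@"
  fi
}
```

Running `DRY_RUN=1 trim_pe KG-99-WES-39-0087-A` prints the command for inspection before anything is executed. In paired-end mode, a read whose mate falls below MINLEN ends up in the corresponding unpaired output file.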

As for the reduced base quality, this can also be explained by the trimming, but there may be other factors as well. It is well established that base quality drops in the later sequencing cycles. When you trim adapters, you reduce the number of reads that still reach the longer read lengths, which can over-represent the poorer-quality reads that didn't get trimmed.
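One way to check this effect is to count, for each read position, how many reads still cover it and what their mean quality is. This is a minimal sketch, not a replacement for FastQC: `fastq_quality_by_position` is a hypothetical helper, and it assumes four-line FASTQ records with Phred+33 quality encoding.

```shell
# Print "position<TAB>reads_covering<TAB>mean_phred" for a FASTQ stream on stdin.
# Assumptions: 4 lines per record, qualities encoded as Phred+33.
fastq_quality_by_position() {
  awk 'BEGIN {
         # awk has no ord(), so build a character-to-ASCII-code lookup table.
         for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i
       }
       NR % 4 == 0 {
         n = length($0)
         if (n > max) max = n
         for (p = 1; p <= n; p++) {
           reads[p]++
           total[p] += ord[substr($0, p, 1)] - 33
         }
       }
       END {
         for (p = 1; p <= max; p++)
           printf "%d\t%d\t%.1f\n", p, reads[p], total[p] / reads[p]
       }'
}
```

Comparing the output for the untrimmed and trimmed files (for example via `gzip -dc file.fastq.gz | fastq_quality_by_position`) shows whether the read count falls off toward the end of the read after trimming and whether the mean quality at those positions shifts with it.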

It will depend on your experiment and project goals, but if you observed overall high base quality scores before trimming then you likely don't have to worry.


Dear jv, thank you very much for your detailed response. I will add the MINLEN parameter and run the trimming again with Trimmomatic. I have also read that BBDuk is a good tool for adapter removal, so I want to try trimming with it as well and then compare the results to see which works better. There is just one point in your response on which I would like your recommendation, if you don't mind. I think the overall quality of the data is high. Below are the FastQC reports for the forward and reverse reads.

[Images: FastQC reports for the forward and reverse reads]

I want to do whole-exome analysis. So, do you think there is no need for trimming?
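For the BBDuk comparison mentioned above, a starting point might look like the sketch below. The parameter set (`ktrim=r k=23 mink=11 hdist=1 tpe tbo`) follows the adapter-trimming example in the BBDuk guide. The function name `bbduk_trim_pe`, the `DRY_RUN` switch, the output file names, and the location of `adapters.fa` are assumptions for illustration, and `minlen=36` is chosen only to mirror the Trimmomatic MINLEN setting.

```shell
# Hypothetical wrapper: BBDuk adapter trimming for one paired-end sample.
# $1 = sample prefix (e.g. KG-99-WES-39-0087-A).
# Assumption: adapters.fa (shipped in the BBTools resources directory) has been
# copied to the working directory; adjust ref= to wherever it actually lives.
# Set DRY_RUN=1 to print the command instead of running it.
bbduk_trim_pe() {
  prefix=$1
  set -- bbduk.sh \
    "in1=${prefix}_1.fastq.gz" "in2=${prefix}_2.fastq.gz" \
    "out1=${prefix}_1_bbduk.fastq.gz" "out2=${prefix}_2_bbduk.fastq.gz" \
    ref=adapters.fa ktrim=r k=23 mink=11 hdist=1 tpe tbo minlen=36
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$@"
  else
    "$@"
  fi
}
```

Running FastQC on both sets of trimmed files would then allow a like-for-like comparison of the two tools.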


I wouldn't recommend base quality score trimming for this data. That is very different, though, from the adapter sequence trimming you described in your original question, and you haven't shown the adapter contamination results from FastQC.

I usually do adapter trimming of my reads before mapping.

The following post has links to some additional forum posts further discussing the matter of adapter trimming:

Is it necessary to trim adaptor sequences?


Yes, as you mentioned, there is no need for base quality trimming for this data. Thanks for the link; I will definitely read through that post. Here are the adapter content plots before adapter trimming:

[Images: FastQC adapter content before trimming]

And below are the plots after trimming the data with the command from the original post plus MINLEN:36, as you recommended:

[Images: FastQC results after trimming with MINLEN:36]
