Question

Why does the Phred encoding change after trimming FASTQ files?


17 months ago

Bikal • 0

When I ran FastQC on my original FASTQ files, the encoding reported under Basic Statistics was Sanger / Illumina 1.9. But after trimming low-quality bases with Trimmomatic and running FastQC on the trimmed files, the encoding reported under Basic Statistics was Illumina 1.5. Why is that? I made sure that my Trimmomatic parameters do not alter the ASCII quality characters, so I am surprised to see the encoding change. My only concern is whether it will affect any downstream analysis.
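
FastQC does not read the encoding from a header; it infers it from the quality characters it sees. A simplified sketch of that kind of lowest-character heuristic is below. The helper names `guess_encoding` and `phred33_scores` and the exact thresholds are assumptions for illustration, not FastQC's source code. What it shows: characters below '@' (ASCII 64) can only occur in Phred+33 data, so a Phred+33 file in which every base is Q31 or higher ('@' or above) looks like Phred+64 to such a heuristic.

```python
def guess_encoding(quality_strings):
    """Hypothetical helper: guess the FASTQ quality offset from the lowest character seen."""
    lowest = min(min(q) for q in quality_strings)
    if ord(lowest) < 33:
        raise ValueError("quality character below '!' (ASCII 33)")
    if ord(lowest) < 64:
        # Characters below '@' can only appear in Phred+33 data.
        return "Phred+33 (Sanger / Illumina 1.9)"
    # Everything is '@' or above: consistent with Phred+64, but equally
    # consistent with Phred+33 reads whose bases are all >= Q31.
    return "Phred+64 (Illumina 1.3/1.5)"


def phred33_scores(quality):
    """Decode a quality string assuming Phred+33 (Q = ASCII code - 33)."""
    return [ord(c) - 33 for c in quality]


untrimmed = ["##%+5?IIIIII"]  # made-up read with low-quality bases at the start
trimmed = ["IIIIIIHHGG@@"]    # made-up read after the low-quality bases are cut off
print(guess_encoding(untrimmed))          # Phred+33 (Sanger / Illumina 1.9)
print(guess_encoding(trimmed))            # Phred+64 (Illumina 1.3/1.5)
print(min(phred33_scores(trimmed[0])))    # 31, i.e. '@' read as Phred+33
```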

Tags: fastqc, fastq, phred, quality, trimming

Written 17 months ago by Bikal; last updated by GenoMax.


Can you show us the trimmomatic command you ran?

Written 17 months ago by dsull.


I used the following Python script to run Trimmomatic. Note: I am analyzing the forward and reverse FASTQ files separately because they come from amplification of one gene from several strains of the same mycoparasite collected across different regions. Each of my FASTQ files contains only one read. After looking at the MultiQC report of the original FastQC results, I used CROP (together with HEADCROP) to cut the low-quality regions off all forward reads, and did the same for the reverse files with only the CROP parameter changed. To my surprise, FastQC on the trimmed files reported Illumina 1.5 for some of the samples (not all, about 30-40% of them, as in the pictures below).

#script
import os
import subprocess

# Define the base directories
forward_dir = "Oli_final_trim_phredscore/paired_F"
base_output_dir = "Oli_phred2"
paired_output_dir_F = os.path.join(base_output_dir, "paired_F")

# Ensure the directories exist
os.makedirs(paired_output_dir_F, exist_ok=True)

# Trimmomatic parameters

minlen = "50"  
trimmomatic_jar_path = "/fp/homes01/myusername/miniforge3/envs/gene_analysis/share/trimmomatic-0.39-2/trimmomatic.jar"

# Iterate over forward read files
for forward_file in os.listdir(forward_dir):
    if forward_file.endswith("_Oli_F.fastq"):
        base_name = forward_file.replace("_Oli_F.fastq", "")

        forward_input_path = os.path.join(forward_dir, forward_file)

        # Define output paths
        forward_paired_output = os.path.join(paired_output_dir_F, f"{base_name}_Oli_F.fastq")

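        # Construct the Trimmomatic command for SE mode.
        # -phred33 makes Trimmomatic read the input qualities as Phred+33; without a
        # TOPHRED64 step, the output keeps the same Phred+33 characters.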
        trim_command = [
            "java", "-jar", trimmomatic_jar_path, "SE", "-phred33",
            forward_input_path,
            forward_paired_output,
            f"MINLEN:{minlen}", "HEADCROP:22", "CROP:250"
        ]

        # Execute the command; check=True raises an error if Trimmomatic fails
        print(f"Trimming forward file: {forward_file}")
        subprocess.run(trim_command, check=True)

print("All forward file trimming operations are complete.")
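
HEADCROP and CROP only remove bases from the ends of each read, so the quality characters that remain are copied through as they were. A minimal sketch of the three steps in the script above, applied in the order they are listed (Trimmomatic runs steps in command-line order), is below. `Read` and `apply_steps` are hypothetical helpers that model the documented behaviour of MINLEN, HEADCROP and CROP, not Trimmomatic's implementation, and the example reads are made up.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Read:
    seq: str
    qual: str


def apply_steps(read: Read, steps: List[Tuple[str, int]]) -> Optional[Read]:
    """Hypothetical model: apply (step, value) pairs in order; None means the read was dropped."""
    for name, value in steps:
        if name == "MINLEN":
            if len(read.seq) < value:
                return None  # read is discarded
        elif name == "HEADCROP":
            # Remove `value` bases from the start; qualities are sliced, not re-encoded.
            read = Read(read.seq[value:], read.qual[value:])
        elif name == "CROP":
            # Keep only the first `value` bases, removing the rest from the end.
            read = Read(read.seq[:value], read.qual[:value])
        else:
            raise ValueError(f"unsupported step: {name}")
    return read


steps = [("MINLEN", 50), ("HEADCROP", 22), ("CROP", 250)]  # same order as the script

# Made-up 300 bp read: low-quality '#' at both ends, 'I' (Q40) in the middle.
raw = Read("A" * 300, "#" * 22 + "I" * 250 + "#" * 28)
trimmed = apply_steps(raw, steps)
print(len(trimmed.seq), set(trimmed.qual))  # 250 {'I'}

# A made-up 60 bp read passes MINLEN:50 first and is only then cropped.
short = Read("A" * 60, "I" * 60)
print(len(apply_steps(short, steps).seq))  # 38
print(apply_steps(short, [("HEADCROP", 22), ("CROP", 250), ("MINLEN", 50)]))  # None
```

In the first example only 'I' characters survive, so the lowest quality character in the trimmed read is above '@'. The second example shows a side effect of the order used in the script: MINLEN:50 is checked before cropping, so a read can end up shorter than 50 bases; listing MINLEN last would apply the minimum to the trimmed length.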

The first picture shows the FastQC report of a sample before Trimmomatic; the second shows the FastQC report of the same sample after Trimmomatic.

[Image: FastQC of a sample before Trimmomatic] [Image: FastQC of the same sample after Trimmomatic]
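
One way to confirm that a trimmed file is still Phred+33, rather than relying on FastQC's guess, is to compare the actual range of quality characters before and after trimming. A minimal sketch, assuming uncompressed FASTQ with exactly four lines per record; `quality_range` is a hypothetical helper and the demo record is made up.

```python
import io


def quality_range(lines):
    """Hypothetical helper: (lowest, highest) quality characters in a 4-line-per-record FASTQ stream."""
    lowest, highest = None, None
    for i, line in enumerate(lines):
        if i % 4 == 3:  # the 4th line of each record holds the qualities
            qual = line.rstrip("\r\n")
            if not qual:
                continue
            lo, hi = min(qual), max(qual)
            lowest = lo if lowest is None else min(lowest, lo)
            highest = hi if highest is None else max(highest, hi)
    return lowest, highest


example = io.StringIO("@read1\nACGTACGT\n+\nIIIIHH@@\n")
lo, hi = quality_range(example)
print(lo, hi, "-> Phred+33 range", ord(lo) - 33, "to", ord(hi) - 33)  # @ I -> Phred+33 range 31 to 40
```

On a real file, pass an open file handle instead, e.g. `with open(path) as fh: print(quality_range(fh))`. If the lowest character after trimming is '@' or higher, every remaining base is at least Q31 under Phred+33, which is exactly the case in which a lowest-character heuristic would report Phred+64.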

Written 17 months ago by Bikal; last updated by GenoMax.