I am trying to trim paired-end reads using Trim Galore. I have confirmed that the R1 and R2 input files match, based on the "Total reads processed" value in the text reports that Trim Galore writes. One of the samples trimmed correctly, but for some of the others the "Total written" and "Quality-trimmed" values do not match between the two reads. I get the following error when Trim Galore tries to validate the files:
pigz: skipping: /Volumes/Backup_Plus/RNAseq/Trimmed/Sample02.R2_trimmed.fq.gz: corrupted -- invalid deflate data (invalid code lengths set)
Read 2 output is truncated at sequence count: 19058781, please check your paired-end input files! Terminating...
Here are the text reports for read 1 and read 2:
Read 1
SUMMARISING RUN PARAMETERS
==========================
Input filename: Sample02.R1.fastq.gz
Trimming mode: paired-end
Trim Galore version: 0.6.10
Cutadapt version: 4.4
Python version: 3.11.4
Number of cores used for trimming: 8
Quality Phred score cutoff: 20
Quality encoding type selected: ASCII+33
Using Illumina adapter for trimming (count: 25126). Second best hit was smallRNA (count: 9)
Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected)
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 1 bp
Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp
Output file will be GZIP compressed
This is cutadapt 4.4 with Python 3.11.4
Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC Sample02.R1.fastq.gz
Processing single-end reads on 8 cores ...
Finished in 397.910 s (4.251 µs/read; 14.11 M reads/minute).
=== Summary ===
Total reads processed: 93,605,182
Reads with adapters: 32,888,101 (35.1%)
Reads written (passing filters): 93,605,182 (100.0%)
Total basepairs processed: 9,454,123,382 bp
Quality-trimmed: 6,074,955 bp (0.1%)
Total written (filtered): 9,344,167,021 bp (98.8%)
=== Adapter 1 ===
Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 32888101 times
Minimum overlap: 1
No. of allowed errors:
1-9 bp: 0; 10-13 bp: 1
Bases preceding removed adapters:
A: 26.5%
C: 34.0%
G: 22.3%
T: 17.1%
none/other: 0.0%
Read 2
SUMMARISING RUN PARAMETERS
==========================
Input filename: Sample02.R2.fastq.gz
Trimming mode: paired-end
Trim Galore version: 0.6.10
Cutadapt version: 4.4
Python version: 3.11.4
Number of cores used for trimming: 8
Quality Phred score cutoff: 20
Quality encoding type selected: ASCII+33
Using Illumina adapter for trimming (count: 25126). Second best hit was smallRNA (count: 9)
Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected)
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 1 bp
Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp
Output file will be GZIP compressed
This is cutadapt 4.4 with Python 3.11.4
Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC Sample02.R2.fastq.gz
Processing single-end reads on 8 cores ...
Finished in 410.082 s (4.381 µs/read; 13.70 M reads/minute).
=== Summary ===
Total reads processed: 93,605,182
Reads with adapters: 34,396,551 (36.7%)
Reads written (passing filters): 93,605,182 (100.0%)
Total basepairs processed: 9,454,123,382 bp
Quality-trimmed: 16,184,037 bp (0.2%)
Total written (filtered): 9,331,568,873 bp (98.7%)
=== Adapter 1 ===
Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 34396551 times
Minimum overlap: 1
No. of allowed errors:
1-9 bp: 0; 10-13 bp: 1
Bases preceding removed adapters:
A: 29.5%
C: 30.6%
G: 25.7%
T: 14.2%
none/other: 0.0%
Is this an issue with the quality of the sequencing? Can I override this check, or is something else going on?
Thank you,
Leon
I just checked the samples that trimmed successfully. Their "Quality-trimmed" and "Total written" values don't match either, yet they still passed validation, so I'm not sure the mismatch after trimming has anything to do with the error messages.
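Since the pigz message complains about the compressed output file rather than about the reads themselves, one way to check this independently of Trim Galore would be to decompress each file end to end and count the records. Below is a minimal Python sketch of that idea, not something taken from Trim Galore. The function name count_fastq_records is made up for illustration, and it assumes standard four-line FASTQ records.

```python
import gzip
import zlib


def count_fastq_records(path):
    """Read a gzipped FASTQ file to the end.

    Returns (records, intact): the number of complete four-line records
    read, and False if decompression failed before the end of the file.
    """
    lines = 0
    intact = True
    try:
        with gzip.open(path, "rb") as handle:
            for _ in handle:
                lines += 1
    except (OSError, EOFError, zlib.error):
        # OSError covers a bad gzip header or CRC, EOFError a truncated
        # stream, and zlib.error invalid deflate data.
        intact = False
    return lines // 4, intact
```

Calling it on both the input files and the trimmed outputs (for example Sample02.R2.fastq.gz and Sample02.R2_trimmed.fq.gz) and comparing the counts should show whether the file is already damaged before trimming or only after it has been written to the output volume.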