8 months ago
SilhouetteQ
Hi, I am analyzing paired-end ATAC-Seq data using Trimmomatic-0.39. Here is the log:
TrimmomaticPE: Started with arguments:
-threads 16 -phred33 ./data/R1.fastq.gz ./data/R2.fastq.gz R1_paired.fq.gz R1_unpaired.fq.gz R2_paired.fq.gz R2_unpaired.fq.gz ILLUMINACLIP:./NexteraPE-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
Using PrefixPair: 'AGATGTGTATAAGAGACAG' and 'AGATGTGTATAAGAGACAG'
Using Long Clipping Sequence: 'GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG'
Using Long Clipping Sequence: 'TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG'
Using Long Clipping Sequence: 'CTGTCTCTTATACACATCTCCGAGCCCACGAGAC'
Using Long Clipping Sequence: 'CTGTCTCTTATACACATCTGACGCTGCCGACGA'
ILLUMINACLIP: Using 1 prefix pairs, 4 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Input Read Pairs: 31710731 Both Surviving: 14986357 (47.26%) Forward Only Surviving: 16327631 (51.49%) Reverse Only Surviving: 48503 (0.15%) Dropped: 348240 (1.10%)
TrimmomaticPE: Completed successfully
As the results show, the number of forward-only surviving reads is overwhelmingly high (51.49%). What could be the reason for that?
Thank you.
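To put the summary line above into numbers, here is a minimal Python sketch that parses it and checks that the four categories add up to the input pair count. The function name parse_trimmomatic_summary is hypothetical (it is not part of Trimmomatic), and the regular expression assumes the summary format shown in this log.

```python
import re

# Summary line copied from the TrimmomaticPE log above.
SUMMARY = ("Input Read Pairs: 31710731 Both Surviving: 14986357 (47.26%) "
           "Forward Only Surviving: 16327631 (51.49%) "
           "Reverse Only Surviving: 48503 (0.15%) Dropped: 348240 (1.10%)")

def parse_trimmomatic_summary(line):
    """Return the counts from a TrimmomaticPE summary line as a dict.

    Hypothetical helper; assumes the log format shown in this thread.
    """
    pattern = (r"Input Read Pairs: (\d+) Both Surviving: (\d+) .*?"
               r"Forward Only Surviving: (\d+) .*?"
               r"Reverse Only Surviving: (\d+) .*?Dropped: (\d+)")
    m = re.search(pattern, line)
    if m is None:
        raise ValueError("not a TrimmomaticPE summary line")
    keys = ("input", "both", "forward_only", "reverse_only", "dropped")
    return dict(zip(keys, map(int, m.groups())))

counts = parse_trimmomatic_summary(SUMMARY)

# The four outcome categories should account for every input pair.
total = (counts["both"] + counts["forward_only"]
         + counts["reverse_only"] + counts["dropped"])

# A pair ends up "forward only" when its reverse read is discarded,
# so reverse reads are lost in the forward-only and dropped categories.
reverse_lost = counts["forward_only"] + counts["dropped"]

print("categories sum to input:", total == counts["input"])
print(f"reverse reads discarded: {100 * reverse_lost / counts['input']:.2f}%")
```

Read this way, the question is less "why are there so many forward reads" and more "why is the reverse read discarded in roughly half of the pairs".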
Interestingly, when I didn't provide the adapter file, the number of unpaired forward reads was significantly reduced:
Does this indicate that I used the wrong adapter file? I ran FastQC on my raw data and saw that the reads use Sanger / Illumina 1.9 encoding and are contaminated with the Nextera Transposase sequence, so I assumed I should use the adapter file NexteraPE-PE.fa that ships with Trimmomatic.
I still see the file here: ILLUMINACLIP:./NexteraPE-PE.fa:2:30:10, unless that is an empty file.
The first example used the adapter file NexteraPE-PE.fa, but it generated more than 50% unpaired data.
The content of NexteraPE-PE.fa is:
Ideally you will know which adapters to use, but if you don't, programs like fastp can auto-detect them. You could also use bbduk.sh with its included adapters.fa for scanning and trimming. You could also detect them using bbduk.sh, see --> Identify adapter sequences for trimming from Illumina paired end fastq files
It is possible that the result the first time around is correct, i.e. you could be using the wrong file in the second attempt, and thus only a small fraction of reads (that may be similar by chance) are getting trimmed.
Using fastp, I found that the adapter sequence for the forward read corresponds to Trans2_rc in NexteraPE-PE.fa, while the one for the reverse read matches Trans1_rc. Could that be the reason why the Trimmomatic output looks odd?
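One way to sanity-check the adapter file is to confirm how the four "Long Clipping Sequence" entries from the Trimmomatic log relate to each other. The sketch below is a minimal Python check, assuming only the sequences printed in the log above; the thread does not show which FASTA names belong to which sequence, so the code deliberately avoids names such as Trans1_rc and Trans2_rc.

```python
# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]

# The four long clipping sequences, in the order Trimmomatic logged them.
long_clip = [
    "GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG",
    "TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG",
    "CTGTCTCTTATACACATCTCCGAGCCCACGAGAC",
    "CTGTCTCTTATACACATCTGACGCTGCCGACGA",
]

# Collect every pair (a, b) where b is the reverse complement of a.
pairs = [
    (a, b)
    for i, a in enumerate(long_clip)
    for b in long_clip[i + 1:]
    if reverse_complement(a) == b
]

for a, b in pairs:
    print(a, "<->", b)
```

This only shows that the file contains each transposase adapter together with its reverse complement, so the sequences fastp reported are present in it. It does not by itself explain why the reverse reads are being discarded.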