Question

Paired End Trimmomatic producing asymmetric paired read files.

9.3 years ago

Biogeek ▴ 480

Hey guys,

I recently got back some 101 bp paired-end Illumina reads and I'm in the middle of QC. I decided to use Trimmomatic instead of SolexaQA, which I normally use. The problem is that the paired 'cleaned' reads output by Trimmomatic have uneven file sizes and read counts. I have used the ILLUMINACLIP option with keepBothReads (":TRUE"). I decided this was best because, to my knowledge, the Trinity de novo assembler works best with paired-end data.

The Trimmomatic-specific command I run (as part of a loop):

java -jar directory/trimmomatic-0.35.jar PE -phred33 -basein $f1 -baseout "$f1"filtered.fastq ILLUMINACLIP: directory/adapters/TruSeq3-PE-2.fa:2:30:10:8:TRUE HEADCLIP:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36

Trimmomatic runs as normal and produces all files. Can I also ask a few ignorant questions? Am I correct in assuming headclip removes bases from the 5' end? Additionally, after the trimming has succeeded, FastQC still flags high k-mer content at the 5' end, where random hexamer primer bias usually appears. Any suggestions, or is this just an artifact to ignore?

I am using TruSeq adaptors. I am assuming all sequencing facilities now work with TruSeq3?

Thanks for helping.

RNA-Seq qc trimmomatic • 11k views

ADD COMMENT • link updated 2.7 years ago by Ram 45k • written 9.3 years ago by Biogeek ▴ 480

@Biogeek could you please provide a reference for your usage of keepbothreads (":TRUE") and HEADCLIP:10? I cannot find these options in the manual at http://www.usadellab.org/cms/?page=trimmomatic

ADD REPLY • link 8.8 years ago by dovah ▴ 40

I believe he meant HEADCROP instead of HEADCLIP.
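
To illustrate what HEADCROP does, here is a toy sketch: HEADCROP:N drops the first N bases from the 5' start of every read (and the matching quality characters), rather than discarding whole reads. The `headcrop` helper below is hypothetical and only mimics the effect on a single sequence string; it is not part of Trimmomatic.

```shell
#!/bin/sh
# Hypothetical helper: mimic HEADCROP:N on one sequence (or quality) string
# by keeping everything from character N+1 onwards.
headcrop() {
    seq=$1
    n=$2
    printf '%s\n' "$seq" | cut -c"$(( n + 1 ))"-
}

# Example: crop 10 bases from a 16 bp toy read.
headcrop "ACGTACGTACGTACGT" 10
```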

ADD REPLY • link 8.7 years ago by Ömer An ▴ 270

Ram · Answer 1 · 2016-01-02

9.3 years ago

GouthamAtla 12k

When you say PE, where is the R2 file in your command? You gave only $f1 but not $f2. Also, you don't need to give the path to the adapters if it's standard TruSeq, I guess.

What is TRUE here?

ILLUMINACLIP: directory/adapters/TruSeq3-PE-2.fa:2:30:10:8:TRUE
___________________________________________________________^^^^

Trimmomatic instructions are pretty clear on their website:

java -jar trimmomatic-0.35.jar PE -phred33 \
    input_forward.fq.gz input_reverse.fq.gz \
    output_forward_paired.fq.gz output_forward_unpaired.fq.gz \
    output_reverse_paired.fq.gz output_reverse_unpaired.fq.gz \
    ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
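
On the uneven files: in PE mode the two paired outputs should hold the same number of reads, while the two unpaired outputs (and the file sizes in general, since reads are trimmed to different lengths) can legitimately differ. A quick way to check is to count records in each file. This is a rough sketch that assumes standard four-line FASTQ records and reuses the output names from the command above; `count_reads` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: count reads in a FASTQ file, assuming 4 lines per record.
# Handles gzip-compressed files by name (*.gz).
count_reads() {
    case "$1" in
        *.gz) lines=$(gzip -dc "$1" | wc -l) ;;
        *)    lines=$(wc -l < "$1") ;;
    esac
    echo $(( lines / 4 ))
}

# File names taken from the example command; only checked if they exist.
fwd="output_forward_paired.fq.gz"
rev="output_reverse_paired.fq.gz"
if [ -e "$fwd" ] && [ -e "$rev" ]; then
    echo "forward paired: $(count_reads "$fwd")"
    echo "reverse paired: $(count_reads "$rev")"
fi
```

If the two paired counts match, the pairing is intact even when the byte sizes differ.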

ADD COMMENT • link updated 5.4 years ago by Ram 45k • written 9.3 years ago by GouthamAtla 12k

The manual says that if you give $f1, the first (forward) read file, with -basein, it will automatically detect the matching reverse read file.

As for :8:TRUE, this is where the keepBothReads option comes in. I've read on other forums that adding "TRUE" signifies that I want to keep both reads. Apologies if I'm confused; this is my first time using the software.

ADD REPLY • link updated 5.4 years ago by Ram 45k • written 9.3 years ago by Biogeek ▴ 480

Additionally, the reason I don't add $f2 is that it returns the error 'unknown trimmer'.

Is there any way to write this code better so I can still use a loop, rather than processing multiple read pairs individually?

ADD REPLY • link updated 5.4 years ago by Ram 45k • written 9.3 years ago by Biogeek ▴ 480

Okay. Can you just run the standard command given in the manual on one of the samples and see if it works? Take care with spaces as well; there should be none after the colon:

ILLUMINACLIP:
_____________^
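
Once a single sample works, a loop over read pairs could look roughly like the sketch below. It is a dry run that only prints one command per pair. The `_R1.fastq.gz` / `_R2.fastq.gz` naming and the output file names are assumptions, so adjust them to your files; the jar and adapter paths are copied from the question. The last two ILLUMINACLIP fields (8 and TRUE) are the optional minimum adapter length and keepBothReads settings.

```shell
#!/bin/sh
# Dry-run sketch: print one Trimmomatic PE command per read pair.
# Assumed (hypothetical) naming: <sample>_R1.fastq.gz and <sample>_R2.fastq.gz
TRIMMOMATIC_JAR="directory/trimmomatic-0.35.jar"   # path as in the question
ADAPTERS="directory/adapters/TruSeq3-PE-2.fa"      # path as in the question

trim_cmd() {
    r1=$1
    base=${r1%_R1.fastq.gz}          # strip the R1 suffix to get the sample prefix
    r2="${base}_R2.fastq.gz"         # derive the mate file name
    printf '%s\n' "java -jar $TRIMMOMATIC_JAR PE -phred33 $r1 $r2 ${base}_R1.paired.fq.gz ${base}_R1.unpaired.fq.gz ${base}_R2.paired.fq.gz ${base}_R2.unpaired.fq.gz ILLUMINACLIP:$ADAPTERS:2:30:10:8:TRUE LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36"
}

for r1 in ./*_R1.fastq.gz; do
    [ -e "$r1" ] || continue         # skip when the glob matches nothing
    trim_cmd "$r1"
done
```

The printed commands can be inspected first and then piped to `sh` to actually run them.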

ADD REPLY • link updated 5.4 years ago by Ram 45k • written 9.3 years ago by GouthamAtla 12k

Goutham,

My script works fine. I changed a few settings. I read up and I am using the TruSeq3-PE-2 file for adaptors, as TruSeq2 refers to the older GAII sequencing methods. FastQC indicated no presence of adaptors anyway, but I went through with this step nevertheless. TRAILING and LEADING were at 3, but this did not seem logical to me, as it only removes very poor quality bases and N's. I've changed these to Q25, which has improved the reads a lot; I felt Q20 was too lenient and Q30 was too harsh. I've head-cropped the sequences by 16 bp (final size 85 bp reads) to remove the random hexamer primer bias. The sliding window is over 4 bases with an average of Q25. Minimum length of sequences to be kept: 30 bp. The FastQC results are now looking a lot better. Thanks for the input.
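
For anyone landing here later, the settings described above would translate into a step list roughly like this. It is only a sketch: the step order and the unchanged ILLUMINACLIP parameters are assumptions carried over from the original command, and Trimmomatic applies steps in the order they are given.

```shell
#!/bin/sh
# Sketch of the revised trimming steps; order and ILLUMINACLIP values are assumed.
READ_LEN=101     # raw read length from the question
HEADCROP=16      # bases removed from the 5' start of each read

STEPS="ILLUMINACLIP:directory/adapters/TruSeq3-PE-2.fa:2:30:10:8:TRUE HEADCROP:$HEADCROP LEADING:25 TRAILING:25 SLIDINGWINDOW:4:25 MINLEN:30"

echo "$STEPS"
# Longest read that can remain after head-cropping (before any quality trimming).
echo "max read length after HEADCROP: $(( READ_LEN - HEADCROP ))"
```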

ADD REPLY • link updated 5.4 years ago by Ram 45k • written 9.3 years ago by Biogeek ▴ 480