Question

Illumina quality trimming - FASTQC

0

8.7 years ago

sumudu_rangika ▴ 50

Hi,

I have a few things to clarify.

1) I have an Illumina MiSeq dataset for a parasite genome. The machine produced the paired-end reads as two separate datasets, one forward (R1) and one reverse (R2). When using the FASTQC tool on one set, e.g. filtering reads <70 bp in the R1 dataset, should we treat R1 as paired-end or not?

2) During quality trimming, I tried adjusting the sliding window size to see how the per-base quality improves. Increasing the sliding window size resulted in more aggressive trimming. I have selected a sliding window of 5, step 2 and quality <20, and filtered out reads shorter than 70 bp.

Appreciate any advice on this.

Thanks

next-gen • 7.3k views

ADD COMMENT • link updated 8.7 years ago by agata88 ▴ 870 • written 8.7 years ago by sumudu_rangika ▴ 50

1

Never filter or trim paired reads with a tool that does not support paired reads, or you may end up with broken pair ordering. It also sounds to me like trimming to Q20 is too aggressive for most purposes; it increases bias. I recommend using BBDuk for quality operations like filtering and trimming, but whether it's advisable to do them at all depends on what you're doing with the data. What's your experiment?

ADD REPLY • link 8.7 years ago by Brian Bushnell 20k

0

It's whole-genome sequencing of 8 clinical isolates. I have two separate sets for each: a forward set and a reverse set.

So do you think quality trimming the forward set and the reverse set separately, using a tool like the FASTQ quality trimmer, is not a good option?

ADD REPLY • link 8.7 years ago by sumudu_rangika ▴ 50

1

It's not! You will end up with a different number of reads in the R1 and R2 files, which will cause problems in downstream processing, e.g. mapping. You need to trim the "connected" reads from the R1 and R2 files together.

ADD REPLY • link 8.7 years ago by agata88 ▴ 870

1

Assuming you mean the trimmer from FASTX-Toolkit: no, that should not be used.

ADD REPLY • link 8.7 years ago by Devon Ryan 105k

0

I'm using a local instance of Galaxy and the FASTQC quality trimmer.

ADD REPLY • link 8.7 years ago by sumudu_rangika ▴ 50

0

Do you mean you use FASTX-Toolkit? FASTQC only produces a report. The report you showed in your previous post was good, so I'm not sure why trimming or filtering (especially with a sliding window) is necessary. R1 and R2 are paired-end reads. Generally, the FASTQC profiles of the two reads in a pair are similar.

ADD REPLY • link 8.7 years ago by Satyajeet Khare ★ 1.6k

score 1 · Answer 1 · 2016-09-02

1

8.7 years ago

agata88 ▴ 870

I would suggest using Trimmomatic for PE reads. http://www.usadellab.org/cms/?page=trimmomatic

For DNA-seq I am using SLIDINGWINDOW:4:30 and MINLEN:30. But it is up to you what quality and read length you set.
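To make the effect of a window of 4 bases, a mean quality threshold of 30 and a minimum length of 30 more concrete, here is a minimal Python sketch of sliding-window trimming. It is a simplified illustration, not Trimmomatic's actual implementation: the function names are made up for this example, Phred+33 quality encoding is assumed, and the read is simply cut at the start of the first window whose mean quality drops below the threshold.

```python
def phred33_scores(quality_line):
    """Decode a FASTQ quality string (Phred+33 encoding is assumed)."""
    return [ord(ch) - 33 for ch in quality_line]


def sliding_window_cut(scores, window=4, min_mean=30):
    """Return the number of leading bases to keep.

    Scans from the 5' end and cuts at the start of the first window whose
    mean quality is below min_mean. Reads shorter than the window are
    kept whole in this simplified version.
    """
    for start in range(0, max(len(scores) - window + 1, 0)):
        chunk = scores[start:start + window]
        if sum(chunk) / len(chunk) < min_mean:
            return start
    return len(scores)


def trim_read(seq, qual, window=4, min_mean=30, min_len=30):
    """Trim one read; return (seq, qual), or None if it ends up too short."""
    keep = sliding_window_cut(phred33_scores(qual), window, min_mean)
    if keep < min_len:
        return None  # the read would be discarded by the length filter
    return seq[:keep], qual[:keep]
```

A higher threshold or a stricter minimum length discards more reads, which is why it is worth comparing the data before and after trimming.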

As for FastQC: this tool computes statistics for one file at a time, e.g. R1 or R2. Both files have the same number of reads before trimming and should still have the same number afterwards.

So, check R1 and R2 separately with FastQC before and after trimming, and you'll see how much your data changed after the quality cut.
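A quick way to confirm that R1 and R2 still contain the same number of reads is to count the records in each file. The sketch below assumes standard four-line FASTQ records (no wrapped sequence lines) and plain or gzip-compressed files; the function names are only illustrative.

```python
import gzip


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def count_reads(path):
    """Count reads, assuming exactly four lines per FASTQ record."""
    with open_fastq(path) as handle:
        lines = sum(1 for _ in handle)
    if lines % 4:
        raise ValueError(f"{path}: line count {lines} is not a multiple of 4")
    return lines // 4


def same_read_count(r1_path, r2_path):
    """True if both files of a pair contain the same number of reads."""
    return count_reads(r1_path) == count_reads(r2_path)
```

Equal counts are necessary but not sufficient: the reads also have to be in the same order in both files, which a paired-end-aware trimmer takes care of.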

Best,

Agata

ADD COMMENT • link 8.7 years ago by agata88 ▴ 870

0

Trimmomatic generates 4 sets of output. If I use it, should I consider only the two paired sets and ignore the two unpaired datasets?

ADD REPLY • link 8.7 years ago by sumudu_rangika ▴ 50

2

The paired sets are the output you want, and those are the ones you should use for further analysis. The unpaired datasets contain the trimmed reads that no longer have a mate (because the mate was dropped during trimming). Having all 4 sets matters when the trimming is too aggressive, which results in a large number of unpaired reads.
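The logic behind the four outputs can be sketched in a few lines of Python. This is only an illustration of the idea, not Trimmomatic's code: it assumes the reads have already been trimmed, that each read is a `(name, sequence, quality)` tuple, and that both inputs list mates in the same order; `split_pairs` and the dictionary keys are hypothetical names.

```python
def split_pairs(r1_reads, r2_reads, min_len=30):
    """Route already-trimmed mates into paired and unpaired outputs.

    r1_reads and r2_reads are sequences of (name, sequence, quality)
    tuples in the same order, so position i in both is one read pair.
    """
    out = {"r1_paired": [], "r1_unpaired": [],
           "r2_paired": [], "r2_unpaired": []}
    for r1, r2 in zip(r1_reads, r2_reads):
        r1_ok = len(r1[1]) >= min_len
        r2_ok = len(r2[1]) >= min_len
        if r1_ok and r2_ok:
            # Both mates survive: they stay together and in sync.
            out["r1_paired"].append(r1)
            out["r2_paired"].append(r2)
        elif r1_ok:
            # Only the forward read survives: it has lost its mate.
            out["r1_unpaired"].append(r1)
        elif r2_ok:
            # Only the reverse read survives.
            out["r2_unpaired"].append(r2)
        # If neither mate survives, the pair is dropped entirely.
    return out
```

The two paired lists always have the same length and order, which is what a paired-end mapper expects; a large unpaired share is a sign that the trimming settings are too strict.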

ADD REPLY • link 8.7 years ago by agata88 ▴ 870