Hi,
I have a few things to clarify.
1) I have an Illumina MiSeq dataset for a parasite genome. The machine itself gave the paired-end reads as two separate datasets, one forward (R1) and the other reverse (R2). When using the FASTQC tool on one set, e.g. filtering reads <70 bp in the R1 dataset, should we treat R1 as paired-end or not?
2) During quality trimming, I tried adjusting the sliding window size to see how the per-base quality improves. Increasing the sliding window size resulted in more aggressive trimming. I have selected a sliding window of 5, a step of 2 and quality <20, and filtered out reads shorter than 70 bp.
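For readers unfamiliar with sliding-window trimming, here is a minimal sketch of one common variant, using the settings above (window 5, step 2, threshold 20). It is an assumption-laden illustration, not the algorithm of any particular Galaxy tool: it trims from the 3' end only and uses the window mean as the aggregate. The function name `sliding_window_trim_3prime` is hypothetical.

```python
def sliding_window_trim_3prime(quals, window=5, step=2, threshold=20):
    """Return how many bases to keep from the 5' end of a read.

    quals is a list of per-base Phred scores. Assumption: the window sits
    at the 3' end and moves inward by `step` bases while its mean quality
    is below `threshold`. Real tools may aggregate differently (min, max)
    or also trim the 5' end.
    """
    end = len(quals)
    while end > 0:
        start = max(0, end - window)  # shorten the window near the read start
        win = quals[start:end]
        if sum(win) / len(win) >= threshold:
            break  # window passes: stop trimming
        end = max(0, end - step)
    return end


# Example: 10 good bases followed by 4 poor ones.
quals = [30] * 10 + [10] * 4
keep = sliding_window_trim_3prime(quals)
trimmed = quals[:keep]
```

A length filter (e.g. discarding reads shorter than 70 bp) would then be applied to the trimmed read.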
Appreciate any advice on this.
Thanks
Never filter or trim paired reads with a tool that does not support paired reads, or you may end up with broken pair ordering. It also sounds to me like trimming to Q20 is too aggressive for most purposes; it increases bias. I recommend using BBDuk for quality operations like filtering and trimming, but whether it's advisable to do them at all depends on what you're doing with the data. What's your experiment?
It's whole-genome sequencing of 8 clinical isolates. I have two separate sets for each isolate, a forward set and a reverse set.
So do you think quality trimming the forward and reverse sets separately, using a tool like FASTQ quality trimmer, is not a good option?
It's not! You will end up with a different number of reads in the R1 file and the R2 file, which will cause problems in downstream processing, e.g. mapping. You need to trim the "connected" reads from the R1 and R2 files together.
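To illustrate what processing "connected" reads together means, here is a minimal Python sketch of a pair-aware length filter. It is only an illustration under stated assumptions: both files list the mates in the same order, every record is exactly four lines, and the helper names `read_fastq` and `filter_pairs` are hypothetical. In practice a pair-aware tool should do this for you.

```python
def read_fastq(handle):
    """Yield (header, sequence, plus, quality) tuples from a FASTQ handle.

    Assumes strict 4-line records (no wrapped sequences).
    """
    while True:
        record = [handle.readline().rstrip("\n") for _ in range(4)]
        if not record[0]:
            return  # end of file
        yield tuple(record)


def filter_pairs(r1_records, r2_records, min_len=70):
    """Keep a pair only if BOTH mates are at least min_len bases long.

    Because mates are kept or dropped together, R1 and R2 keep the same
    number of reads in the same order.
    """
    for rec1, rec2 in zip(r1_records, r2_records):
        if len(rec1[1]) >= min_len and len(rec2[1]) >= min_len:
            yield rec1, rec2


def write_fastq(record, handle):
    """Write one 4-line FASTQ record."""
    handle.write("\n".join(record) + "\n")
```

A usage sketch, with placeholder file names: open `R1.fastq` and `R2.fastq` for reading and two output files for writing, then loop over `filter_pairs(read_fastq(r1), read_fastq(r2))` and call `write_fastq` once per mate. Filtering each file on its own would instead drop a short R1 read while keeping its R2 mate, which is how the two files drift out of sync.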
Assuming you mean the trimmer from FASTX-Toolkit: no, that should not be used.
I'm using a local instance of Galaxy and the FASTQC quality trimmer.
Do you mean you are using FASTX-Toolkit? FASTQC only produces a report. The report you showed in your previous post looked good, so I'm not sure why trimming or filtering (especially with a sliding window) is necessary. R1 and R2 are paired-end reads, and the FASTQC profiles of the two mates are generally similar.