Question

Need advice on quality control of Illumina reads

0

9.0 years ago

sumudu_rangika ▴ 50

Hi,

I have a dataset generated on an Illumina MiSeq. FastQC failed the "per base sequence quality" and "sequence length distribution" modules.

I did a quality trim using a sliding window of 5, a step of 2, and a minimum quality score of 20, and then filtered out reads shorter than 70 bp. This removes the low-quality bases, but when I looked at the sequence length distribution, I noticed that the number of reads of length 300 had dropped from nearly 500,000 to 160,000. I would appreciate any advice on this.
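A minimal sketch of this kind of trim, for readers who want to see why full-length reads become rarer afterwards. The trimming tool is not named in the post, so the window semantics here are an assumption: windows start at every `step` bases, the read is cut at the start of the first window whose mean quality falls below the threshold, and reads left shorter than the minimum length are dropped. The function names are hypothetical.

```python
from statistics import mean


def sliding_window_trim(quals, window=5, step=2, min_q=20):
    """Return how many leading bases to keep (assumed semantics, see lead-in)."""
    # Only full windows are examined; reads shorter than `window` are kept whole.
    for start in range(0, len(quals) - window + 1, step):
        if mean(quals[start:start + window]) < min_q:
            return start  # cut at the start of the first failing window
    return len(quals)


def trim_and_filter(reads, min_len=70, **trim_kwargs):
    """Trim each (sequence, qualities) pair and drop reads shorter than min_len."""
    kept = []
    for seq, quals in reads:
        keep = sliding_window_trim(quals, **trim_kwargs)
        if keep >= min_len:
            kept.append((seq[:keep], quals[:keep]))
    return kept


# Toy data: three 300 bp reads with different quality profiles.
reads = [
    ("A" * 300, [35] * 200 + [10] * 100),  # quality drops after base 200
    ("C" * 300, [35] * 300),               # good quality throughout
    ("G" * 300, [35] * 50 + [10] * 250),   # quality drops after base 50
]
trimmed = trim_and_filter(reads)
lengths = [len(seq) for seq, _ in trimmed]
```

With this toy input, only one of the three reads is still 300 bp after trimming: any read whose quality falls off near the 3' end leaves the 300 bp bin and moves to a shorter length, which is the pattern described above.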

Thanks

next-gen • 3.7k views

ADD COMMENT • link updated 8.8 years ago by Biostar 20 • written 9.0 years ago by sumudu_rangika ▴ 50

1

Can you provide details (images) of your FastQC results? Can you elaborate on what you mean by "number of reads of length 300 were reduced from nearly 500,000 to 160,000"?

ADD REPLY • link 9.0 years ago by Satyajeet Khare ★ 1.6k

1

Thanks a lot. I'm new to this kind of analysis and really appreciate your advice.

The images below show the initial per-base qualities and the sequence length distribution:

http://www.freeimagehosting.net/upl.php

After trimming and filtering out reads <70 bp:

I hope the images are clear. Am I doing the correct thing? If you need any more clarification, please ask.

Thanks, Sumudu

ADD REPLY • link 9.0 years ago by sumudu_rangika ▴ 50

1

Thanks! Those look fine to me. I might not have trimmed so aggressively (I usually use a Phred cutoff of 5), but otherwise that looks correct.
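To put the two cutoffs in perspective, here is a small sketch using the standard Phred definition, Q = -10 * log10(P), where P is the estimated probability that a base call is wrong. The function name is just for illustration.

```python
def phred_to_error_prob(q):
    """Convert a Phred quality score to the estimated base-call error probability."""
    return 10 ** (-q / 10)


p20 = phred_to_error_prob(20)  # cutoff used in the question
p5 = phred_to_error_prob(5)    # more lenient cutoff mentioned above
print(f"Q20: {p20:.3f}  Q5: {p5:.3f}")
```

A cutoff of 20 keeps only bases with an estimated error rate of about 1% or less, while a cutoff of 5 tolerates roughly 32%, so the stricter setting removes far more of the read tail.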

ADD REPLY • link 9.0 years ago by Devon Ryan 105k

1

If you want, you can trim the reads beyond base 250. You may get better alignment.

ADD REPLY • link 9.0 years ago by Satyajeet Khare ★ 1.6k

score 4 · Answer 1 · 2016-09-01

4

9.0 years ago

Devon Ryan 105k

The sequence length distribution module will always be flagged once any reads have been trimmed, so just ignore that. It sounds like either you're trimming too aggressively or the sequencing quality just wasn't that good. I have to say that the quality at the end of really long reads tends to decrease a fair bit, so it's sort of expected for trimming to do that. If you're happy with the resulting quality scores, then continue with mapping or assembly or whatever. If not, then retrim appropriately. As Satya mentioned, we can't give any other advice without seeing the plots.
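If you want to check the length distribution yourself rather than rely on the FastQC flag, something like the following sketch would do. It assumes plain four-line FASTQ records (no wrapped sequence lines); the function name and the file names in the comments are hypothetical.

```python
from collections import Counter


def read_length_counts(fastq_lines):
    """Tally read lengths from an iterable of FASTQ lines (4 lines per record)."""
    counts = Counter()
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:  # the second line of each record is the sequence
            counts[len(line.strip())] += 1
    return counts


# Tiny in-memory example; with real data you would pass an open file handle,
# e.g. open("raw.fastq") and open("trimmed.fastq") (hypothetical file names).
example = [
    "@read1", "ACGTACGT", "+", "IIIIIIII",
    "@read2", "ACGT", "+", "IIII",
    "@read3", "ACGTACGT", "+", "IIIIIIII",
]
counts = read_length_counts(example)
```

Running this on the raw and trimmed files and comparing the counts at length 300 shows how many reads lost bases from the end, as opposed to being discarded entirely.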

ADD COMMENT • link 9.0 years ago by Devon Ryan 105k

0

Very much this. I'd also add that various tools are getting better and better at soft-clipping and handling poor-quality bases all the time, so depending on your application, trimming may not even be that necessary.

ADD REPLY • link 9.0 years ago by DG 7.3k

score 0 · Answer 2 · 2016-09-01

0

9.0 years ago

sumudu_rangika ▴ 50

Thank you all very much!

ADD COMMENT • link 9.0 years ago by sumudu_rangika ▴ 50