Question

FastQC quality report: interpreting the results and choosing parameters for the next step

0

9.8 years ago

ravi.uhdnis ▴ 220

Hi All,

I ran FastQC on whole-genome sequencing (WGS) data from a human sample (expected coverage 30X) generated on an Illumina HiSeq platform. Everything passes (green) for both the forward and the reverse reads, except:

Warning (orange): Per base sequence content and Per base GC content.

Fail (red): K-mer content.

I want to run Trimmomatic to trim poor-quality bases. Which Trimmomatic parameters should I use to minimize or remove poor-quality reads and the K-mer failure? I would also like to share the .html reports from my FastQC run, but I could not find a way to do that on BioStars.

I look forward to your responses, as I am new to NGS data analysis. Thank you.

genome next-gen-sequencing • 4.8k views

ADD COMMENT • link updated 2.1 years ago by Ram 44k • written 9.8 years ago by ravi.uhdnis ▴ 220

Ram · Answer 1 · 2015-05-20

2

9.8 years ago

arnstrm ★ 1.9k

I wouldn't worry about those warnings unless your first plot (Per base sequence quality) is bad. You should also check Per base sequence content to see whether you have adapter contamination. I think these are the only two things that can be fixed with Trimmomatic, the FASTX trimmer, or similar utilities. The rest doesn't matter much.
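As a rough illustration of the two fixable issues above (adapter contamination and low-quality bases), here is a minimal Python sketch that assembles a Trimmomatic paired-end command. The step names (ILLUMINACLIP, LEADING, TRAILING, SLIDINGWINDOW, MINLEN) are Trimmomatic's own, and the values mirror the example in the Trimmomatic manual. The jar path, file names, and adapter file location are hypothetical placeholders, and the thresholds are a starting point to adjust, not a recommendation for this particular dataset.

```python
import shlex


def build_trimmomatic_pe_command(jar_path, forward_reads, reverse_reads,
                                 output_prefix, adapter_fasta, min_length=36):
    """Assemble a Trimmomatic paired-end (PE) command as an argument list.

    All paths are supplied by the caller; nothing here is executed.
    """
    # Trimmomatic PE expects four outputs in this order:
    # forward paired, forward unpaired, reverse paired, reverse unpaired.
    outputs = [
        f"{output_prefix}_1P.fastq.gz",
        f"{output_prefix}_1U.fastq.gz",
        f"{output_prefix}_2P.fastq.gz",
        f"{output_prefix}_2U.fastq.gz",
    ]
    # Trimming steps are applied in the order given on the command line.
    steps = [
        f"ILLUMINACLIP:{adapter_fasta}:2:30:10",  # clip adapter sequences
        "LEADING:3",                              # drop low-quality leading bases
        "TRAILING:3",                             # drop low-quality trailing bases
        "SLIDINGWINDOW:4:15",                     # cut when a 4-base window averages below Q15
        f"MINLEN:{min_length}",                   # discard reads shorter than this
    ]
    return ["java", "-jar", jar_path, "PE", "-phred33",
            forward_reads, reverse_reads, *outputs, *steps]


# Hypothetical file names, for illustration only.
command = build_trimmomatic_pe_command(
    jar_path="trimmomatic-0.39.jar",
    forward_reads="sample_R1.fastq.gz",
    reverse_reads="sample_R2.fastq.gz",
    output_prefix="sample_trimmed",
    adapter_fasta="TruSeq3-PE.fa",
)
print(shlex.join(command))
```

The printed line can be pasted into a shell, or the list can be passed to subprocess.run(command, check=True) once the paths point at real files.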

ADD COMMENT • link updated 2.1 years ago by Ram 44k • written 9.8 years ago by arnstrm ★ 1.9k

2

+1

But I would also suggest checking the read length distribution.
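FastQC already plots this, but the same tally is easy to reproduce by hand. The sketch below counts read lengths from FASTQ lines, assuming the usual four-line record layout (header, sequence, '+', qualities) with no wrapped sequences. The function name and the in-memory example records are made up for illustration.

```python
from collections import Counter


def read_length_distribution(lines):
    """Count read lengths from an iterable of FASTQ lines.

    Assumes each record is exactly four lines, so the sequence is
    always the second line of every group of four.
    """
    lengths = Counter()
    for line_number, line in enumerate(lines):
        if line_number % 4 == 1:  # sequence line of the record
            lengths[len(line.rstrip("\r\n"))] += 1
    return lengths


# Tiny made-up example: two reads of length 4 and one of length 5.
example_records = [
    "@read1", "ACGT", "+", "IIII",
    "@read2", "ACGTA", "+", "IIIII",
    "@read3", "TTGA", "+", "IIII",
]
distribution = read_length_distribution(example_records)
for length, count in sorted(distribution.items()):
    print(length, count)
```

For a real file, an open text handle can be passed directly, for example gzip.open(path, "rt") for a compressed FASTQ.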

ADD REPLY • link updated 2.1 years ago by Ram 44k • written 9.8 years ago by PoGibas 5.1k

0

Thank you for your comment. The length distribution looks fine, with a single sharp peak at length 101 (spanning 100 to 102), showing that most reads are 101 bases long. What minimum read length should I keep before mapping to the reference genome, or can I keep all of them?

ADD REPLY • link updated 2.1 years ago by Ram 44k • written 9.8 years ago by ravi.uhdnis ▴ 220

0

Thank you for your comment. 'Per base sequence quality' looks good (green) throughout the plot, with a mean Q score of approximately 32 for the forward reads and approximately 30 for the reverse reads. The FastQC plots show no adapter contamination, although I'll still use Trimmomatic to remove any adapters that may be present.
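To put those Q scores in context: a Phred score Q corresponds to a base-call error probability of 10^(-Q/10), so Q30 means roughly 1 error in 1,000 bases. The short sketch below shows the conversion and how a mean score can be computed from a FASTQ quality string. It assumes Phred+33 encoding (typical for Illumina 1.8+ output), and the function names are illustrative.

```python
def phred_to_error_probability(q):
    """Convert a Phred quality score to the probability of a wrong base call."""
    return 10 ** (-q / 10)


def mean_phred(quality_string, offset=33):
    """Arithmetic mean of the Phred scores in a FASTQ quality string.

    The offset of 33 is an assumption (Phred+33 encoding).
    """
    scores = [ord(char) - offset for char in quality_string]
    return sum(scores) / len(scores)


for q in (30, 32):
    print(f"Q{q}: error probability {phred_to_error_probability(q):.5f}")

# 'A' encodes Q32 and '?' encodes Q30 under Phred+33.
print(mean_phred("AAAA"))
print(mean_phred("A?A?"))
```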

ADD REPLY • link updated 2.1 years ago by Ram 44k • written 9.8 years ago by ravi.uhdnis ▴ 220