Hi All,
I ran FastQC on whole-genome sequencing (WGS) data from a human sample (expected coverage 30X), generated on an Illumina HiSeq platform. Everything looks good (green) except the following, for both the forward and the reverse reads:
Warning (orange): Per base sequence content and Per base GC content.
Fail (red): K-mer content.
I want to run Trimmomatic to trim poor-quality bases. Which Trimmomatic parameters should I use to minimize or remove poor reads and the K-mer error? I would also like to share the .html reports from my FastQC run, but I could not find a way to do that on BioStars.
I look forward to your responses, as I am new to NGS data analysis. Thank you.
+1
But I would also suggest checking the read length distribution.
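As a cross-check on the FastQC length plot, the read lengths can also be tallied directly from the FASTQ file. This is only a minimal sketch: it assumes standard four-line FASTQ records (no wrapped sequence lines), and the function names and the file name in the usage note are made up for illustration.

```python
from collections import Counter
import gzip


def read_length_counts(fastq_lines):
    """Tally read lengths from an iterable of FASTQ lines.

    Assumes four lines per record: header, sequence, '+', qualities.
    """
    counts = Counter()
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:  # the sequence is the second line of each record
            counts[len(line.strip())] += 1
    return counts


def read_length_counts_from_file(path):
    """Same tally for a FASTQ file on disk; gzip is detected by extension."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return read_length_counts(handle)
```

For example, read_length_counts_from_file("sample_R1.fastq.gz") (a placeholder file name) returns a Counter mapping each read length to the number of reads with that length.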
Thank you for your comment. The length distribution looks fine: a single upward-pointing 'V' peak at length 101, between 100 and 102, showing that most reads are 101 bases long. What minimum read length should I keep before mapping to the reference genome, or can I keep all of them?
Thank you for your comment. 'Per base sequence quality' looks good (green) throughout the plot, with a mean Q score of approximately 32 for the forward reads and approximately 30 for the reverse reads. The FastQC plot shows no adapter contamination, although I will still use Trimmomatic to remove any adapters that may be present.
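For reference, a paired-end Trimmomatic run with adapter clipping could be assembled roughly as below. This is a sketch, not a recommendation for this dataset: the step values (ILLUMINACLIP 2:30:10, LEADING:3, TRAILING:3, SLIDINGWINDOW:4:15, MINLEN:36) are simply the ones from the example in the Trimmomatic manual, -phred33 assumes Phred+33 quality encoding, and the helper name, jar path, and file names are placeholders.

```python
def trimmomatic_pe_command(jar, fwd, rev, out_prefix, adapters, min_len=36):
    """Build an argument list for a Trimmomatic paired-end (PE) run.

    Output order expected by Trimmomatic PE: forward paired, forward
    unpaired, reverse paired, reverse unpaired.
    """
    outputs = [
        f"{out_prefix}_{tag}.fastq.gz" for tag in ("1P", "1U", "2P", "2U")
    ]
    return [
        "java", "-jar", jar, "PE",
        "-phred33",                           # assumption: Phred+33 qualities
        fwd, rev,
        *outputs,
        f"ILLUMINACLIP:{adapters}:2:30:10",   # adapter clipping
        "LEADING:3",                          # trim low-quality leading bases
        "TRAILING:3",                         # trim low-quality trailing bases
        "SLIDINGWINDOW:4:15",                 # cut when 4-base window mean < 15
        f"MINLEN:{min_len}",                  # drop reads shorter than this
    ]
```

The returned list can be passed to subprocess.run(cmd, check=True), for example with cmd = trimmomatic_pe_command("trimmomatic.jar", "sample_R1.fastq.gz", "sample_R2.fastq.gz", "sample_trimmed", "TruSeq3-PE.fa"). MINLEN is the step that answers the minimum-length question, so its value is exposed as a parameter here rather than fixed.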