Question

Kmer Content in FastQC failed


9.2 years ago

hurtc.stri ▴ 20

Hi All,

I am completely new to NGS analysis. I just received data from a paired-end 125 bp Illumina run. I ran both data files through FastQC to check data quality and received a few warnings/failures. Both runs failed the Kmer Content test. In the graph, all of the lines hover around zero until you get to the right side (near position 94-96), where all of the lines rise sharply to about 9. Does this mean that there could be adaptors left over on the 5' end? I'm not sure exactly how to interpret this or what I should do about it. Thank you in advance for any advice/suggestions.

next-gen-sequencing • 26k views

updated 2.8 years ago by Ram 45k • written 9.2 years ago by hurtc.stri ▴ 20

Ram · Answer 1 · 2016-01-15

Hi,

Do not worry too much about the k-mer plot if:

1) The Per base sequence quality plot is OK, with most of the boxes in the green zone and maybe a couple towards the end falling below it.

2) The Per base sequence content plot has the 4 lines overlapping each other. Sometimes the lines are noisy for the first ~10 bases or so, but from there on they should smooth out.

3) The Per sequence GC content plot has a single bell-shaped hump (more or less), not two or more. Small shoulders are OK.

4) If you see adapters in the Adapter Content plot, remove them (and maybe also trim low-quality bases from the read ends), then rerun FastQC.

If the three plots above look like this and adapters have been removed, you are good to go.
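As a rough illustration of what checks 1) and 2) summarize, here is a minimal sketch that computes the per-position mean quality and base composition from reads already parsed into (sequence, quality) pairs. It assumes Phred+33 quality encoding (standard for Illumina 1.8+ output); `per_base_stats` is a hypothetical helper, not part of FastQC, whose plot shows a full box plot per position rather than only the mean.

```python
from collections import Counter


def per_base_stats(reads):
    """Per-position mean Phred quality and A/C/G/T fractions.

    `reads` is an iterable of (sequence, quality_string) pairs; quality
    strings are assumed to be Phred+33 encoded. Reads may differ in length.
    """
    qual_sums, base_counts, depth = [], [], []
    for seq, qual in reads:
        for i, (base, q) in enumerate(zip(seq, qual)):
            if i == len(qual_sums):  # first read reaching this position
                qual_sums.append(0)
                base_counts.append(Counter())
                depth.append(0)
            qual_sums[i] += ord(q) - 33  # decode Phred+33
            base_counts[i][base] += 1
            depth[i] += 1
    mean_qual = [s / n for s, n in zip(qual_sums, depth)]
    composition = [{b: c[b] / n for b in "ACGT"} for c, n in zip(base_counts, depth)]
    return mean_qual, composition


# Two toy reads; 'I' decodes to Q40 and '#' to Q2.
mean_qual, comp = per_base_stats([("ACGT", "IIII"), ("AGGT", "II##")])
print(mean_qual)  # [40.0, 40.0, 21.0, 21.0]
```

In a healthy library the four fractions at each position stay roughly flat after the first few bases, and the mean quality only drops near the 3' end.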

I do not know exactly why k-mers show enrichment, but if it is towards the end of the read, it could be due to low-quality bases or adapters.
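To see why leftover adapter produces a k-mer spike at a fixed position, the sketch below counts each k-mer at every start position and reports where its count peaks relative to its average across positions. This is a simplified enrichment measure written for illustration only; FastQC computes its Kmer Content statistic differently. `positional_kmer_enrichment` is a hypothetical helper, and the adapter string `AGATCGGAAG` (the start of a commonly seen Illumina adapter) is assumed here just for the example.

```python
from collections import defaultdict


def positional_kmer_enrichment(seqs, k=7):
    """Map each k-mer to (peak_position, peak_count / mean_count_per_position).

    A k-mer spread evenly along reads scores near 1; a k-mer stuck at one
    position (e.g. the start of an adapter) scores much higher.
    """
    counts = defaultdict(lambda: defaultdict(int))  # kmer -> start pos -> count
    n_positions = 0
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]][i] += 1
            n_positions = max(n_positions, i + 1)
    result = {}
    for kmer, by_pos in counts.items():
        mean = sum(by_pos.values()) / n_positions
        pos, peak = max(by_pos.items(), key=lambda kv: kv[1])
        result[kmer] = (pos, peak / mean)
    return result


# Three different 6 bp inserts, each followed by the same adapter start.
reads = [ins + "AGATCGGAAG" for ins in ("TTTTTT", "CCCCCC", "GGGGGG")]
print(positional_kmer_enrichment(reads)["AGATCGG"])  # peaks at position 6
```

Because every read switches into the same adapter sequence at the same offset, the adapter's k-mers pile up at one position, which is the kind of pattern that shows up as a late spike in the Kmer Content plot.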

Caveat: the above generalizations are for WES/WGS/RNA-seq data. If you have low-complexity data, such as from amplicon sequencing, FastQC will flag a lot of noise. For example, the GC plot might have multiple shoulders, but this would be due to the low diversity of your data (a handful of genes).
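The GC caveat can be illustrated with a per-read GC histogram. This minimal sketch bins each read's GC content to the nearest whole percent, ignoring N bases; `gc_histogram` is a hypothetical helper, not FastQC's implementation, which also compares the observed curve against a fitted theoretical distribution.

```python
from collections import Counter


def gc_histogram(seqs):
    """Count reads per GC percentage (rounded to the nearest whole percent)."""
    hist = Counter()
    for s in seqs:
        acgt = sum(s.count(b) for b in "ACGT")
        if acgt == 0:
            continue  # skip reads made entirely of N
        gc = s.count("G") + s.count("C")
        hist[round(100 * gc / acgt)] += 1
    return dict(sorted(hist.items()))


# Two "amplicons" sequenced many times give two tall spikes, not one hump.
print(gc_histogram(["GGGA"] * 5 + ["ATTA"] * 5))  # {0: 5, 75: 5}
```

With whole-genome or transcriptome reads the many different molecules blur into one broad hump; with a handful of amplicons, each amplicon contributes its own narrow peak.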

score 1 · Answer 2 · 2016-10-19


8.5 years ago

alex.rubinsteyn ▴ 190

I have the same problem, also with 125 bp paired-end Illumina sequencing. The issue appears to be a shorter-than-desired fragment size distribution, leading to adapter read-through on ~10k reads.
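The read-through mechanism can be sketched in a few lines: when the insert (the DNA fragment between the adapters) is shorter than the read length, the sequencer keeps reading into the adapter at the 3' end. This toy model assumes no sequencing errors and that the adapter immediately follows the insert; `simulate_read` and `adapter_bases` are hypothetical helpers, and the adapter string is only an illustrative Illumina adapter prefix.

```python
def simulate_read(insert, adapter, read_len=125):
    """First `read_len` bases reported for a fragment: insert, then adapter."""
    return (insert + adapter)[:read_len]


def adapter_bases(insert_len, read_len=125):
    """Number of adapter bases at the 3' end of a read of length `read_len`."""
    return max(0, read_len - insert_len)


adapter = "AGATCGGAAGAGC" * 3  # illustrative adapter, long enough to fill the read
read = simulate_read("A" * 100, adapter)
print(read[100:113])        # AGATCGGAAGAGC (adapter begins right after the insert)
print(adapter_bases(100))   # 25
print(adapter_bases(150))   # 0 (insert longer than the read: no read-through)
```

With 125 bp reads, an insert of about 95 bp would put the adapter start near position 96, roughly where the question's k-mer spike appears; that link is only suggestive until the reported k-mers are checked against the adapter sequence.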



It happens. Use an appropriate trimming program and go on to the next step.

8.5 years ago by GenoMax 150k
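What a 3' adapter trimmer does can be sketched with exact string matching: cut at a full adapter match, otherwise cut a partial adapter (a prefix of the adapter) hanging off the read's end. This is a simplified illustration; real trimmers such as Trimmomatic tolerate sequencing errors and use scoring thresholds that this sketch does not. `trim_3prime_adapter` and its `min_overlap` parameter are hypothetical.

```python
def trim_3prime_adapter(read, adapter, min_overlap=3):
    """Remove an adapter from the 3' end of `read` using exact matching only."""
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    idx = read.find(adapter)
    if idx != -1:  # full adapter present: keep only the insert before it
        return read[:idx]
    # Partial adapter: the read ends partway into the adapter.
    # Try the longest possible overlap first, down to min_overlap.
    for overlap in range(min(len(adapter) - 1, len(read)), min_overlap - 1, -1):
        if read.endswith(adapter[:overlap]):
            return read[:-overlap]
    return read


adapter = "AGATCGGAAGAGC"  # illustrative adapter prefix
print(trim_3prime_adapter("ACGTACGT" + adapter + "TTT", adapter))  # ACGTACGT
print(trim_3prime_adapter("ACGTACGTAGATC", adapter))               # ACGTACGT
```

The `min_overlap` guard matters: without it, a single trailing `A` would always look like the start of this adapter and be trimmed.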

Ram · Answer 3 · 2016-01-15


9.2 years ago

dally ▴ 210

This has some good examples: http://www.bioinformatics.babraham.ac.uk/projects/fastqc/

Then you can look into tools such as Trimmomatic or the Fastx-toolkit to trim your data.
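Besides adapter clipping, these tools trim low-quality 3' ends. Trimmomatic's SLIDINGWINDOW step, for example, cuts a read once the average quality within a window falls below a threshold. The sketch below shows that idea in simplified form; it is not Trimmomatic's exact cut-point logic. `sliding_window_trim` is a hypothetical helper, and Phred+33 encoding is assumed.

```python
def sliding_window_trim(seq, qual, window=4, min_avg=15, offset=33):
    """Cut the read at the start of the first window (scanning from the
    5' end) whose mean Phred quality is below `min_avg`.

    Reads shorter than `window` are returned unchanged.
    """
    scores = [ord(c) - offset for c in qual]
    for start in range(max(len(scores) - window + 1, 0)):
        if sum(scores[start:start + window]) / window < min_avg:
            return seq[:start], qual[:start]
    return seq, qual


# Six Q40 bases followed by four Q2 bases.
print(sliding_window_trim("ACGTACGTAC", "IIIIII####"))  # ('ACGTA', 'IIIII')
```

Note that this simple version also drops the last good base before the low-quality stretch, because it cuts at the start of the failing window; real trimmers refine the cut point.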

Good luck!

updated 5.3 years ago by Ram 45k • written 9.2 years ago by dally ▴ 210


Great tutorial - thanks for the link. Now onto Trimmomatic!

updated 5.3 years ago by Ram 45k • written 9.2 years ago by hurtc.stri ▴ 20