Question

How To Interpret The K-mer Enrichment Plot Of A FastQC Output

7

11.9 years ago

pmuench ▴ 140

I preprocessed my FASTQ dataset with cutadapt to remove 3' adapters. Because I had problems aligning it, I looked at the dataset with FastQC. I am really confused, because the FastQC output for my raw dataset (before cutadapt) looks like this:

[Figure: FastQC k-mer content plot, "Relative enrichment over read length"]

Is it normal that the adapters don't start at the first base on average? In the FastQC output it seems that the adapter starts after the third base.
To me it looks like there is a 5' adapter too (otherwise, how can the k-mers at positions > 20 be explained?)
What about the k-mer AAAAA? Is this a sequencing error or contamination?

Thanks!

fastqc illumina next-gen • 14k views

Updated 10.3 years ago by rse ▴ 100 • written 11.9 years ago by pmuench ▴ 140

Ram · Answer 1 · 2013-01-16

9

11.9 years ago

Jeremy Leipzig 22k

Fascinating plot.

This is clearly a small RNA sequencing experiment. That pattern ATGCCGTCT you are seeing is the middle of the Illumina small RNA kit v1.5 adapter:

ATCTCGTATGCCGTCTTCTGCTTG

which is followed by ~~a fake polyA tail designed to work with the RNA-seq kit~~ Bustard no-calls reported as A's.
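The claim that ATGCCGTCT sits in the middle of the adapter is easy to check by string search. A minimal Python sketch using the adapter sequence quoted above (the helper name `kmers` is illustrative, not part of FastQC):

```python
# Illumina small RNA v1.5 adapter, as quoted in the answer
ADAPTER = "ATCTCGTATGCCGTCTTCTGCTTG"
# Enriched pattern seen in the FastQC plot
PATTERN = "ATGCCGTCT"


def kmers(seq: str, k: int) -> list:
    """Return all overlapping k-mers of seq, from left to right."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


offset = ADAPTER.find(PATTERN)  # str.find returns -1 if absent
print(f"{PATTERN} starts at offset {offset} of the {len(ADAPTER)}-nt adapter")
# → ATGCCGTCT starts at offset 7 of the 24-nt adapter

# Every 5-mer of the pattern is therefore also an adapter 5-mer
print(all(km in ADAPTER for km in kmers(PATTERN, 5)))  # → True
```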

Not sure why you are seeing it at the beginning of the sequence like that; perhaps something special was done there, like:

barcode-ATGCCGTCT-sequence-ATCTCGTATGCCGTCTTCTGCTTG-fakePolyA

in which case you should trim carefully.
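One way to trim such reads carefully is to remove the 3' adapter (and the A's after it) first, and only then strip the 5' tag within a short window at the start of the read, so that an insert that happens to contain the tag is not cut. A minimal Python sketch under these assumptions: the read layout is the hypothetical one above, the barcode and insert in the example are made up, the 10-base window and 5-base minimum overlap are arbitrary choices, and matching is exact. cutadapt also tolerates mismatches and indels, so prefer it for real data.

```python
ADAPTER = "ATCTCGTATGCCGTCTTCTGCTTG"  # 3' adapter from the answer
TAG = "ATGCCGTCT"  # adapter fragment seen near the 5' end


def trim_3prime(read: str, adapter: str = ADAPTER, min_overlap: int = 5) -> str:
    """Cut at the first exact full-adapter match, or at an adapter prefix
    (>= min_overlap nt) that runs off the 3' end of the read."""
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]  # drops the adapter and everything after it (e.g. A's)
    longest = min(len(adapter) - 1, len(read))
    for length in range(longest, max(min_overlap, 1) - 1, -1):
        if read.endswith(adapter[:length]):
            return read[:-length]
    return read


def trim_5prime(read: str, tag: str = TAG, window: int = 10) -> str:
    """Drop everything up to and including `tag` if it starts at or before
    position `window`; otherwise return the read unchanged."""
    idx = read.find(tag, 0, window + len(tag))
    return read[idx + len(tag):] if idx != -1 else read


def trim_read(read: str) -> str:
    # Order matters: 3' adapter first, then the 5' tag.
    return trim_5prime(trim_3prime(read))


# Hypothetical read: 4-nt barcode, tag, arbitrary insert, adapter, A run
INSERT = "TGAGGTAGTAGGTTGTATAGTT"
read = "ACGT" + TAG + INSERT + ADAPTER + "AAAAAA"
print(trim_read(read))  # → TGAGGTAGTAGGTTGTATAGTT
```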

Updated 4.9 years ago by Ram 44k • written 11.9 years ago by Jeremy Leipzig 22k

0

Could you please explain the fakePolyA issue in more detail? I just learned that I have contamination in my Illumina RNA-seq dataset that looks like this: GATCGGAAGAGCACACGTCTGAACTCCAGTCACCGATGTATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAA (the polyA varies in length). Thanks a lot!

11.2 years ago by Biomonika (Noolean) 3.2k

0

You should ask someone more familiar with those TruSeq small RNA kits, but I don't see how that polyA could be biological if it occurs after the 3' adapter, in purified DNA no less. I could also buy the Bustard explanation.
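To see how the contaminant from the question is laid out, one can locate the small RNA adapter from the first answer inside it and measure the A run that follows. A minimal Python sketch; it only reports positions and says nothing about where the upstream sequence comes from (the helper name `adapter_layout` is hypothetical):

```python
ADAPTER = "ATCTCGTATGCCGTCTTCTGCTTG"
# Contaminant sequence as posted in the question above
CONTAMINANT = "GATCGGAAGAGCACACGTCTGAACTCCAGTCACCGATGTATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAA"


def adapter_layout(seq: str, adapter: str = ADAPTER):
    """Return (upstream, adapter_start, trailing_a, rest), or None if absent."""
    start = seq.find(adapter)
    if start == -1:
        return None
    tail = seq[start + len(adapter):]
    trailing_a = len(tail) - len(tail.lstrip("A"))  # length of the A run
    return seq[:start], start, trailing_a, tail[trailing_a:]


upstream, start, trailing_a, rest = adapter_layout(CONTAMINANT)
print(f"{len(upstream)} nt upstream, adapter at {start}, "
      f"{trailing_a} A's after it, {len(rest)} nt after the A run")
```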

11.2 years ago by Jeremy Leipzig 22k

score 4 · Answer 2 · 2013-01-16

4

11.9 years ago

Gabriel R. ★ 2.9k

Is it normal that the adapters don't start at the first base on average? In the FastQC output it seems that the adapter starts after the third base.

I am not sure which adapter you are referring to. The one next to the 5' end? If so, yes, it should start at the beginning, unless there was an issue with priming.

To me it looks like there is a 5' adapter too (otherwise, how can the k-mers at positions > 20 be explained?)

That's weird. Did you sequence small RNA or something? Did you do a gel cut that was very short?

What about the k-mer AAAAA? Is this a sequencing error or contamination?

When Bustard encounters bases with no intensity, it reports an 'A' with quality 0.
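Because those no-call A's carry quality 0, they can be told apart from genuine A calls by their quality score. A minimal Python sketch that masks them as N or trims them from the 3' end; it assumes Phred+33 quality encoding by default (older Illumina pipelines used an offset of 64, so pass `offset=64` for such files), and the helper names are hypothetical:

```python
def phred(q_char: str, offset: int = 33) -> int:
    """Decode one FASTQ quality character to a Phred score."""
    return ord(q_char) - offset


def mask_no_calls(seq: str, qual: str, offset: int = 33) -> str:
    """Replace every base whose quality is 0 with 'N'."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    return "".join("N" if phred(q, offset) == 0 else b for b, q in zip(seq, qual))


def trim_trailing_no_calls(seq: str, qual: str, offset: int = 33):
    """Trim quality-0 bases from the 3' end; returns (seq, qual)."""
    end = len(seq)
    while end > 0 and phred(qual[end - 1], offset) == 0:
        end -= 1
    return seq[:end], qual[:end]


# '!' is Phred 0 and 'I' is Phred 40 in Phred+33 encoding
print(mask_no_calls("ACGTAAAA", "IIII!!!!"))           # → ACGTNNNN
print(trim_trailing_no_calls("ACGTAAAA", "IIII!!!!"))  # → ('ACGT', 'IIII')
```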

11.9 years ago by Gabriel R. ★ 2.9k

2

Thank you for the answer! This data is from an RNA-seq experiment. I was told that there is one adapter in the dataset (which I see in the first 5 peaks in the image). But my main question is: what about the peaks after base 20? Is this a second adapter on the other end that I have to cut separately with cutadapt?
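A peak at a given position in this plot means that many reads contain the k-mer starting at that position. The sketch below computes a simplified per-position enrichment for one k-mer: the observed count at each position divided by the count expected if its occurrences were spread evenly over all positions. This is a simplification for illustration, not FastQC's exact statistic, and it assumes reads of equal length; `positional_enrichment` is a hypothetical helper.

```python
from collections import Counter


def positional_enrichment(reads, kmer):
    """Observed/expected ratio of `kmer` at each start position, where
    'expected' assumes occurrences are spread evenly across positions."""
    if not reads:
        return {}
    k = len(kmer)
    n_positions = max(len(r) for r in reads) - k + 1
    per_pos = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            if read[i:i + k] == kmer:
                per_pos[i] += 1
    total = sum(per_pos.values())
    if total == 0 or n_positions <= 0:
        return {}
    expected = total / n_positions
    return {pos: per_pos[pos] / expected for pos in range(n_positions)}


# Toy reads: ATGCC starts at position 3 twice and at position 7 once
reads = ["GGGATGCCGGGG", "GGGATGCCGGGG", "GGGGGGGATGCC"]
enrichment = positional_enrichment(reads, "ATGCC")
peak = max(enrichment, key=enrichment.get)
print(peak, round(enrichment[peak], 2))  # → 3 5.33
```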

11.9 years ago by pmuench ▴ 140

2

I am not sure. Do you recognize the sequence in your adapter sequence?
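One way to answer that systematically is to check each enriched k-mer against the adapter and its reverse complement. A minimal Python sketch; the k-mer list below is hypothetical, so substitute the sequences from your own FastQC report (`adapter_origin` is likewise an illustrative helper):

```python
ADAPTER = "ATCTCGTATGCCGTCTTCTGCTTG"
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def adapter_origin(kmer: str, adapter: str = ADAPTER):
    """Say whether kmer occurs in the adapter, its reverse complement, or neither."""
    if kmer in adapter:
        return f"adapter (offset {adapter.find(kmer)})"
    rc = revcomp(adapter)
    if kmer in rc:
        return f"reverse complement (offset {rc.find(kmer)})"
    return None


# Hypothetical enriched k-mers
for kmer in ["ATGCC", "CGTCT", "AGACG", "AAAAA"]:
    print(kmer, adapter_origin(kmer))
```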

11.9 years ago by Gabriel R. ★ 2.9k

1

Yes, the k-mer composition looks like the adapter sequence. But I expected the adapter to be ligated only to the 3' end. I am not sure whether I can conclude from this figure that the same adapter is also ligated to the 5' end.

11.9 years ago by pmuench ▴ 140

score 0 · Answer 3 · 2014-07-28

0

10.3 years ago

rse ▴ 100

What might be the reasons for a low read alignment rate?

10.3 years ago by rse ▴ 100