How to Interpret the K-mer Enrichment Plot of a FastQC Output
11.9 years ago
pmuench ▴ 140

I preprocessed my fastq dataset with cutadapt to remove 3' adapters. Because I had problems aligning the reads, I took a look at the dataset with FastQC. I am really confused, because the FastQC output for my raw dataset (before cutadapt) looks like this:

[Figure: FastQC k-mer content plot, "Relative enrichment over read length"]

  • Is it normal that the adapters don't start at the first base on average? In the FastQC output it looks as though the adapter starts after the third base.
  • To me it looks as if there is a 5' adapter too (how else can the k-mers at positions > 20 be explained?).
  • What about the k-mer AAAAA? Is this a sequencing error or contamination?

Thanks!

fastqc illumina next-gen
11.9 years ago

Fascinating plot.

This is clearly a small RNA sequencing experiment. That pattern ATGCCGTCT you are seeing is the middle of the Illumina small RNA kit v1.5 adapter:

ATCTCGTATGCCGTCTTCTGCTTG

which is followed by a fake polyA tail: either something designed to work with the RNA-seq kit, or Bustard no-calls reported as As.

Not sure why you are seeing it at the beginning of the sequence like that. Perhaps something special was done in the library preparation, such as:

barcode-ATGCCGTCT-sequence-ATCTCGTATGCCGTCTTCTGCTTG-fakePolyA

in which case you should trim carefully.
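To make "trim carefully" a bit more concrete, here is a minimal Python sketch that cuts a read at the small RNA v1.5 adapter quoted above, assuming exact matches only. If the full adapter is present, everything from its first base onward (including any polyA that follows) is dropped; otherwise it checks whether the read ends in a partial adapter of at least `min_overlap` bases. The function name `trim_3prime`, the `min_overlap` default and the example insert are illustrative choices, not taken from cutadapt, and the sketch does not handle a 5' barcode or prefix like the one in the hypothetical construct. For real data, cutadapt (which tolerates mismatches) is the better tool.

```python
ADAPTER = "ATCTCGTATGCCGTCTTCTGCTTG"  # small RNA v1.5 3' adapter quoted above


def trim_3prime(read: str, adapter: str = ADAPTER, min_overlap: int = 5) -> str:
    """Cut a read at the 3' adapter, using exact matches only.

    Illustrative sketch: no mismatches or indels are tolerated.
    """
    # Full adapter anywhere in the read: keep only the bases before it.
    # This also removes the polyA run that follows the adapter.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Otherwise the read may run only partway into the adapter: look for the
    # longest adapter prefix (at least min_overlap bases) that ends the read.
    longest = min(len(adapter) - 1, len(read))
    for overlap in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:overlap]):
            return read[: len(read) - overlap]
    # No adapter evidence: return the read unchanged.
    return read


if __name__ == "__main__":
    insert = "TTTTGGGGCCCCTTTTGGGGCC"  # made-up insert, for illustration only
    print(trim_3prime(insert + ADAPTER + "AAAAAAAA"))  # TTTTGGGGCCCCTTTTGGGGCC
```

The `min_overlap` threshold is there because very short adapter prefixes (a few bases) will match the end of many reads purely by chance.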

Could you please explain the fake polyA issue in more detail? I just learned that I have contamination in my Illumina RNA-seq dataset that looks like this: GATCGGAAGAGCACACGTCTGAACTCCAGTCACCGATGTATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAA (with a polyA of variable length). Thanks a lot!

You should ask someone more familiar with those TruSeq small RNA kits, but I don't see how that polyA could be biological if it occurs after the 3' adapter, in purified DNA no less. I could also buy the Bustard explanation.

11.9 years ago
Gabriel R. ★ 2.9k
  • Is it normal that the adapters don't start at the first base on average? In the FastQC output it looks as though the adapter starts after the third base.

I am not sure which adapter you are referring to; the one next to the 5' end? Yes, it should start at the beginning, unless there was an issue with priming.

  • To me it looks as if there is a 5' adapter too (how else can the k-mers at positions > 20 be explained?).

That's weird. Did you sequence small RNA or something? Did you do a gel cut that was very short?

  • What about the k-mer AAAAA? Is this a sequencing error or contamination?

When Bustard encounters bases with no intensity, it calls an 'A' with quality 0.
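If the As really are Bustard no-calls, one option is to mask or trim them by quality rather than by sequence. A minimal Python sketch, assuming the quality string uses the Phred+64 offset of the older Illumina pipelines (pass `offset=33` for Sanger / Illumina 1.8+ files; FastQC's Basic Statistics module reports the encoding). The helper names `decode_qualities`, `mask_no_calls` and `trim_trailing_no_calls` are hypothetical, not part of any tool.

```python
def decode_qualities(qual: str, offset: int = 64) -> list[int]:
    """Convert an ASCII quality string to integer Phred scores.

    offset=64 assumes the older Illumina (1.3 to 1.7) encoding;
    use offset=33 for Sanger / Illumina 1.8+ files.
    """
    return [ord(c) - offset for c in qual]


def mask_no_calls(seq: str, qual: str, offset: int = 64) -> str:
    """Replace every base called with quality 0 (a likely no-call) by 'N'."""
    scores = decode_qualities(qual, offset)
    if len(scores) != len(seq):
        raise ValueError("sequence and quality strings differ in length")
    return "".join("N" if q == 0 else base for base, q in zip(seq, scores))


def trim_trailing_no_calls(seq: str, qual: str, offset: int = 64) -> tuple[str, str]:
    """Drop the run of quality-0 calls at the 3' end of a read."""
    scores = decode_qualities(qual, offset)
    end = len(seq)
    while end > 0 and scores[end - 1] == 0:
        end -= 1
    return seq[:end], qual[:end]


if __name__ == "__main__":
    # 'h' is Phred 40 and '@' is Phred 0 with the +64 offset.
    print(mask_no_calls("ACGTAAAA", "hhhh@@@@"))  # ACGTNNNN
```

Masking keeps read lengths unchanged (useful if downstream tools expect fixed-length reads), while trimming removes the no-calls entirely.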

Thank you for the answer! This data is from an RNA-seq experiment. I was told there is one adapter in the dataset (which I see in the first 5 peaks of the plot). But my main question is: what about the peaks after base 20? Is this a second adapter on the other end that I have to cut separately with cutadapt?

I am not sure. Do you recognize these k-mers in your adapter sequence?

Yes, the k-mer composition looks like the adapter sequence. But I expected the adapter to be ligated only to the 3' end. I am not sure whether I can conclude from this figure that the same adapter is also ligated to the 5' end.
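One way to check this outside FastQC is to list the k-mers of the adapter and see where each one starts in the raw reads. A minimal Python sketch, assuming k = 5 (the AAAAA in the plot suggests 5-mers), the small RNA v1.5 adapter quoted earlier in the thread, and reads already loaded as plain strings. It counts raw start positions only and does not reproduce FastQC's enrichment statistic; the function names are hypothetical. If adapter k-mers start at a roughly constant position in many reads, that would be consistent with short inserts reading into the 3' adapter, rather than proving a second, 5'-ligated adapter.

```python
from collections import Counter, defaultdict

ADAPTER = "ATCTCGTATGCCGTCTTCTGCTTG"  # small RNA v1.5 adapter quoted earlier


def adapter_kmers(adapter: str, k: int = 5) -> set[str]:
    """Return every k-mer contained in the adapter sequence."""
    return {adapter[i:i + k] for i in range(len(adapter) - k + 1)}


def kmer_start_positions(reads, k: int = 5) -> dict[str, Counter]:
    """For each k-mer, count how often it starts at each 1-based read position."""
    positions: dict[str, Counter] = defaultdict(Counter)
    for read in reads:
        for i in range(len(read) - k + 1):
            positions[read[i:i + k]][i + 1] += 1
    return positions


if __name__ == "__main__":
    kmers = adapter_kmers(ADAPTER)
    print("ATGCC" in kmers)  # True: part of the adapter
    print("AAAAA" in kmers)  # False: not part of the adapter
```

Running `kmer_start_positions` on a sample of reads and looking up the adapter k-mers shows at which positions they start; a single peak well into the read points to read-through, while a peak at positions 1 to 5 would point to something at the 5' end.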

10.4 years ago
rse ▴ 100

What might be the reasons for a low read alignment rate?
