ABYSS kmer coverage
7.9 years ago
chrisbala10 ▴ 10

First post to Biostars... hope I do this right.

Hi All,

I've been fighting a bit with an attempt at a genome assembly.

Based on reading this forum, I suspect there is an issue with the raw data, but I just want to make sure I've understood everything. Fastqc looks ok. There are some over-represented kmers, but I don't think that is the problem.

The problem seems to be that I have an excess of rare kmers (which seemingly indicates sequencing errors). Both ALLPATHS and ABYSS are telling the same story (as they should!), but I am not sure why. I've pasted the first few lines of the coverage.hist from ABYSS below.

These data have been quality trimmed using trim_galore. One relevant post I found suggests using QUAKE for correction. Is that still my next step? Or is it possible these data are fundamentally flawed? I guess one question is, what causes an excess of rare kmers if the quality scores of the data are very high?

Thanks!

Chris

1   1182909997
2   84699927
3   9033507
4   5000923
5   3572263
6   3223489
7   3322965
8   3843986
9   4777951
10  6183795
11  8121580
12  10580154
13  13495444
Assembly genome • 2.3k views

Looks like every other dataset to me. Plot column 1 vs column 2 and look at how many peaks you have. You will need to play with the x/y axis ranges.
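If you want a quick numeric check before plotting, here is a minimal Python sketch. It assumes coverage.hist is two whitespace-separated columns (kmer multiplicity, then number of distinct kmers), as in the lines posted above. The function names and the `POSTED` constant are made up for illustration and are not part of ABySS. It looks for the first dip in the counts, which is roughly where the low-frequency (likely error) kmers give way to the real coverage peak.

```python
# First 13 rows of the coverage.hist excerpt posted in the question.
POSTED = """\
1   1182909997
2   84699927
3   9033507
4   5000923
5   3572263
6   3223489
7   3322965
8   3843986
9   4777951
10  6183795
11  8121580
12  10580154
13  13495444
"""

def read_hist(lines):
    """Parse 'multiplicity count' rows into a list of (int, int) tuples."""
    rows = []
    for line in lines:
        parts = line.split()
        if len(parts) >= 2:
            rows.append((int(parts[0]), int(parts[1])))
    return rows

def first_valley(rows):
    """Return the multiplicity at the first local minimum of the counts, or None."""
    for i in range(1, len(rows) - 1):
        prev_count, count, next_count = rows[i - 1][1], rows[i][1], rows[i + 1][1]
        if count < prev_count and count <= next_count:
            return rows[i][0]
    return None

def main_peak(rows, valley):
    """Return the multiplicity with the highest count at or beyond the valley."""
    beyond = [row for row in rows if row[0] >= valley]
    return max(beyond, key=lambda row: row[1])[0]

if __name__ == "__main__":
    rows = read_hist(POSTED.splitlines())
    print("first valley at multiplicity", first_valley(rows))
```

On the posted excerpt the counts bottom out at multiplicity 6 and are still climbing at 13, so the excerpt ends before the coverage peak. Run `main_peak` on the full file rather than on these 13 rows.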


As Adrian says, kmer frequency histograms (for isolates) usually look like that. I recommend adapter-trimming if you have not already done so, however. You can also check for and remove contaminants, particularly human, which sometimes will contribute to low-frequency kmers.


oops, messed up plotting, one second...


Ok, here is the full plot. Is this really normal? This is from ~ 1 lane of PE of a bird genome. R isn't labelling my X axis at the moment, sorry..

[Image: kmer coverage plot]


That looks fine; typical diploid pattern. It looks much more sane if you plot it on a log scale.
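To see why the log scale helps: the count at multiplicity 1 is a few hundred times larger than the counts around 6 to 13, so on a linear axis everything except the first bar is squashed flat. Below is a rough sketch, standard library only, that prints text bars with length proportional to log10 of the count. `log_bars` is a hypothetical helper name, and the rows are taken from the histogram excerpt in the question.

```python
import math

def log_bars(rows, width=50):
    """Return one text line per (multiplicity, count) row.

    Bar length is proportional to log10(count), scaled so that the
    largest count gets `width` characters.
    """
    logs = [math.log10(count) for _, count in rows if count > 0]
    top = max(logs, default=0.0)
    lines = []
    for mult, count in rows:
        if count > 0 and top > 0:
            n = round(width * math.log10(count) / top)
        else:
            n = 0  # zero counts (or an all-ones histogram) get an empty bar
        lines.append(f"{mult:>4} {'#' * n}")
    return lines

if __name__ == "__main__":
    # A few rows from the coverage.hist excerpt in the question.
    sample = [(1, 1182909997), (2, 84699927), (6, 3223489), (13, 13495444)]
    print("\n".join(log_bars(sample)))
```

With a plotting library the same effect comes from switching the y axis to a log scale (in matplotlib, `plt.yscale("log")`).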


thanks... now I know... on to other forms of assembly trouble-shooting!


Looks like you have a diploid pattern, like Brian said. Adapter trimming and quality trimming can remove the low-frequency k-mers, as well as contaminants if any. Try SPAdes or QUAKE for read error correction before assembly (SPAdes does error correction and assembly as part of its pipeline).

