Question

ABYSS kmer coverage

7.9 years ago

chrisbala10 ▴ 10

First post to Biostars... hope I do this right.

Hi All,

I've been fighting a bit with an attempt at a genome assembly.

Based on reading this forum, I suspect there is an issue with the raw data, but I just want to make sure I've understood everything. FastQC looks OK. There are some over-represented kmers, but I don't think that is the problem.

The problem is that I seem to have an excess of rare kmers (which seemingly indicates sequencing errors). Both ALLPATHS and ABYSS seem to be telling the same story (as they should!), but I am not sure why. I have pasted the first few lines of the coverage.hist from ABYSS below.

These data have been quality trimmed using trim_galore. One relevant post I found suggests using QUAKE for correction. Is that still my next step? Or is it possible these data are fundamentally flawed? I guess one question is, what causes an excess of rare kmers if the quality scores of the data are very high?

Thanks!

Chris

1   1182909997
2   84699927
3   9033507
4   5000923
5   3572263
6   3223489
7   3322965
8   3843986
9   4777951
10  6183795
11  8121580
12  10580154
13  13495444

Assembly genome • 2.3k views

Looks like every other dataset to me. Plot column 1 vs. column 2 and look at how many peaks you have. You will need to play with the x/y axis ranges.

7.9 years ago by Adrian Pelin ★ 2.6k
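
A minimal sketch of the check suggested in the reply above, assuming the two-column layout of the rows pasted in the question (column 1 = kmer multiplicity, column 2 = number of kmers seen that many times). The names `parse_hist` and `first_valley` are made up for illustration and are not part of ABYSS. It looks for the first local minimum, i.e. the dip between the low-frequency spike and the start of the first real peak:

```python
# Illustrative only: the 13 rows of coverage.hist pasted in the question.
# Assumed layout: column 1 = kmer multiplicity, column 2 = kmer count.
HIST_TEXT = """\
1   1182909997
2   84699927
3   9033507
4   5000923
5   3572263
6   3223489
7   3322965
8   3843986
9   4777951
10  6183795
11  8121580
12  10580154
13  13495444
"""


def parse_hist(text):
    """Parse whitespace-separated 'multiplicity count' rows into int tuples."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            rows.append((int(fields[0]), int(fields[1])))
    return rows


def first_valley(rows):
    """Return the multiplicity at the first local minimum of the counts.

    Returns None if the counts never start rising again.
    """
    for i in range(1, len(rows) - 1):
        falling = rows[i][1] < rows[i - 1][1]
        rising_next = rows[i][1] <= rows[i + 1][1]
        if falling and rising_next:
            return rows[i][0]
    return None


rows = parse_hist(HIST_TEXT)
valley = first_valley(rows)
print("first valley at multiplicity", valley)
```

On the pasted rows the counts stop falling at multiplicity 6 and rise again from 7 onward, so the first real peak lies somewhere beyond the 13 rows shown. To run this on a whole file, pass the text of coverage.hist to `parse_hist` instead of `HIST_TEXT`.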

As Adrian says, kmer frequency histograms (for isolates) usually look like that. I recommend adapter-trimming if you have not already done so, however. You can also check for and remove contaminants, particularly human, which sometimes will contribute to low-frequency kmers.

7.9 years ago by Brian Bushnell 20k

oops, messed up plotting, one second...

7.9 years ago by chrisbala10 ▴ 10

OK, here is the full plot. Is this really normal? This is from ~1 lane of PE reads from a bird genome. R isn't labelling my x axis at the moment, sorry.

[Figure: kmer coverage plot]

7.9 years ago by chrisbala10 ▴ 10

That looks fine; typical diploid pattern. It looks much more sane if you plot it on a log scale.

7.9 years ago by Brian Bushnell 20k
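
To illustrate why the log scale helps: in the rows pasted in the question, the count at multiplicity 1 is over 300 times the count at multiplicity 6, so on a linear y axis everything after the first bar is squashed flat. Below is a rough text-only sketch using just those pasted rows; `log10_rows` and `text_bars` are hypothetical helper names, and the bar width of 5 characters per factor of ten is an arbitrary choice:

```python
import math

# The 13 (multiplicity, count) rows pasted in the question.
ROWS = [
    (1, 1182909997), (2, 84699927), (3, 9033507), (4, 5000923),
    (5, 3572263), (6, 3223489), (7, 3322965), (8, 3843986),
    (9, 4777951), (10, 6183795), (11, 8121580), (12, 10580154),
    (13, 13495444),
]


def log10_rows(rows):
    """Return (multiplicity, log10(count)) pairs, skipping zero counts."""
    return [(mult, math.log10(count)) for mult, count in rows if count > 0]


def text_bars(rows, chars_per_decade=5):
    """Draw one '#' bar per row, with length proportional to log10(count)."""
    lines = []
    for mult, log_count in log10_rows(rows):
        bar = "#" * int(log_count * chars_per_decade)
        lines.append(f"{mult:>3} {bar} ({log_count:.2f})")
    return lines


bars = text_bars(ROWS)
print("\n".join(bars))
```

In R, passing `log="y"` to `plot()` gives the same effect on a real plot.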

thanks... now I know... on to other forms of assembly trouble-shooting!

7.9 years ago by chrisbala10 ▴ 10

Looks like you have a diploid pattern, like Brian said. Adapter trimming and quality trimming can remove the low-frequency kmers, as well as contaminants, if any. Try SPAdes or QUAKE for read error correction before assembly (SPAdes does error correction and assembly as part of its pipeline).

7.9 years ago by Adrian Pelin ★ 2.6k