My kmer distribution is weird

8.1 years ago

Picasa ▴ 650

Hi,

I used KmerGenie and SGA preqc to evaluate my data before assembly.

However, my k-mer distribution graphs look a bit weird, and I am not sure how to interpret them.

I know that k-mers with a low count typically contain sequencing errors, and that I should see a peak somewhere at higher counts.

But here I have no peak. Do you have a clue about what is going on?

Sga preqc

http://imgur.com/a/muB6k

Kmergenie

http://imgur.com/a/UEt9e

kmer distribution • 2.7k views

ADD COMMENT • link updated 8.0 years ago by Biostar 20 • written 8.1 years ago by Picasa ▴ 650

Did you by any chance pre-filter your data by quality? Usually the absence of peaks indicates too little coverage for your species, or contamination in your samples.

If you did filter it, just try running it without filtering, or be more lenient when trimming by quality values (something like q=10 instead of the usual 20 or 30).

ADD REPLY • link 8.1 years ago by Rohit ★ 1.5k

I used Trimmomatic to filter my data with Q>30 and min(length)=40.

Those graphs are from the raw reads. However, I discard only 5% of the reads in the trimming step, so the graphs for the trimmed data are quite similar.

ADD REPLY • link 8.1 years ago by Picasa ▴ 650

If these are from the raw reads, then it is probably just a coverage problem, i.e. you need to sequence your species much more deeply. Try to check roughly how much coverage you have with TotalBases/GenomeSize.
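As a rough illustration of that TotalBases/GenomeSize check, here is a minimal Python sketch. It assumes plain 4-line FASTQ records and a genome size you supply yourself (for example from a related species). The function names and the toy reads are made up for the example and do not come from any tool mentioned in this thread.

```python
def total_bases(fastq_lines):
    """Sum the lengths of the sequence lines, assuming 4-line FASTQ records."""
    total = 0
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:  # the second line of each record holds the bases
            total += len(line.strip())
    return total


def estimated_coverage(n_bases, genome_size):
    """Rough average depth: TotalBases / GenomeSize."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return n_bases / genome_size


# Hypothetical toy input: two reads, 10 bases in total.
toy_reads = ["@r1", "ACGT", "+", "IIII", "@r2", "ACGTAC", "+", "IIIIII"]
toy_bases = total_bases(toy_reads)
# Hypothetical genome size of 5 bases, just to make the arithmetic visible.
toy_coverage = estimated_coverage(toy_bases, genome_size=5)
```

For real data you would pass the lines of an opened FASTQ file to `total_bases` (for paired-end data, add up both files) and use your best genome size estimate. The result is only an average and says nothing about how evenly the reads are spread.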

If you ran these on the trimmed reads, note that your min-len=40 is below the kmer=51: reads shorter than k contribute no k-mers at all, so a significant amount of data might be lost. Just increase the min-length to 52. Q>30 is already too strict.
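To see why the minimum read length matters, note that a read of length L contains L - k + 1 k-mers, and none when L < k. A read of exactly k bases yields a single k-mer, so with a minimum length of 52 every surviving read contributes at least two 51-mers. The sketch below only illustrates this arithmetic. The helper names and the read lengths are hypothetical.

```python
def kmers_in_read(read_length, k):
    """Number of k-mers in one read: L - k + 1, or 0 if the read is shorter than k."""
    return max(0, read_length - k + 1)


def fraction_of_bases_without_kmers(read_lengths, k):
    """Fraction of all bases that sit in reads too short to yield any k-mer."""
    total = sum(read_lengths)
    if total == 0:
        return 0.0
    lost = sum(n for n in read_lengths if n < k)
    return lost / total


# Hypothetical read lengths after trimming with a minimum length of 40.
toy_lengths = [40, 45, 51, 52, 100]
per_read = [kmers_in_read(n, k=51) for n in toy_lengths]
lost_fraction = fraction_of_bases_without_kmers(toy_lengths, k=51)
```

Running something like this on your actual trimmed read lengths would show how much of the data drops out of the k=51 histogram.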

ADD REPLY • link 8.1 years ago by Rohit ★ 1.5k
