8.1 years ago
Picasa
Hi,
I used KmerGenie and SGA preqc to evaluate my data before assembly.
However, my k-mer distribution graphs look a bit weird, and I am not sure how to interpret them.
I know that low-count k-mers typically contain sequencing errors, and that I should see a peak somewhere.
But here I have no peak. Do you have a clue about what is going on?
Sga preqc
Kmergenie
Did you by any chance pre-filter your data by quality? Usually the absence of peaks indicates too little coverage for your species, or contamination in your samples.
If you did filter it, try running it without filtering, or be more lenient when trimming by quality (something like q=10 instead of the usual 20 or 30).
I used Trimmomatic to filter my data with Q>30 and min(length)=40.
Those graphs are for the raw reads. However, I discard only 5% of the reads in the trimming step, so the graphs for the trimmed data are quite similar.
If they are for the raw reads, then it is probably just a coverage problem, i.e. you need much more sequencing depth for your species. Try estimating how much coverage you have as TotalBases / GenomeSize.
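As a rough illustration of that TotalBases / GenomeSize check, here is a minimal Python sketch. The function name `estimate_coverage` and all the numbers are made up for the example; plug in your own read count, read length and expected genome size.

```python
def estimate_coverage(total_bases: int, genome_size: int) -> float:
    """Return the expected average depth: total sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be a positive number of bases")
    return total_bases / genome_size


# Hypothetical example values, not taken from the question.
n_reads = 20_000_000        # number of reads in the library
read_length = 100           # average read length in bases
genome_size = 500_000_000   # expected genome size in bases

total_bases = n_reads * read_length
print(f"Estimated coverage: {estimate_coverage(total_bases, genome_size):.1f}x")
```

With these made-up numbers the estimate is 4.0x, which would be far too low to expect a clear peak in the k-mer histogram.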
If you ran these on trimmed reads, note that your minimum length is 40 while the k-mer size is 51. Reads shorter than k contribute no k-mers at all, so a significant amount of data might be lost; just increase the minimum length to 52. Q>30 is also already too strict.
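To see why the minimum length matters, recall that a read of length L contains L - k + 1 k-mers, and none when L < k. A small sketch, assuming hypothetical read lengths and helper names (`kmers_per_read`, `usable_base_fraction`) that are only for illustration:

```python
def kmers_per_read(read_length: int, k: int) -> int:
    """Number of k-mers in one read: L - k + 1, or 0 if the read is shorter than k."""
    return max(0, read_length - k + 1)


def usable_base_fraction(read_lengths, k: int) -> float:
    """Fraction of all bases that sit in reads long enough to yield at least one k-mer."""
    total = sum(read_lengths)
    if total == 0:
        return 0.0
    usable = sum(length for length in read_lengths if length >= k)
    return usable / total


# Hypothetical trimmed read lengths, with k = 51 as in the discussion above.
lengths = [40, 45, 51, 52, 100]
k = 51

for length in lengths:
    print(length, kmers_per_read(length, k))

print(f"{usable_base_fraction(lengths, k):.2f}")
```

In this toy set, the 40 and 45 base reads pass a min-length of 40 but yield zero 51-mers, so they add nothing to the k-mer histogram.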