kmergenie gives k value larger than read size
10.1 years ago
Illinu ▴ 110

Hello,

kmergenie is supposed to support different libraries in the same run. The manual says to include all paired-end libraries that will be used for the assembly. I calculated the best k value for two paired-end libraries: one has reads of 90-99 bp and the other has reads of 290-299 bp. The best k value reported is 103, which does not seem possible because it is longer than the reads in the shorter library.

Any ideas?

kmergenie libraries • 2.9k views
10.1 years ago
Rayan Chikhi ★ 1.5k

Hi,

Yes, perhaps you have so much coverage in your 290 bp library that using it alone, with a high k, is enough to get a very good assembly, and that would be better than setting a low k just for the sake of using the 90 bp library. Could you try it?

Rayan


Hi Rayan,

The funny thing is that when I run kmergenie only with the 'larger' pe library I get a best k of 81...


Ohh that is odd. Could you please send me both HTML reports?


Hi Rayan, when I run kmergenie on the cluster, the HTML report is not generated. I tried running it on my desktop, but it takes forever. Is there any alternative?


It might be sufficient to copy-paste the .dat file here and, if possible, send the .histo/.pdf files to kmergenie@cse.psu.edu. Could you do that please?

To get reports, you can ask your cluster administrator to install ghostscript. Kmergenie uses it to generate reports on machines where X is not running, i.e. clusters.


I sent you everything by email. Thanks


I've replied to Illinu by email, but let me copy my response here in case anyone's interested. Also note that the organism is diploid.

Thanks much for the data, it's very interesting.

It seems that, for the long reads alone, a k value of 180 would work as well as it does for the short+long reads. To see this, notice that the histogram (.pdf) of the long reads at k=180 looks very similar to the short+long reads histogram at k=180. However, in the former case Kmergenie failed to fit the diploid model, and hence could not predict the number of genomic k-mers.

Anyhow, I think that the k=81 prediction for the long reads alone is probably not the best here.

It seems that the diploid fit in Kmergenie could be improved to handle this dataset, but I don't really know how right now.

In any case, a best k value larger than the read length of the shorter library is still very likely here.
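
To make the arithmetic behind this concrete, here is a minimal Python sketch (not part of Kmergenie). It relies only on the standard fact that a read of length L contains L - k + 1 k-mers, so a read shorter than k contributes none. The read lengths are midpoints of the ranges given in the question; the 60x base coverage is an invented placeholder, since the thread does not state the real coverage, and the function names are hypothetical.

```python
def kmers_per_read(read_len: int, k: int) -> int:
    """Number of k-mers in one read: L - k + 1, or 0 if the read is shorter than k."""
    return max(0, read_len - k + 1)


def kmer_coverage(base_coverage: float, read_len: int, k: int) -> float:
    """Approximate k-mer coverage from base coverage: C * (L - k + 1) / L."""
    return base_coverage * kmers_per_read(read_len, k) / read_len


# ASSUMPTION: 95 and 295 are midpoints of the read-length ranges in the question;
# the 60x coverage values are made up purely for illustration.
libraries = {"short": (95, 60.0), "long": (295, 60.0)}

for k in (81, 103, 180):
    for name, (read_len, cov) in libraries.items():
        n = kmers_per_read(read_len, k)
        c = kmer_coverage(cov, read_len, k)
        print(f"k={k:3d} {name:5s} k-mers/read={n:3d} k-mer coverage={c:.1f}")
```

Under these assumptions, the short library contributes nothing to the k-mer histogram at k=103 or k=180, so a best k above its read length simply means the estimate at that k is driven by the long library alone.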
