DSK error on k-mer lengths up to, say, 160
9.7 years ago
s.vandenhurk ▴ 10

I am looking for a k-mer analysis tool that can handle k-mers longer than 31. I thought I had hit the jackpot when I read about DSK, but so far I have had no luck. Is there any other easy-to-use tool that can handle k-mers of size 91 or more?

I tried the Linux package and got it working up to 31-mers. Then I installed from source and also got it working up to 31-mers. Then I read the entire manual (I should have done so in the first place) and ran:

rm -Rf CMake* && cmake -Dk4=160 .. && make

and got some errors; 31-mers still worked fine. Then I changed my source installation from:

cmake

to:

cmake -Dk4=150 ..

and got the same errors. So now I am wondering if you could help me fix this.

I have gcc version 4.8.2

The errors I get:

[100%] Building CXX object utils/CMakeFiles/dsk2ascii.dir/dsk2ascii.cpp.o
In file included from /home/SHU/Desktop/DSK/dsk-2.0.2-Source/thirdparty/gatb-core/src/gatb/gatb_core.hpp:38:0,
                 from /home/SHU/Desktop/DSK/dsk-2.0.2-Source/utils/dsk2ascii.cpp:3:
/home/SHU/Desktop/DSK/dsk-2.0.2-Source/thirdparty/gatb-core/src/gatb/tools/collections/impl/BagFile.hpp: In instantiation of 'gatb::core::tools::collections::impl::BagCountCompressedFile<Item>::~BagCountCompressedFile() [with Item = gatb::core::kmer::impl::Kmer<>::Count]':
/home/SHU/Desktop/DSK/dsk-2.0.2-Source/utils/dsk2ascii.cpp:72:1:   required from here
/home/SHU/Desktop/DSK/dsk-2.0.2-Source/thirdparty/gatb-core/src/gatb/tools/collections/impl/BagFile.hpp:179:189: warning: format '%llu' expects argument of type 'long long unsigned int', but argument 2 has type 'u_int64_t {aka long unsigned int}' [-Wformat=]
         printf("In %llu B  (%llu MB ) Out %llu  B  (%llu MB ) ratio  %f \n",_sizeInput,_sizeInput/(1024LL*1024LL), _sizeOutput,_sizeOutput/(1024LL*1024LL), _sizeInput / (float) _sizeOutput);
                                                                                                                                                                                             ^
/home/SHU/Desktop/DSK/dsk-2.0.2-Source/thirdparty/gatb-core/src/gatb/tools/collections/impl/BagFile.hpp:179:189: warning: format '%llu' expects argument of type 'long long unsigned int', but argument 4 has type 'u_int64_t {aka long unsigned int}' [-Wformat=]
cc1plus: warning: unrecognized command line option "-Wno-ambiguous-member-template" [enabled by default]

from /home/SHU/Desktop/DSK/dsk-2.0.2-Source/thirdparty/gatb-core/src/gatb/bank/impl/BankBinary.cpp:20:
/home/SHU/Desktop/DSK/dsk-2.0.2-Source/thirdparty/gatb-core/src/gatb/system/api/Exception.hpp: In constructor 'gatb::core::system::ExceptionErrno::ExceptionErrno(const char*, ...)':
/home/SHU/Desktop/DSK/dsk-2.0.2-Source/thirdparty/gatb-core/src/gatb/system/api/Exception.hpp:140:47: warning: ignoring return value of 'char* strerror_r(int, char*, size_t)', declared with attribute warn_unused_result [-Wunused-result]
             strerror_r (errno, buffer, BUFSIZ);
                                               ^
/home/SHU/Desktop/DSK/dsk-2.0.2-Source/thirdparty/gatb-core/src/gatb/bank/impl/BankBinary.cpp: In function 'bool gatb::core::bank::impl::checkMagic(FILE*)':
/home/SHU/Desktop/DSK/dsk-2.0.2-Source/thirdparty/gatb-core/src/gatb/bank/impl/BankBinary.cpp:54:43: warning: ignoring return value of 'size_t fread(void*, size_t, size_t, FILE*)', declared with attribute warn_unused_result [-Wunused-result]
     fread (&value, sizeof(value), 1, file);
                                           ^

The error I get when running dsk2ascii:

EXCEPTION: Type 'LargeInt<1>' has too low precision (64 bits) for the required 51 kmer size
DSK K-mer • 3.3k views

Hello,

Just to be sure: Did you actually manage to build "dsk" with your k4 setting? From the traces you gave, it seems that the compilation worked (got only warnings).

However, I think you have pointed out a potential issue in the "dsk2ascii" binary for kmer size >=128. I am going to provide a fix and will let you know when it is available.

Note that "dsk" itself should work, only "dsk2ascii" has the issue.


dsk itself seems to work just fine, that is true, but dsk2ascii does not. It works perfectly with 31-mers, but not with k-mers of size 32 or more.
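
For context, the exception quoted in the question is consistent with k-mers being packed at 2 bits per nucleotide into 64-bit words. The sketch below is only an illustration under that assumption; `words_needed` is a hypothetical helper, not part of DSK or GATB, and nothing in this thread confirms how `LargeInt` is laid out internally.

```python
import math

BITS_PER_NUCLEOTIDE = 2  # assumption: A, C, G, T are each packed into 2 bits
WORD_BITS = 64           # the "64 bits" quoted in the exception message


def words_needed(kmer_size: int) -> int:
    """Number of 64-bit words needed for one k-mer under the 2-bit assumption."""
    return math.ceil(kmer_size * BITS_PER_NUCLEOTIDE / WORD_BITS)


# Print k, the bits required, and the number of 64-bit words required.
for k in (31, 51, 101, 160):
    print(k, k * BITS_PER_NUCLEOTIDE, words_needed(k))
```

Under this assumption a 51-mer needs 102 bits, more than a single 64-bit word, which matches the wording of the exception; a 160-mer would need five words.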

9.7 years ago
edrezen ▴ 730

Hello,

I have put a new version of DSK, with a fix for dsk2ascii, here: http://gatb-tools.gforge.inria.fr/versions/src/dsk-2.0.3-Source.tar.gz

Can you tell me if it works?


It works, thank you so much


I will let you know if it works when I get back to work on Monday. I don't have access to the network from home. Thanks in advance.


It works up to a kmer size of 101; after that, the partitioning step gets stuck at 0% with a -1% memory usage. Any ideas?


In order to investigate, could you tell us:

  • what is the size of your bank?
  • do all the sequences in the bank have the same size? If not, do you have an idea of the min/max sizes of the sequences?

Is it possible for you to provide the bank? It would make it easier for us to find out what is happening.
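
One way to answer the two questions above without sharing the file is to scan it once and report the number of sequences together with their minimum and maximum lengths. A minimal sketch in Python, assuming a plain (uncompressed) FASTA file; the function names and the file path in the usage note are hypothetical.

```python
from typing import Iterable, Iterator, Tuple


def sequence_lengths(lines: Iterable[str]) -> Iterator[int]:
    """Yield the length of each record found in FASTA-formatted lines."""
    length = None  # stays None until the first '>' header is seen
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                yield length  # previous record is complete
            length = 0
        elif line and length is not None:
            length += len(line)  # sequences may span several lines
    if length is not None:
        yield length  # last record in the file


def length_range(lines: Iterable[str]) -> Tuple[int, int, int]:
    """Return (record count, min length, max length) in one streaming pass."""
    count, shortest, longest = 0, None, 0
    for n in sequence_lengths(lines):
        count += 1
        shortest = n if shortest is None else min(shortest, n)
        longest = max(longest, n)
    return count, (shortest or 0), longest


# Small in-memory example: two records of length 6 and 8.
example = [">r1", "ACGT", "AC", ">r2", "ACGTACGT"]
print(length_range(example))  # (2, 6, 8)
```

On a real file this could be called as `with open("reads.fasta") as handle: print(length_range(handle))`, where `reads.fasta` is a placeholder path. The file is read line by line, so an 11 GB input should not need to fit in memory.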


I am not 100% sure, but I believe all sequences in my file are 101 bp long. The FASTA file I want to count is 11 GB. Unfortunately, I can't share the actual file because it is company property.

/Desktop/DSK/dsk-2.0.3-Source/build$ ./dsk -file 'input/location.fasta' -kmer-size 160
[counting kmers]  0    %   elapsed:   0 min 0  sec    estimated remaining:   0 min 0  sec   cpu:   -
[DSK: Collecting stats on read sample   ]  0    %   elapsed:   0 min 0  sec    estimated remaining: 
[DSK: Collecting stats on read sample   ]  2    %   elapsed:   0 min 0  sec    estimated remaining: 
etc
[DSK: Collecting stats on read sample   ]  97   %   elapsed:   0 min 19 sec    estimated remaining: 
[DSK: Collecting stats on read sample   ]  98   %   elapsed:   0 min 19 sec    estimated remaining: 
[DSK: Collecting stats on read sample   ]  99   %   elapsed:   0 min 19 sec    estimated remaining: 
[DSK: Collecting stats on read sample   ]  100  %   elapsed:   0 min 20 sec    estimated remaining: 
[DSK: Collecting stats on read sample   ]  100  %   elapsed:   0 min 20 sec    estimated remaining:   0 min 0  sec   cpu:   21.9 %   mem: [ 98,  98,  98] MB 
[DSK: Pass 1/10328094, Step 1: partitioning    ]  0    %   elapsed:   0 min 0  sec    estimated remaining:   0 min 0  sec   cpu:   -1.0 %   mem: [ 98,  98,  99] MB

It remains stuck at this step.


Ok, thanks for the information.

In fact, dsk cuts each sequence into kmers, so a sequence of length N will have N-K+1 kmers, where K is the kmer size. In your case, you are trying to use a kmer size of 160, which is longer than sequences of 101 bp, so such sequences cannot yield any kmer. In other words, you should not use a kmer size bigger than the length of your sequences.

Nevertheless, we have found a flaw in dsk that shows up when there are many sequences of the same size and a few sequences of a much bigger size. The consequence is that the number of passes (see 10328094 in the output you gave) is wrongly computed, which may lead to strange behavior.

For now, I would suggest using a kmer size that is not too big (99, for instance, in your case) to be sure that it is less than the length of your sequences.
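
The N-K+1 relation above can be checked with a few lines of Python. This is only an illustrative sketch (the function names are hypothetical, and it is not how dsk is implemented); it shows why 101 bp sequences give a few kmers at K=99 and none at K=160.

```python
from typing import List


def kmer_count(seq_len: int, k: int) -> int:
    """Number of kmers in a sequence of length seq_len: N-K+1, or 0 if K > N."""
    return max(0, seq_len - k + 1)


def kmers(seq: str, k: int) -> List[str]:
    """Enumerate every kmer of a sequence with a sliding window of width k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


print(kmers("ACGTA", 3))     # ['ACG', 'CGT', 'GTA']
print(kmer_count(101, 99))   # 3
print(kmer_count(101, 101))  # 1
print(kmer_count(101, 160))  # 0
```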


Yeah, I'm not sure what I was thinking, trying kmers bigger than 101. I should map my reads, create contigs, and do a kmer analysis on those contigs... Thanks for waking me up.
