I am trying to build a suffix tree for millions of reads with SeqAn. Because the number of sequences is so large, I need a lazy implementation in which only the nodes within a certain range of levels are retained. Is there any way to do this? Below is what I have right now; it runs out of memory at the construction step, so it seems the tree is not lazily evaluated at all.
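#include <iostream>
#include <seqan/index.h>
#include <seqan/store.h>

using namespace seqan;

int main (int argc, char const * argv[])
{
    String<DnaQ> seq;
    FragmentStore<> fragStore;
    // Load all reads from the file given as the first argument.
    if (argc < 2 || !loadReads(fragStore, argv[1]))
        return 1;
    // Print the first five reads together with their quality values.
    for (int j = 0; j < 5; ++j) {
        seq = getRead(fragStore, j);
        std::cout << seq << std::endl;
        for (unsigned i = 0; i < length(seq); ++i) {
            std::cout << getQualityValue(seq[i]) << " ";
        }
        std::cout << std::endl;
    }
    typedef FragmentStore<>::TReadSeqStore TReadSeqStore;
    typedef GetValue<TReadSeqStore>::Type TReadSeq;
    // IndexWotd<> selects the lazy (write-only, top-down) suffix tree;
    // the typedef must not name itself as the template argument.
    typedef Index< TReadSeqStore, IndexWotd<> > TMyIndex;
    TMyIndex myIndex(fragStore.readSeqStore);
    return 0;
}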
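To make the "only retain certain levels" requirement concrete, here is a minimal sketch that does not use SeqAn: a plain suffix trie that never allocates a node deeper than a chosen depth. The names TrieNode, insertSuffixes and countNodes are hypothetical and exist only for this illustration, and it caps the maximum level only; it does not drop the shallow levels.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>

// Hypothetical illustration, not SeqAn: a suffix trie truncated at maxDepth.
struct TrieNode
{
    std::size_t count = 0;                              // suffixes passing through this node
    std::map<char, std::unique_ptr<TrieNode>> children; // one edge per character
};

// Inserts every suffix of `read`, but follows each suffix for at most
// maxDepth characters, so no node deeper than maxDepth is ever allocated.
void insertSuffixes(TrieNode & root, std::string const & read, std::size_t maxDepth)
{
    for (std::size_t start = 0; start < read.size(); ++start)
    {
        TrieNode * node = &root;
        for (std::size_t d = 0; d < maxDepth && start + d < read.size(); ++d)
        {
            std::unique_ptr<TrieNode> & child = node->children[read[start + d]];
            if (!child)
                child.reset(new TrieNode());
            node = child.get();
            ++node->count;
        }
    }
}

// Counts all nodes below `node` (the root itself is not counted).
std::size_t countNodes(TrieNode const & node)
{
    std::size_t total = 0;
    for (auto const & edge : node.children)
        total += 1 + countNodes(*edge.second);
    return total;
}
```

With a four-letter DNA alphabet the node count is bounded by 4 + 4^2 + ... + 4^maxDepth no matter how many reads are inserted, which is the kind of bound I am hoping to get from the library's index.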
Can you provide a link to the library's source? The SeqAn website has a few different downloads, and I'm not sure exactly which one you're using.
If they expose a stream interface, perhaps you can inject your own layer of buffering. It sounds like you're looking for a way to offload the virtual memory pressure, and the flexibility to inject your own buffering strategy may give the best results for your scenario.