Question

Incremental DNA clustering using cd-hit-est

6.7 years ago

juan.crescente ▴ 110

Hey,

I have a lot of short sequences (100 to 800 nt) in an input file that I want to cluster. Every few steps, the script that generates the input file adds about 1000 sequences, and by the end the file is 105M and hard to cluster. I want to know if there's a way to do incremental clustering each time sequences are added. I've read the cdhit wiki but found no information about incremental clustering for cd-hit-est. Any suggestions?

The sequences are small transposable elements called MITEs, and I need to group them into families.
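
To make the request concrete, here is a minimal sketch of what "incremental clustering" could mean in this setting. It assumes a greedy, representative-based scheme: each new sequence joins the first existing cluster whose representative is similar enough, otherwise it starts a new cluster. This is only an illustration, not cd-hit-est's actual algorithm; the names `identity` and `add_batch`, the 0.8 threshold, and the use of `difflib` as a stand-in for a real alignment identity are all assumptions, and a pure-Python loop like this would be far too slow for a 105M file.

```python
from difflib import SequenceMatcher


def identity(a, b):
    """Rough similarity in [0, 1]; a stand-in for a real alignment identity."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def add_batch(clusters, batch, threshold=0.8):
    """Add a batch of (name, sequence) pairs to existing clusters in place.

    Each cluster is a dict with a representative sequence ("rep") and a list
    of member names ("members"). Existing clusters are never re-clustered;
    only the new sequences are compared against the representatives.
    """
    for name, seq in batch:
        for cluster in clusters:
            if identity(cluster["rep"], seq) >= threshold:
                cluster["members"].append(name)
                break
        else:
            # No representative was similar enough: start a new cluster.
            clusters.append({"rep": seq, "members": [name]})
    return clusters


# Hypothetical toy data: two batches arriving one after the other.
clusters = []
add_batch(clusters, [("s1", "ACGTACGTACGTACGTACGT"),
                     ("s2", "ACGTACGTACGTACGTACGA")])
add_batch(clusters, [("s3", "TTTTGGGGCCCCAAAATTTT"),
                     ("s4", "ACGTACGTACGAACGTACGT")])

for i, cluster in enumerate(clusters):
    print(i, cluster["members"])
```

The point of the sketch is that each new batch is only compared against the current representatives, so the cost of an update depends on the number of clusters rather than on the full set of sequences seen so far.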

cdhit clustering cluster dna • 1.5k views

score 0 · Answer 1 · 2018-02-28

6.7 years ago

h.mon 35k

Try clumpify.sh from the BBTools package.

I see that this tool is more for reads. I'll add further details to my post.

Reply • 6.7 years ago by juan.crescente ▴ 110