6.7 years ago
juan.crescente
Hey,
I have a lot of short sequences (100-800 nt) in an input file that I want to cluster. Every few steps, the script that generates the input file adds about 1000 sequences, and by the end the file is 105M and hard to cluster. I want to know if there's a way to do incremental clustering each time sequences are added. I've read the CD-HIT wiki but found no information about incremental clustering for cd-hit-est. Any suggestions?
The sequences are small transposable elements called MITEs, and I need to group them into families.
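To make clear what I mean by incremental clustering, here is a minimal toy sketch in Python. It is not what cd-hit-est does internally and is far too slow for a 105M file. The k-mer Jaccard similarity, the threshold, and the names `IncrementalClusters` and `add_batch` are all assumptions made up for illustration. The point is only that each new batch is compared against the existing cluster representatives, without re-clustering everything from scratch.

```python
from typing import Dict, List, Set


def kmers(seq: str, k: int = 8) -> Set[str]:
    """Return the set of k-mers in a sequence (empty if the sequence is shorter than k)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two k-mer sets; a crude stand-in for sequence identity."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


class IncrementalClusters:
    """Hypothetical helper: keeps cluster representatives between batches."""

    def __init__(self, threshold: float = 0.8, k: int = 8) -> None:
        self.threshold = threshold  # assumed similarity cutoff, not a cd-hit parameter
        self.k = k
        self.rep_kmers: List[Set[str]] = []  # k-mer set of each cluster representative
        self.members: List[List[str]] = []   # sequence names per cluster

    def add_batch(self, batch: Dict[str, str]) -> None:
        """Assign each new sequence to the first similar cluster, or start a new one."""
        # Process longest sequences first within the batch (greedy order).
        for name, seq in sorted(batch.items(), key=lambda kv: -len(kv[1])):
            ks = kmers(seq, self.k)
            for idx, rep in enumerate(self.rep_kmers):
                if jaccard(ks, rep) >= self.threshold:
                    self.members[idx].append(name)
                    break
            else:
                # No existing representative is similar enough: new cluster.
                self.rep_kmers.append(ks)
                self.members.append([name])
```

In this sketch I would call `add_batch` once for every ~1000 sequences the script produces, so only the new sequences are compared against the representatives found so far. What I'm looking for is a way to do the equivalent with cd-hit-est.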
I see that this tool is more for reads; I'll add further description to my post.