Question

DNA sequence clustering tools

1

8.9 years ago

grayapply2009 ▴ 300

Hi guys, I have about 300,000 sequences stored in a FASTA file, and I am trying to reduce their redundancy. I used CD-HIT-EST to remove redundancy at a 95% similarity threshold and am planning to reduce it further with other tools. I tried TGICL, but it seems to be a very old and buggy tool, and it didn't work well on my FASTA file. Are there other DNA clustering tools that serve this purpose? Any recommendations?

clustering cdhit redundancy tgicl • 4.0k views

updated 8.0 years ago by Biostar 20 • written 8.9 years ago by grayapply2009 ▴ 300

1

What was wrong with using CD-HIT? Why not try the other tools in the suite, or alter your threshold?

8.0 years ago by Joe 22k

score 3 · Answer 1 · 2016-05-13

3

8.9 years ago

Bioinformatics_NewComer ▴ 330

If I understood you correctly, you might want to try VSEARCH for clustering and dereplication. It is an open-source alternative to the commercial USEARCH.
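
To make "dereplication" concrete, here is a minimal Python sketch of the idea: collapse sequences that are exactly identical and record how many copies each had. This is only an illustration and not how VSEARCH is implemented; the function names `read_fasta` and `dereplicate` are made up for this example, the comparison is case-insensitive by assumption, and reverse complements are not considered.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def dereplicate(records):
    """Collapse exact duplicates (hypothetical helper, illustration only).

    Returns a list of (first_header, sequence, copy_count) tuples in
    order of first appearance. Sequences are compared in upper case.
    """
    seen = {}  # sequence -> [first header, count]; dicts keep insertion order
    for header, seq in records:
        key = seq.upper()
        if key in seen:
            seen[key][1] += 1
        else:
            seen[key] = [header, 1]
    return [(hdr, seq, count) for seq, (hdr, count) in seen.items()]


# Toy input: records "a" and "b" differ only in letter case.
example = [">a", "ACGT", ">b", "acgt", ">c", "ACGA"]
unique = dereplicate(read_fasta(example))
```

With a real file you would pass an open file handle to `read_fasta` instead of the toy list. In VSEARCH itself, exact dereplication is done with the `--derep_fulllength` command and identity-based clustering with `--cluster_fast` together with an `--id` threshold; check the manual of your installed version for the exact output options.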

8.9 years ago by Bioinformatics_NewComer ▴ 330