DNA sequence clustering tools
8.5 years ago
grayapply2009 ▴ 300

Hi guys, I have about 300,000 sequences stored in a FASTA file, and I am trying to reduce the redundancy among them. I used CD-HIT-EST to remove redundancy at a 95% identity threshold, and I plan to reduce it further with other tools. I tried TGICL, but it seems to be a very old and buggy tool, and it didn't work well on my FASTA file. Are there other DNA clustering tools that serve this purpose? Any recommendations?
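CD-HIT-EST uses greedy incremental clustering for this kind of redundancy removal. Sequences are processed from longest to shortest. Each one joins the first existing cluster whose representative it matches at or above the threshold, and otherwise it becomes a new representative. The sketch below shows only that idea. It is a simplification: CD-HIT uses short-word filtering and real alignments, while the `identity` helper here (a hypothetical stand-in) uses longest-common-subsequence matches divided by the shorter length. It is quadratic and far too slow for 300,000 sequences.

```python
from typing import Dict, List


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence (O(len(a) * len(b)) dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0] * (len(b) + 1)
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


def identity(a: str, b: str) -> float:
    """Hypothetical simplified identity: LCS matches over the shorter sequence length."""
    shorter = min(len(a), len(b))
    if shorter == 0:
        return 0.0
    return lcs_length(a, b) / shorter


def greedy_cluster(seqs: Dict[str, str], threshold: float = 0.95) -> Dict[str, List[str]]:
    """Greedy incremental clustering: longer sequences become representatives first."""
    order = sorted(seqs, key=lambda sid: len(seqs[sid]), reverse=True)
    clusters: Dict[str, List[str]] = {}  # representative id -> member ids
    for sid in order:
        for rep in clusters:
            if identity(seqs[sid], seqs[rep]) >= threshold:
                clusters[rep].append(sid)
                break
        else:
            # No representative is similar enough, so start a new cluster.
            clusters[sid] = [sid]
    return clusters


if __name__ == "__main__":
    demo = {"a": "ACGTACGTAC", "b": "ACGTACGTA", "c": "GGGGGGGG"}
    print(greedy_cluster(demo, threshold=0.95))
```

Here "b" is fully contained in "a", so it joins a's cluster, and "c" starts its own cluster. Because the order is longest-first, the representatives that remain are the longest sequences, which is what you usually want to keep when removing redundancy.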

clustering cdhit redundancy tgicl

What was wrong with using CD-HIT? Why not try the other tools in the suite, or alter your threshold?
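If you do lower the cd-hit-est identity threshold (`-c`), you generally also have to lower the word size (`-n`). The sketch below encodes the word-size recommendations as we recall them from the CD-HIT user guide, so please check them against the documentation for your version. It only builds the command line and does not run it. The helper names `cdhit_est_word_size` and `cdhit_est_command` are hypothetical. Only the options `-i`, `-o`, `-c` and `-n` belong to cd-hit-est itself.

```python
from typing import List


def cdhit_est_word_size(threshold: float) -> int:
    """Suggested cd-hit-est -n for a given -c (recalled from the CD-HIT user guide; verify)."""
    if not 0.75 <= threshold <= 1.0:
        raise ValueError("this word-size table only covers thresholds from 0.75 to 1.0")
    if threshold >= 0.95:
        return 10  # guide suggests 10 or 11
    if threshold >= 0.90:
        return 8   # guide suggests 8 or 9
    if threshold >= 0.88:
        return 7
    if threshold >= 0.85:
        return 6
    if threshold >= 0.80:
        return 5
    return 4


def cdhit_est_command(infile: str, outfile: str, threshold: float) -> List[str]:
    """Build (but do not execute) a cd-hit-est argument list for the given threshold."""
    return [
        "cd-hit-est",
        "-i", infile,
        "-o", outfile,
        "-c", f"{threshold:.2f}",
        "-n", str(cdhit_est_word_size(threshold)),
    ]


if __name__ == "__main__":
    print(" ".join(cdhit_est_command("input.fasta", "clustered_90.fasta", 0.90)))
```

You can pass the resulting list to `subprocess.run` once you have confirmed the word sizes for your installation.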

8.5 years ago

If I understood you correctly, you might want to try VSEARCH for clustering and dereplication. It is an open-source alternative to the commercial USEARCH.
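Dereplication, which VSEARCH offers through `--derep_fulllength`, collapses sequences that are identical over their full length into a single entry with an abundance count. The sketch below shows only the concept and is not VSEARCH's implementation. The function name `dereplicate` is hypothetical, and it assumes that case differences should be ignored. It keeps the first identifier seen for each unique sequence and sorts the results by decreasing abundance.

```python
from collections import Counter
from typing import Dict, List, Tuple


def dereplicate(seqs: Dict[str, str]) -> List[Tuple[str, str, int]]:
    """Collapse identical sequences; return (first_id, sequence, count), most abundant first."""
    counts: Counter = Counter()
    first_id: Dict[str, str] = {}
    for sid, seq in seqs.items():
        key = seq.upper()  # assumption: treat upper- and lower-case bases as identical
        counts[key] += 1
        first_id.setdefault(key, sid)
    uniques = [(first_id[s], s, n) for s, n in counts.items()]
    # Stable sort: ties keep their first-seen order.
    return sorted(uniques, key=lambda t: t[2], reverse=True)


if __name__ == "__main__":
    reads = {"r1": "ACGT", "r2": "acgt", "r3": "GGCC"}
    for rid, seq, n in dereplicate(reads):
        print(rid, seq, n)
```

Exact dereplication is cheap (a single hash-table pass), so running it before a similarity-based clustering step reduces the number of sequences the slower clustering has to compare.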

