grayapply2009, 8.5 years ago
Hi guys, I have about 300,000 sequences stored in a FASTA file and I am trying to reduce their redundancy. I used CD-HIT-EST to remove redundancy at a 95% similarity threshold and plan to reduce it further with other tools. I tried TGICL, but it seems to be an old and buggy tool, and it did not work well on my FASTA file. Are there other DNA sequence clustering tools that serve this purpose? Any recommendations?
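For context, CD-HIT-EST uses greedy incremental clustering. Sequences are sorted by decreasing length, the longest becomes the first cluster representative, and each following sequence joins the first representative it matches at or above the identity threshold; otherwise it starts a new cluster. The Python sketch below illustrates only that idea, under simplifying assumptions. Identity is approximated as the longest common subsequence divided by the length of the shorter sequence, and there is none of CD-HIT's short-word filtering. That makes it quadratic and suitable for toy inputs only, not 300,000 sequences. The function names `identity` and `greedy_cluster` are hypothetical.

```python
from typing import Dict, List


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (O(len(a)*len(b)))."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, 1):
            if ca == cb:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def identity(a: str, b: str) -> float:
    """Crude identity: LCS length over the shorter sequence's length.
    Assumption: a stand-in for CD-HIT's alignment-based identity."""
    shorter = min(len(a), len(b))
    return lcs_length(a, b) / shorter if shorter else 0.0


def greedy_cluster(seqs: Dict[str, str], threshold: float = 0.95) -> Dict[str, List[str]]:
    """Greedy incremental clustering: longest sequences first; each sequence
    joins the first representative with identity >= threshold, else it
    becomes a new representative. Returns {representative_id: [member_ids]}."""
    order = sorted(seqs.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    reps: List[str] = []
    clusters: Dict[str, List[str]] = {}
    for sid, seq in order:
        for rep in reps:
            if identity(seq, seqs[rep]) >= threshold:
                clusters[rep].append(sid)
                break
        else:
            # No representative matched: this sequence seeds a new cluster.
            reps.append(sid)
            clusters[sid] = [sid]
    return clusters


if __name__ == "__main__":
    toy = {
        "s1": "ACGTACGTACGTACGTACGT",
        "s2": "ACGTACGTACGTACGTACG",   # prefix of s1
        "s3": "TTTTGGGGCCCCAAAATTTT",
    }
    print(greedy_cluster(toy, threshold=0.95))
```

Lowering `threshold` merges more sequences into fewer clusters, which is the knob the reply below suggests turning.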
What was wrong with using CD-HIT? Why not try other tools in the suite, or alter your threshold?
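If you try other thresholds, note that `cd-hit-est` also expects the word size (`-n`) to match the identity cutoff (`-c`). The sketch below only builds the command lines and does not run them. The threshold-to-word-size mapping reflects the CD-HIT user guide's recommendations as I recall them, so check it against the guide for your version. Input and output file names are hypothetical, and thresholds below 0.80 are deliberately rejected here.

```python
import shlex
from typing import List


def word_size(threshold: float) -> int:
    """Recommended cd-hit-est word size for an identity cutoff.
    Assumption: mapping recalled from the CD-HIT user guide."""
    if threshold >= 0.95:
        return 10
    if threshold >= 0.90:
        return 8
    if threshold >= 0.88:
        return 7
    if threshold >= 0.85:
        return 6
    if threshold >= 0.80:
        return 5
    raise ValueError("threshold below 0.80 not covered by this sketch")


def build_cmd(inp: str, out: str, threshold: float) -> List[str]:
    """Argument list for one cd-hit-est run (not executed here)."""
    return ["cd-hit-est", "-i", inp, "-o", out,
            "-c", str(threshold), "-n", str(word_size(threshold))]


if __name__ == "__main__":
    for c in (0.95, 0.90, 0.85):
        # Hypothetical file names; pass the list to subprocess.run to execute.
        print(shlex.join(build_cmd("seqs.fasta", f"nr{round(c * 100)}.fasta", c)))
```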