Dear Team,

I'm curious whether Dedupe (https://jgi.doe.gov/data-and-tools/bbtools/bb-tools-user-guide/dedupe-guide/) can be used for clustering transcriptome assemblies. I need to cluster my assembly at n% identity. When I ran Dedupe as follows:

dedupe.sh in=assembly.fa out=Clustered_Assembly.fa -Xmx100g minidentity=n threads=40

the results came back far faster than with cd-hit-est. Please comment.
This is certainly helping me :)