Hi all, I have a transcriptome and have extracted the CDS for all sequences, both complete and partial. The amino acid usage results show a bias towards particular amino acids: a few amino acids are much more frequent than expected, which I think indicates that certain sequences or families of sequences are over-represented. Are there any tools to cluster sequences based on similarity (not just exact duplicates) in order to reduce redundancy? I have registered for a tool called USEARCH and am waiting for a reply, but I still have no idea whether it will be useful!
I would also like to know whether the term "sequence clustering" is appropriate here, because the term has several different meanings in bioinformatics analysis.
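To make the amino acid usage observation concrete, here is a minimal sketch of how such a tally can be computed from translated CDS. It is only an illustration, not the exact pipeline used for my data: the function name `amino_acid_usage` and the toy sequences are hypothetical, and it assumes the proteins are already available as plain strings.

```python
from collections import Counter

# The 20 standard amino acids (one-letter codes); anything else
# (stop codons '*', ambiguous 'X', gaps) is ignored in this sketch.
STANDARD_AA = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_usage(proteins):
    """Return the fraction of each standard amino acid over all sequences."""
    counts = Counter()
    for seq in proteins:
        counts.update(aa for aa in seq.upper() if aa in STANDARD_AA)
    total = sum(counts.values())
    if total == 0:
        # No countable residues: report zero usage rather than divide by zero.
        return {aa: 0.0 for aa in STANDARD_AA}
    return {aa: counts[aa] / total for aa in STANDARD_AA}


# Hypothetical toy input: two near-identical sequences inflate K and L,
# which is the kind of redundancy effect described above.
toy_proteins = ["MKKLLA", "MKKLLG", "MSTW*"]
usage = amino_acid_usage(toy_proteins)

# Print the three most frequent amino acids.
for aa, frac in sorted(usage.items(), key=lambda kv: -kv[1])[:3]:
    print(f"{aa}\t{frac:.3f}")
```

Running the same tally before and after removing redundant sequences would show whether the bias comes from over-represented families or from the transcriptome itself.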
Thank you, Raghul