9.5 years ago
kutubjoy
I have a FASTA file containing many protein sequences with identifiers. How can I remove the redundant sequences using HMMscan?
I wrote a Perl script to do that. It is called remove_duplicates, and you can find it here:
Why use HMMscan to remove redundant sequences? There are a gazillion deduplication tools out there - just search the forum for FASTA deduplication.
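For the simplest case, exact duplicates, a few lines of code are enough. Below is a minimal sketch in Python, assuming "redundant" means identical sequences (ignoring case) and that the first record seen should be kept. The function names `parse_fasta` and `dedupe_exact` and the example records are made up for illustration; this is not the script or any of the tools mentioned in this thread.

```python
from typing import Iterable, Iterator, List, Tuple


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence lines may be wrapped, so collect and join them.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def dedupe_exact(records: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep the first record for each distinct sequence (case-insensitive)."""
    seen = set()
    kept = []
    for header, seq in records:
        key = seq.upper()
        if key not in seen:
            seen.add(key)
            kept.append((header, seq))
    return kept


# Hypothetical example input: seq2 is the same protein as seq1.
example = """>seq1
MKT
AIL
>seq2
mktail
>seq3
MKTAIV
"""

unique = dedupe_exact(parse_fasta(example))
for header, seq in unique:
    print(f">{header}\n{seq}")
```

This only catches identical sequences. Near-identical or homologous sequences need a clustering tool instead.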
Hello guys
I have more than 70,000 protein sequences from 65 animal species, and most of them are TFs, so some of them might be homologous. I would like to use CD-HIT to remove the redundant ones. But which similarity threshold should I use?
Any suggestions?
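Whichever threshold is chosen, it has to be paired with a suitable word size. The sketch below builds a `cd-hit` command line in Python, assuming the pairing given in the CD-HIT user guide for proteins (`-n 5` for identity 0.7 to 1.0, `-n 4` for 0.6 to 0.7, `-n 3` for 0.5 to 0.6, `-n 2` for 0.4 to 0.5). The helper names, the file names, and the 0.9 value are placeholders for illustration, not a recommended threshold.

```python
from typing import List


def cdhit_word_size(identity: float) -> int:
    """Word size (-n) paired with a protein identity threshold (-c)."""
    if not 0.4 <= identity <= 1.0:
        raise ValueError("protein identity threshold must be between 0.4 and 1.0")
    if identity >= 0.7:
        return 5
    if identity >= 0.6:
        return 4
    if identity >= 0.5:
        return 3
    return 2


def cdhit_command(fasta_in: str, fasta_out: str, identity: float) -> List[str]:
    """Build the argument list for a cd-hit run; this does not execute it."""
    return [
        "cd-hit",
        "-i", fasta_in,        # input FASTA
        "-o", fasta_out,       # output FASTA of cluster representatives
        "-c", str(identity),   # sequence identity threshold
        "-n", str(cdhit_word_size(identity)),
    ]


# Hypothetical file names and threshold, for illustration only.
cmd = cdhit_command("tfs.fasta", "tfs_nr90.fasta", 0.9)
print(" ".join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` on a machine where `cd-hit` is installed.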
Here is my free program on GitHub, Sequence database curator (https://github.com/Eslam-Samir-Ragab/Sequence-database-curator).
It is a very fast program and it can deal with:
It works under the following operating systems:
It also works for:
Best Regards
I see that you've created a Tool type post for your tool. Please do not spam threads with ads for your tool.