Any tool for finding duplicated or redundant sequences in a database?
2.6 years ago

Hello, I'm building a prokaryotic protein database from several source sequence databases, so it likely contains sequences that appear more than once. Is there any tool for estimating sequence similarity within a single FASTA file (my database)?

Thanks for your time.

fasta database • 909 views
2.6 years ago
GenoMax 147k

CD-HIT (LINK) or MMseqs2 cluster (LINK) can both help generate a non-redundant set of sequences. In fact, NCBI is now using MMseqs2 to cluster nr for the web version of BLAST.
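Before running a clustering tool, it can be useful to check how many entries are byte-for-byte repeats. The following is a minimal Python sketch that only collapses exact duplicates; it does not do the similarity-based clustering that the tools above perform. The function names, the example records, and the choices to ignore case and a trailing `*` are assumptions made for illustration.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def group_exact_duplicates(
    records: Iterable[Tuple[str, str]]
) -> Dict[str, List[str]]:
    """Map each distinct sequence to the headers that carry it."""
    groups: Dict[str, List[str]] = {}
    for header, seq in records:
        # Assumption: case and a trailing stop symbol '*' are not meaningful.
        key = seq.upper().rstrip("*")
        groups.setdefault(key, []).append(header)
    return groups


def write_nonredundant(groups: Dict[str, List[str]], width: int = 60) -> str:
    """Return FASTA text keeping the first header seen for each sequence."""
    out: List[str] = []
    for seq, headers in groups.items():
        out.append(">" + headers[0])
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"


# Hypothetical toy records standing in for a merged database.
example = """>sp|A
MKTAYIAKQR
>ref|B
mktayiakqr
>sp|C
MSLNEQ
"""

groups = group_exact_duplicates(read_fasta(example.splitlines()))
# Representative header -> headers of the identical copies that were dropped.
duplicates = {h[0]: h[1:] for h in groups.values() if len(h) > 1}
print(duplicates)
print(write_nonredundant(groups), end="")
```

For a real file, `read_fasta(open("my_database.fasta"))` (a hypothetical path) could replace the toy input. Near-identical sequences, fragments, and subsequences still need a clustering tool.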

2.4 years ago
Hugo ▴ 380

You can use SEDA (https://www.sing-group.org/seda/). Its "Remove Redundant Sequences" operation (https://www.sing-group.org/seda/manual/operations.html#remove-redundant-sequences) does this.
