Question

Removing redundant sequences

10.0 years ago

kutubjoy • 0

I have a FASTA file containing many protein sequences with identifiers. How can I remove the redundant sequences using hmmscan?

sequence alignment blast • 5.8k views

Question written 10.0 years ago by kutubjoy (updated 2.4 years ago by Ram)

Why use HMMscan to remove redundant sequences? There are a gazillion deduplication tools out there - just search the forum for FASTA deduplication.

Reply by Ram, 2.4 years ago
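For the simplest case, where "redundant" means letter-for-letter identical sequences, deduplication needs no alignment at all. A minimal sketch in plain Python (standard library only; `read_fasta` and `deduplicate` are hypothetical helper names, not taken from any tool mentioned in this thread) keeps the first record for each distinct sequence, compared case-insensitively, and remembers which identifiers were collapsed into it:

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; multi-line sequences are joined."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def deduplicate(
    records: Iterable[Tuple[str, str]],
) -> Tuple[List[Tuple[str, str]], Dict[str, List[str]]]:
    """Keep the first record for each distinct sequence (case-insensitive).

    Returns the kept records in input order, plus a map from each distinct
    upper-cased sequence to every header that carried it, kept header first.
    """
    groups: Dict[str, List[str]] = {}
    unique: List[Tuple[str, str]] = []
    for header, seq in records:
        key = seq.upper()
        if key not in groups:
            groups[key] = []
            unique.append((header, seq))
        groups[key].append(header)
    return unique, groups


# Hypothetical example records; sp3_TF2 is split over two lines on purpose.
example = [">sp1_TF1", "MKTAYIAKQR", ">sp2_TF1", "MKTAYIAKQR", ">sp3_TF2", "MSEQNN", "TEMT"]
unique, groups = deduplicate(read_fasta(example))
for header, seq in unique:
    print(f">{header}\n{seq}")
# prints sp1_TF1 and sp3_TF2 only; sp2_TF1 is recorded in groups["MKTAYIAKQR"]
```

With a real file, passing an open file handle works the same way, e.g. `deduplicate(read_fasta(open("proteins.fasta")))`, where the file name is a placeholder. Note that this only removes exact duplicates; homologous but non-identical sequences need a clustering tool such as CD-HIT.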

Hello guys

I have more than 70,000 protein sequences from 65 animal species, most of them transcription factors (TFs), so some of them are likely to be homologous. I would like to use CD-HIT to remove the redundant ones, but which similarity threshold should I use?

Any suggestion?

Reply by Kurban, 8.5 years ago (updated 2.5 years ago by Ram)
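To see what the identity threshold controls, here is a toy sketch of greedy incremental clustering in the spirit of CD-HIT: sequences are processed longest first, and each one either joins the first representative it matches at or above the threshold or becomes a new representative. This is an assumption-laden illustration, not CD-HIT's algorithm: it has no short-word filter, uses a plain Needleman-Wunsch alignment with arbitrary scores (match +1, mismatch -1, gap -2), and measures identity as identical aligned residues divided by the length of the shorter sequence. The function names are hypothetical, and pure Python is far too slow for 70,000 sequences.

```python
from typing import List, Tuple


def global_identity(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> float:
    """Identical aligned residues / length of the shorter sequence.

    Uses a plain Needleman-Wunsch global alignment with linear gap costs.
    """
    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Trace back one optimal path, counting identical aligned pairs.
    i, j, identical = n, m, 0
    while i > 0 and j > 0:
        pair = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + pair:
            if a[i - 1] == b[j - 1]:
                identical += 1
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    shorter = min(n, m)
    return identical / shorter if shorter else 0.0


def greedy_cluster(records: List[Tuple[str, str]], threshold: float) -> List[List[str]]:
    """Longest sequences become representatives first; returns header clusters."""
    ordered = sorted(records, key=lambda r: len(r[1]), reverse=True)  # stable sort
    reps: List[str] = []            # representative sequences
    clusters: List[List[str]] = []  # clusters[k][0] is the header of reps[k]
    for header, seq in ordered:
        for k, rep_seq in enumerate(reps):
            if global_identity(seq, rep_seq) >= threshold:
                clusters[k].append(header)  # join the first matching cluster
                break
        else:
            reps.append(seq)
            clusters.append([header])
    return clusters


records = [
    ("tf_a", "MKTAYIAKQRQISFVKSHFSRQ"),
    ("tf_b", "MKTAYIAKQRQISFVKSHFSRA"),  # one substitution relative to tf_a
    ("tf_c", "MSEQNNTEMTFQIQRIYTKDIS"),
]
for threshold in (0.9, 1.0):
    print(threshold, greedy_cluster(records, threshold))
# 0.9 [['tf_a', 'tf_b'], ['tf_c']]
# 1.0 [['tf_a'], ['tf_b'], ['tf_c']]
```

A lower threshold merges more sequences into fewer clusters. One practical way to choose a cutoff is to run the real tool at a few values and compare how many representatives survive.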

Here is my free program on GitHub, Sequence database curator (https://github.com/Eslam-Samir-Ragab/Sequence-database-curator)

It is a very fast program and it can deal with:

Nucleotide sequences
Protein sequences

It runs on the following operating systems:

Windows
Mac
Linux

It also works for:

Fasta format
Fastq format

Best Regards

Reply by Eslam Samir, 8.3 years ago

I see that you've created a Tool type post for your tool. Please do not spam threads with ads for your tool.

Reply by Ram, 8.3 years ago

score 0 · Answer 1 · 2015-07-08

9.9 years ago

nterhoeven ▴ 120

I wrote a Perl script to do that. It is called remove_duplicates, and you can find it here:

https://github.com/nterhoeven/sequence_processing

Answer by nterhoeven, 9.9 years ago

Ram · Answer 2 · 2015-07-09

9.9 years ago

h.mon 35k

As Ram suggested, there are several tools for this; one of them is CD-HIT.

Answer by h.mon, 9.9 years ago (updated 2.5 years ago by Ram)

This is an old question :)

Reply by Ram, 2.5 years ago

Not by my standards; I have even replied to 3-4-year-old questions ;-)

Reply by h.mon, 9.9 years ago (updated 2.5 years ago by Ram)