Question

Ignore Error "Multiple Sequences Found With Same Name" In Clustalw

11.9 years ago

david ▴ 10

Hi,

I have a Python program that generates a clustalw2 alignment of about 500 sequences from a FASTA file. Each sequence name consists of the organism plus the substrate specificity of that sequence, so quite a few of the names are identical. As a result I get the error message "Error: Multiple sequences found with same name" and no alignment is generated. Is it possible to ignore this error without having to change all the sequence names?

Cheers, David

biopython clustalw • 5.8k views

updated 7.2 years ago by Biostar 20 • written 11.9 years ago by david ▴ 10

score 7 · Answer 1 · 2012-12-21

11.9 years ago

Andrzej Zielezinski 11k

Sequence names must be unique for ClustalW/X to perform an alignment.

I would rename your 500 sequences with the numbers 0 to 499 and store the original names in a dictionary or a list.

For example:

d = {0: 'Organism1Substrate', 1: 'Organism1Substrate', ..., 499: 'Organism2Substrate'}

or:

l = ['Organism1Substrate', 'Organism1Substrate', 'Organism1Substrate', ...]

Once you have performed the alignment, just replace the numbers with the original names.
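A minimal sketch of this rename-and-restore idea in plain Python follows. The function names `rename_fasta` and `restore_names` are made up for illustration, and the sketch assumes the alignment is saved in FASTA format so that each placeholder appears on a `>` header line; other output formats would need a different restore step.

```python
def rename_fasta(lines):
    """Replace every FASTA header with its 0-based index.

    Returns (renamed_lines, names), where names[i] is the original
    header of the sequence that is now called i.
    """
    names = []
    renamed = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            renamed.append(">" + str(len(names)))  # placeholder: 0, 1, 2, ...
            names.append(line[1:])                 # remember the original name
        else:
            renamed.append(line)
    return renamed, names


def restore_names(aligned_lines, names):
    """Put the original names back into a FASTA-formatted alignment."""
    restored = []
    for line in aligned_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            index = int(line[1:].split()[0])  # placeholder written by rename_fasta
            restored.append(">" + names[index])
        else:
            restored.append(line)
    return restored


# Toy input with duplicate names, as in the question.
fasta = [
    ">Organism1Substrate", "MKV",
    ">Organism1Substrate", "MKL",
    ">Organism2Substrate", "MRV",
]
renamed, names = rename_fasta(fasta)

# Stand-in for the aligner's output (hand-written here, not produced by ClustalW).
aligned = [">0", "MK-V", ">1", "MK-L", ">2", "MR-V"]
restored = restore_names(aligned, names)
```

In a real run, `renamed` would be written to the file passed to clustalw2 and `aligned` would be read back from its output. A list is enough for the mapping because the placeholders are consecutive integers.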

+1 for this. In the past I have just GREPed the names and added numbers or more information to make them unique, but I like this idea better.

11.9 years ago by Josh Herr 5.8k

Agreed. Many phylogenetic programs have problems handling fancy sequence names. The worst case is the PHYLIP format (used by RAxML etc.), which allows only 10 characters per name. So I always rename the sequences as "s1", "s2", "s3"... I don't recommend using 1, 2, 3... because some programs cannot handle numerical sequence names.

11.9 years ago by qiyunzhu ▴ 430
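The "s1", "s2", ... naming scheme can be sketched as below. The helper name `short_ids` and its parameters are hypothetical; the 10-character default reflects the PHYLIP limit mentioned in the comment.

```python
def short_ids(count, prefix="s", max_len=10):
    """Return count unique IDs such as s1, s2, ... that fit within max_len."""
    ids = [prefix + str(i) for i in range(1, count + 1)]
    too_long = [name for name in ids if len(name) > max_len]
    if too_long:
        raise ValueError("IDs exceed %d characters, e.g. %s" % (max_len, too_long[0]))
    return ids


# One ID per sequence; the letter prefix keeps the names non-numeric.
ids = short_ids(500)
```

Pairing `ids` with the list of original names (for example `dict(zip(ids, original_names))`, where `original_names` is whatever list you collected from the FASTA headers) gives the lookup table needed to restore the names afterwards.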