Need Help: output of Mafft aligner
5.4 years ago
shiv ▴ 10

Hi,

I am running MAFFT from the command line for multiple sequence alignment and writing the output in ClustalW format. My sequence identifiers are all longer than 14-15 characters (and I have to keep them as they are), but MAFFT only writes the first 14 characters of each ID. I want the complete identifiers in the output file. Is there an option to get the full IDs in MAFFT output, or something I missed in the tutorial?

Thanks in advance!

alignment • 3.5k views

Use a different output format. A number of formats have a hard limit on ID length. Clustal output behaves much like strict PHYLIP here: names are cut to a fixed width (10 characters in strict PHYLIP).

I'd advise switching to aligned FASTA or a similar format. Almost every tool accepts aligned FASTA.


Hi, thanks for the reply!

I need the output in Clustal format only. What should I do?


Can you not simply edit the output file, which should be plain text? You could use a regex to alter the sequence identifiers with, for example, sed. Anchor the pattern to the beginning of the line (^) so that it only matches the identifier column.
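A minimal sketch of that sed idea, assuming the truncated ID and its full form are known. The function name restore_id and the identifiers are made up for illustration, and the IDs are assumed to contain no characters that are special to sed (such as /, ., * or &).

```shell
restore_id() {
  # $1: truncated ID as it appears in the file; $2: full ID to put back.
  # The ^ anchor and the trailing space restrict the match to the ID column.
  sed "s/^$1 /$2 /"
}

# Hypothetical two-row excerpt of a Clustal block.
printf '%s\n' 'Old_long_Identi ACGT-ACGT' 'Other_sequence  ACGTTACGT' |
  restore_id 'Old_long_Identi' 'Old_long_Identifier_Alpha'
```

Note that the edited row is now longer than its neighbours, so the sequence columns no longer line up; a plain substitution like this does not re-pad the whitespace.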


Hi, thanks for your suggestion. I used Clustal Omega (ClustalO) and it solved my issue.

Thank you so much!

5.4 years ago
Joe 21k

What tool are you using that is that restrictive? The only thing you can really do in this case is create a set of new identifiers and a mapping between new and old, then use a text replacement approach.

e.g. $ cat mymapfile.csv

Old_long_Identifier_Alpha,IDAlpha
Old_long_Identifier_Beta,IDBeta
....

Which mappings you use obviously depends on how uniquely identifiable your headers are to start with.

You could achieve this with something like:

while IFS=',' read -r old new ; do sed -i.bak "s/${old}/${new}/g" clustalfile.clw ; done < mymapfile.csv

Be aware that you will also need to pad the whitespace so that the sequence columns stay aligned. If you can avoid the need for a Clustal file, though, I strongly advise taking a different approach. Manually editing strict formats can be very tedious and error-prone.
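One way to handle the renaming and the padding together is to let awk rewrite the ID column, as in the rough sketch below. The function name rename_clustal is hypothetical. It assumes a two-column CSV with the ID as it currently appears in the alignment first and the replacement second, a header line starting with CLUSTAL, and no sequence ID that itself starts with "CLUSTAL".

```shell
rename_clustal() {
  # $1: CSV map "current_id,new_id"; $2: Clustal alignment file.
  # The alignment is read twice: once to find the widest final ID,
  # once to print the rows with renamed, re-padded IDs.
  awk '
    function is_row() { return ($0 !~ /^[ \t]/ && $0 !~ /^CLUSTAL/ && NF >= 2) }
    function new_name(id) { return (id in map) ? map[id] : id }

    # Pass 1 (map file, comma-separated): remember current_id -> new_id.
    pass == 1 { if (NF >= 2) map[$1] = $2; next }

    # Pass 2 (alignment): width of the longest ID after renaming.
    pass == 2 { if (is_row()) { n = length(new_name($1)); if (n > width) width = n }; next }

    # Pass 3 (alignment): print with a uniform ID column.
    {
      fmt = "%-" (width + 1) "s%s\n"
      if (is_row()) {
        match($0, /^[^ \t]+[ \t]+/)      # ID plus its padding
        start = RLENGTH + 1              # column where the sequence begins
        printf fmt, new_name($1), substr($0, start)
      } else if ($0 ~ /^[ \t]/ && NF > 0 && start > 0) {
        # Conservation line: keep it under the shifted sequence columns.
        printf fmt, "", substr($0, start)
      } else {
        print                            # header and blank lines
      }
    }
  ' pass=1 FS=',' "$1" pass=2 FS=' ' "$2" pass=3 "$2"
}
```

Usage would look like `rename_clustal restore_map.csv clustalfile.clw > renamed.clw`, where restore_map.csv is an assumed file name. To put long names back after aligning with short ones, the columns need to be in the opposite order to the mymapfile.csv example, that is, short ID first.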

Otherwise, consider using something like Clustal Omega (ClustalO, the newer version of Clustal), which outputs Clustal files without the need for short IDs.
