File format conversion to clustal using biopython
6.3 years ago
mdsiddra ▴ 30

I am using Biopython to convert my aligned sequence file from one format to another with the "AlignIO.write" function. There are two things I would like to know:

  1. When I convert a sequence file from PHYLIP format to Clustal format, the resulting file is labelled CLUSTAL X (1.81), whereas I want output labelled CLUSTAL W or CLUSTAL 2.1 (a higher version of Clustal).
  2. The resulting Clustal file does not include the line of conservation symbols ('*', '.', ':') beneath the aligned amino acids.

This is the file I am getting:

CLUSTAL X (1.81) multiple sequence alignment


Canis           REWSSARPERSKGRRKPVDAAAVSAVQTSQTSSDVAVSSSCRSMEMQDLTSPHSRLSGSS
Mus             -------------------------------------------MEMQDLTSPHSRLSGSS
Rattus          -------------------------------------------MEMQDLTSPHSRLSGSS

This is the format I want instead, including the line of symbols that indicates how similar the residues are in each column:

Canis           REWSSARPERSKGRRKPVDAAAVSAVQTSQTSSDVAVSSSCRSMEMQDLTSPHSRLSGSS
Mus             -------------------------------------------MEMQDLTSPHSRLSGSS
Rattus          -------------------------------------------MEMQDLTSPHSRLSGSS
                                                           *****************

Can I do this with Biopython, or do I have to use some other method or function?

biopython • 2.6k views

Hello mdsiddra,

could you please also provide an example of the phylip input file?

fin swimmer


Yes, this is what the PHYLIP file looks like:

14 327
Zebrafish  LELQGEESDL DFRLSLNGKE DLLDTGQSLS SCGVVSGDLI SVILPASLEE
Fugu       LELQGEEAET EISLSLNGSE PLEDTGQTLA SCGIVSGDLI RVALIRALMA
Chicken    LELEGAESDT EFSITLNGKD ALTEDEKTLA SYGIVPGDLI CLLLEEDLPP
Zebra      SMTEGNRSDT AFSVTLNRKD ALTEDQKTLA SYGIVSGDLI CLLLEEDLPP

           TQSSAAAHGG SHHVQEDQVD QQQECVDLQQ DDQQQQQEQV CAAAPPLLCC
           ADPDRADDGG GHAVAMNQVS QEAKLPDASG ADSDQAPGPA ASCWEPMLCS
           PSSSPPSLLT PKRQNEQVDS RAGSSLEFPS GPEDVDLEEG SYPSEPMLCS
           PPATPAPLLT PNGQNEQVDE RAGSSLEFPS GPEDADLEEG SYPSEPMLCS

Hi,

I think seqret from EMBOSS can do the job. You can set the output format as clustal.


I would rather not do it that way. Since I am already working in Python with Biopython, I would like to do this in code.


What version of Biopython are you on?


Python 3.6 and Biopython 1.72

6.3 years ago
Joe 21k

According to the documentation, Biopython does not yet support writing Clustal 2 formats.

You can try scripting it yourself, or simply realign with clustal and output the format directly.
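
If only the version label on the first line matters, one possible workaround is to let Biopython write its usual Clustal output and then swap the banner. The sketch below is not a Biopython feature: `set_clustal_header` and `phylip_to_clustal` are hypothetical helper names, and it assumes the input is interleaved PHYLIP that Biopython can read under the "phylip" format name. It changes only the label, not the content, so it does not add the symbols line.

```python
from io import StringIO


def set_clustal_header(clustal_text, program="CLUSTAL 2.1"):
    """Replace the banner on the first line of Clustal-format text.

    `program` is whatever label you want, e.g. "CLUSTAL W (1.83)".
    """
    lines = clustal_text.split("\n")
    if not lines[0].startswith("CLUSTAL"):
        raise ValueError("text does not start with a CLUSTAL header line")
    lines[0] = f"{program} multiple sequence alignment"
    return "\n".join(lines)


def phylip_to_clustal(phylip_path, clustal_path, program="CLUSTAL 2.1"):
    """Convert a PHYLIP alignment to Clustal text with a relabelled banner."""
    # Imported here so set_clustal_header() can be used without Biopython.
    from Bio import AlignIO

    alignment = AlignIO.read(phylip_path, "phylip")
    buffer = StringIO()
    AlignIO.write(alignment, buffer, "clustal")  # default Biopython banner
    with open(clustal_path, "w") as handle:
        handle.write(set_clustal_header(buffer.getvalue(), program))
```

Something like `phylip_to_clustal("input.phy", "output.aln")` would then write the relabelled file; both file names here are placeholders.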


Alright, thank you for the response.


In case anyone is looking to do this in a scripted manner, I took Joe's suggestion ("You can try scripting it yourself") and made a script that adds the symbols to a Clustal alignment that lacks them. See the script calculate_cons_for_clustal_protein.py, described here, which includes links to a demonstration Jupyter notebook that can be run directly in your browser via MyBinder.
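
For readers who want to see the idea without the script, here is a minimal pure-Python sketch of how the symbols line could be computed for protein alignments. It is not the script mentioned above. The function names are made up for illustration, the "strong" and "weak" residue groups are the ones listed in the Clustal X documentation as I understand it, and the name-column padding is an arbitrary layout choice.

```python
# Residue groups used by Clustal for ':' (strong) and '.' (weak) columns.
STRONG_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK",
                 "MILV", "MILF", "HY", "FYW"]
WEAK_GROUPS = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND",
               "SNDEQK", "NDEQHK", "NEQHRK", "FVLIM", "HFY"]


def conservation_symbol(column):
    """Return '*', ':', '.' or ' ' for one alignment column."""
    residues = set(column.upper())
    if "-" in residues:
        return " "  # any gap in the column means no symbol
    if len(residues) == 1:
        return "*"  # fully conserved
    if any(residues <= set(group) for group in STRONG_GROUPS):
        return ":"
    if any(residues <= set(group) for group in WEAK_GROUPS):
        return "."
    return " "


def conservation_line(sequences):
    """Build the symbols line for a list of equal-length aligned strings."""
    if len({len(seq) for seq in sequences}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    return "".join(conservation_symbol("".join(col)) for col in zip(*sequences))


def format_clustal(names, sequences,
                   header="CLUSTAL 2.1 multiple sequence alignment", width=60):
    """Lay out names, sequences and symbols in Clustal-style blocks."""
    symbols = conservation_line(sequences)
    pad = max(len(name) for name in names) + 6  # arbitrary padding (assumption)
    out = [header, "", ""]
    for start in range(0, len(symbols), width):
        for name, seq in zip(names, sequences):
            out.append(name.ljust(pad) + seq[start:start + width])
        out.append(" " * pad + symbols[start:start + width])
        out.append("")
    return "\n".join(out)
```

The names and sequences could come from a Biopython alignment (for example `record.id` and `str(record.seq)` for each record), so the PHYLIP parsing can still be left to `AlignIO`.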
