Looking for a phylogenetic Python API
7.0 years ago
virus_n00b • 0

I have a sequence file (in FASTA format) containing 12527 sequences, each of length 61. I need to construct a phylogenetic tree from these sequences. I am using the Python package Biopython to perform the following steps:

  1. Use the Clustal Omega command-line tool to generate the alignment file.
  2. Read the generated alignment file and plot the phylogenetic tree.

However, Clustal Omega throws the following error:

HHalignWrapper:hhalign_wrapper.c:1419: problem in alignment (profile sizes: 1 + 1) (VS_bf95329ccdd7babf5bee3f5b2f5a2f50 + VS_b22382a3e858d9a132f3263009383220), forcing Viterbi
        hh-error-code=4 (mac-ram=8000) hhalign:hhalign.cpp:961: Problem Reading/Preparing profiles (len(q)=0/len(t)=0) HHalignWrapper:hhalign_wrapper.c:1447: problem in alignment, Viterbi did not work
        hh-error-code=4 (mac-ram=64000) hhalign:hhalign.cpp:961: Problem Reading/Preparing profiles (len(q)=0/len(t)=0) FATAL: could not perform alignment -- bailing out

When I run clustalw2 on the same sequences, it works but takes too long. Can someone point out what exactly is causing the issue?

alignment sequencing • 2.1k views

It looks like you have empty sequences (the error reports len(q)=0/len(t)=0), so make sure your input is OK. Clustal uses fairly simple algorithms to build the tree, so if a phylogenetic tree is your desired output, I would recommend using other algorithms.
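One way to check the input for empty records is a short script like the one below. This is only a minimal sketch using the standard library: the function name `find_empty_records` is made up for illustration, and it assumes a plain FASTA file where records that contain only gap characters (`-` or `.`) should also count as empty.

```python
import io


def find_empty_records(handle):
    """Return the headers of FASTA records with no residues.

    Records that contain only whitespace or gap characters ('-', '.')
    are treated as empty (an assumption made for this sketch).
    """
    empty = []
    header = None
    length = 0
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            # A new record starts: report the previous one if it was empty.
            if header is not None and length == 0:
                empty.append(header)
            header = line[1:]
            length = 0
        elif header is not None:
            # Count residues only; gap characters do not count.
            length += len(line.replace("-", "").replace(".", ""))
    # Do not forget the last record in the file.
    if header is not None and length == 0:
        empty.append(header)
    return empty


# Small in-memory example; for a real file use open("your_sequences.fasta").
example = io.StringIO(">seq1\nACGT\n>seq2\n\n>seq3\n---\n>seq4\nAC\nGT\n")
print(find_empty_records(example))  # ['seq2', 'seq3']
```

If this returns an empty list for your file, the problem is probably somewhere else (for example, unusual characters in the sequences).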

I'm sure there are no empty sequences. That said, my desired output is a phylogenetic tree, so can you suggest some algorithms that scale to thousands of sequences?

See, for instance, the ape package in R.

7.0 years ago
Joe 21k

I would suggest you build the tree with a dedicated tool that scales well first, rather than trying to do everything through python.

The MUSCLE aligner is fast for large numbers of sequences, and then I would probably use something like RAxML to build the tree. RAxML can be slow when the sequences are very long or very numerous (as yours are); if that turns out to be a problem, try FastTree.
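If you still want to drive the tools from Python, a thin `subprocess` wrapper is usually enough. The sketch below makes several assumptions: the executables are on your PATH under the names `muscle` and `FastTree`, MUSCLE uses the 3.8-style `-in`/`-out`/`-maxiters` options (MUSCLE 5 uses a different command-line syntax), and the helper names and the default of two iterations are my own choices for illustration. Drop `-nt` if your sequences are protein.

```python
import subprocess


def muscle_command(fasta_in, aln_out, maxiters=2):
    """Build a MUSCLE 3.8-style command line (assumed flag names)."""
    return ["muscle", "-in", fasta_in, "-out", aln_out,
            "-maxiters", str(maxiters)]


def fasttree_command(aln_in, nucleotide=True):
    """Build a FastTree command; FastTree writes the tree to stdout."""
    cmd = ["FastTree"]
    if nucleotide:
        cmd.append("-nt")  # nucleotide alignment; omit for protein
    cmd.append(aln_in)
    return cmd


def build_tree(fasta_in, aln_out, tree_out, nucleotide=True):
    """Align with MUSCLE, then write a Newick tree with FastTree."""
    subprocess.run(muscle_command(fasta_in, aln_out), check=True)
    with open(tree_out, "w") as tree_handle:
        subprocess.run(fasttree_command(aln_out, nucleotide),
                       stdout=tree_handle, check=True)


# Show the commands without running them (file names are placeholders).
print(" ".join(muscle_command("seqs.fasta", "seqs.afa")))
print(" ".join(fasttree_command("seqs.afa")))
```

Calling `build_tree("seqs.fasta", "seqs.afa", "seqs.nwk")` would then run both steps, with `check=True` raising an exception if either tool exits with an error.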

Biopython is best used for manipulating bioinformatics file types, not usually for creating them in the first place.

This is very general advice, however, so if you can provide more information about your input data and the actual question you’re attempting to answer, we might be able to provide more specific help.

That's exactly what I needed. MUSCLE scales well for my problem with -maxiters 2. However, I have an issue with the second part of your suggestion, using FastTree for tree building. MUSCLE writes an alignment file whose first line is a header giving the MUSCLE version number, and FastTree throws an error on it. Is there any workaround for this?

Off the top of my head, muscle should support all standard output file types. I can’t recall all the details right now and am not near a computer to test, but their online manual should have all the supported output types.

If not, it should be simple to do some standard command line text manipulation to remove the header line if the file is otherwise correct.
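As a rough sketch of that kind of clean-up in Python rather than on the command line: the helper below (its name is made up) drops everything before the first FASTA record. It assumes the rest of the file really is FASTA, which I have not verified for your output.

```python
def strip_leading_header(lines):
    """Drop every line that precedes the first FASTA record ('>')."""
    lines = list(lines)
    for index, line in enumerate(lines):
        if line.startswith(">"):
            return lines[index:]
    raise ValueError("no FASTA record found; the file may be in another format")


# Hypothetical file content with one unwanted header line and a blank line.
raw = ["SOME PROGRAM HEADER\n", "\n", ">seq1\n", "AC-T\n", ">seq2\n", "ACGT\n"]
print("".join(strip_leading_header(raw)), end="")
```

For a real file, pass an open file handle and write the returned lines to a new file. If the function raises `ValueError`, the alignment is probably not FASTA at all (for example, Clustal format), in which case removing the header will not be enough and a format conversion, such as Biopython's `AlignIO.convert`, would be the better route.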
