I have a sequence file in FASTA format that contains 12,527 sequences, each of length 61. I need to construct a phylogenetic tree from these sequences. I am using the Python package Biopython to perform the following steps:
- Use the Clustal Omega command-line tool to generate the alignment file.
- Read the generated alignment file and plot the phylogenetic tree.
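The two steps above could be sketched roughly as follows. This is only an illustration under assumptions, not the exact script in use: the function names, the file paths passed in, the thread count, and the choice of an identity-distance neighbor-joining tree are all hypothetical. It assumes the `clustalo` executable is on the PATH and that Biopython is installed.

```python
import shutil
import subprocess


def build_clustalo_command(fasta_in, aln_out, threads=4):
    """Return a Clustal Omega command line as an argument list.

    The thread count is an illustrative assumption; adjust as needed.
    """
    return [
        "clustalo",
        "-i", fasta_in,                   # unaligned input sequences
        "-o", aln_out,                    # where the alignment is written
        "--outfmt=fasta",                 # aligned FASTA output
        "--threads={}".format(threads),
        "--force",                        # overwrite an existing output file
    ]


def align_and_build_tree(fasta_in, aln_out):
    """Run Clustal Omega, then build and draw a neighbor-joining tree."""
    # Imported lazily so the command builder above works without Biopython.
    from Bio import AlignIO, Phylo
    from Bio.Phylo.TreeConstruction import (
        DistanceCalculator,
        DistanceTreeConstructor,
    )

    if shutil.which("clustalo") is None:
        raise RuntimeError("clustalo executable not found on PATH")

    # Step 1: generate the alignment file.
    subprocess.run(build_clustalo_command(fasta_in, aln_out), check=True)

    # Step 2: read the alignment and build a tree from pairwise distances.
    alignment = AlignIO.read(aln_out, "fasta")
    calculator = DistanceCalculator("identity")
    tree = DistanceTreeConstructor(calculator, "nj").build_tree(alignment)
    Phylo.draw_ascii(tree)
    return tree
```

Note that the pure-Python distance and neighbor-joining step in Biopython is likely to be very slow for an input of this size, so it is better suited to trying the workflow on a small subset first.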
However, Clustal Omega throws the following error:
HHalignWrapper:hhalign_wrapper.c:1419: problem in alignment (profile sizes: 1 + 1) (VS_bf95329ccdd7babf5bee3f5b2f5a2f50 + VS_b22382a3e858d9a132f3263009383220), forcing Viterbi
hh-error-code=4 (mac-ram=8000)
hhalign:hhalign.cpp:961: Problem Reading/Preparing profiles (len(q)=0/len(t)=0)
HHalignWrapper:hhalign_wrapper.c:1447: problem in alignment, Viterbi did not work
hh-error-code=4 (mac-ram=64000)
hhalign:hhalign.cpp:961: Problem Reading/Preparing profiles (len(q)=0/len(t)=0)
FATAL: could not perform alignment -- bailing out
When I run ClustalW2 on the same sequences, it works but takes too long. Can someone point out what exactly is causing the issue?
It looks like you have empty sequences: the error reports len(q)=0/len(t)=0, so make sure your input is OK. Clustal uses fairly simple algorithms to build the tree, so if your desired output is a phylogenetic tree, I would recommend using other algorithms.
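One way to check the input for empty records is a quick scan of the FASTA file. The sketch below is a minimal, standard-library-only example. The function names are hypothetical, and treating `-` and `.` as gap characters is an assumption about how the file might encode gaps.

```python
from collections import Counter


def scan_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def find_empty_records(lines, gap_chars="-."):
    """Return (headers of records with no residues, Counter of lengths).

    A record counts as empty if nothing is left after removing
    whitespace and the assumed gap characters.
    """
    empty = []
    lengths = Counter()
    for header, seq in scan_fasta(lines):
        residues = "".join(seq.split())
        for ch in gap_chars:
            residues = residues.replace(ch, "")
        lengths[len(residues)] += 1
        if not residues:
            empty.append(header)
    return empty, lengths
```

Calling `find_empty_records(open("sequences.fasta"))` (the file name is a placeholder) returns the headers of any empty records plus a tally of sequence lengths. For the input described in the question, the tally would be expected to contain a single entry for length 61.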
I'm sure there are no empty sequences. My desired output is indeed a phylogenetic tree, so can you suggest some algorithms that scale to thousands of sequences?
See, for instance, the ape package in R.