Hi all,
I have FASTA sequences for some proteins and I want to convert them to PHYLIP format to build a phylogenetic tree using ggtree. I tried the online EMBOSS seqret tool to convert the FASTA file to PHYLIP format, but I got an error when I read the result into ggtree.
My input sequences:
>proteinsA
MGDSRDLCPHLDSIGEVTKEDLLLKSKGTCQSCGVTGPNLWACLQVACPYVGCGESFADH
RTDKKPALCKSYQKLVSEVWHKKRPSYVVP
>proteinsB
MTGSNSHITILTLKVLPHFESLGKQEKIPNKMSAFRNHCPHLDSVGEITKEDLIQKSLGT
SHVSFP
>proteinsC
MGDSRDLCPHLDSIGEVTKEDLLLKSKGTCQSCGVTGPNLWACLQVACPYVGCGESFADD
ITTEETMEEDKSQSDVDFQSCESCSNSDRAENENGSRCFSEDNNETTMLIQDDENN
And the EMBOSS seqret output is:
3 116
proteinsA MGDSRDLCPH LDSIGEVTKE DLLLKSKGTC QSCGVTGPNL WACLQVACPY
proteinsB MTGSNSHITI LTLKVLPHFE SLGKQEKIPN KMSAFRNHCP HLDSVGEITK
proteinsC MGDSRDLCPH LDSIGEVTKE DLLLKSKGTC QSCGVTGPNL WACLQVACPY
VGCGESFADH RTDKKPALCK SYQKLVSEVW HKKRPSYVVP ----------
EDLIQKSLGT SHVSFP---- ---------- ---------- ----------
VGCGESFADD ITTEETMEED KSQSDVDFQS CESCSNSDRA ENENGSRCFS
---------- ------
---------- ------
EDNNETTMLI QDDENN
But I got an error when reading this PHYLIP file:
tree <- read.phylip("emboss_seqret_output.txt")
Error in read.phylip("emboss_seqret_output.txt") :
input file is not phylip tree format...
Could you please tell me what the problem is with my input file, or suggest some alternative approaches?
Thanks a lot.
I use different tools to build phylogenetic trees. But I also need to convert fasta to phylip.
To convert fasta to phylip: http://sequenceconversion.bugaco.com/converter/biology/sequences/fasta_to_phylip.php
A program for phylogenetic trees: http://www.atgc-montpellier.fr/phyml/
Other useful programs from that site: http://www.atgc-montpellier.fr/index.php?type=pg
Don’t convert fasta to phylip. That tool is steering you wrong. While it is possible to represent the 2 files in a visually similar manner you should not do this as a text manipulation. The input sequences should be fed to an alignment program.
I was able to load the phylip file you posted here (output from EMBOSS seqret) using the read.phylip function from phylotools. I used both the .phy and .txt extensions; in either case, I didn't see a difference. I think you are using the read.phylip function from the treeio package. @ Mike: the minor changes I made were to insert an extra space between sequence IDs and sequences, and to remove the extra line between the very first line and the next line.
Thanks cpad0112, yes, I can read the file with the read.phylip function from phylotools but not from ggtree/treeio. How can I build a tree using this phylip file in phylotools?
I guess you have resolved the issue. For future reference, the tool needs sequential phylip format, not an interleaved format. It also needs dendrogram information (Nexus, maybe) at the end of the .phy file.
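To illustrate what "sequential" means here, below is a minimal Python sketch (standard library only) that writes each name and its full sequence on a single line, rather than in interleaved blocks. The function name to_sequential_phylip and the toy alignment are made up for illustration, and the 10-character name field follows the strict PHYLIP convention as an assumption. It does not add any tree information to the file.

```python
def to_sequential_phylip(alignment):
    """Return sequential PHYLIP text for a dict of {name: aligned sequence}.

    Hypothetical helper: assumes strict PHYLIP layout with a 10-character
    name field, and that the sequences have already been aligned.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length first")
    nchar = lengths.pop()
    # Header line: number of sequences, then number of alignment columns.
    lines = [f"{len(alignment)} {nchar}"]
    for name, seq in alignment.items():
        # One line per sequence: padded/truncated name, then the whole sequence.
        lines.append(f"{name[:10]:<10}{seq}")
    return "\n".join(lines) + "\n"


# Toy alignment, invented for illustration only (not real results).
toy = {
    "proteinsA": "MGDSRDLCPH--",
    "proteinsB": "MTGSNSHITILT",
    "proteinsC": "MGDSRDLCPHLD",
}
phylip_text = to_sequential_phylip(toy)
print(phylip_text)
```

The helper refuses unaligned input on purpose, since a PHYLIP header with a single column count only makes sense when every sequence has that length.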
You may need to make the file extension “.phy”.
Also, I’m not sure if it’s just how you’ve copied and pasted, but there isn’t normally a space between the 2 numbers in the first line and the start of the alignment itself (at least as far as I have seen in the past, and PHYLIP is one of the more strict formats).
The bigger issue here is that you should not be “converting” a fasta to a PHYLIP. A phylip is an alignment file, not just a sequence representation. For your tree to be meaningful at all you need to align the sequences, using something like CLUSTAL or MUSCLE.
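As a quick sanity check before any conversion, here is a small Python sketch (standard library only) that parses the FASTA records from the question and tests whether all sequences have the same length. Equal length is only a necessary condition for an alignment, not proof of a good one, and the helper names read_fasta and is_aligned are made up for this example.

```python
def read_fasta(text):
    """Parse FASTA text into a dict of {name: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the record name.
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def is_aligned(records):
    """True when every sequence has the same length (necessary, not sufficient)."""
    return len({len(seq) for seq in records.values()}) == 1


# The unaligned input from the original question.
raw = """>proteinsA
MGDSRDLCPHLDSIGEVTKEDLLLKSKGTCQSCGVTGPNLWACLQVACPYVGCGESFADH
RTDKKPALCKSYQKLVSEVWHKKRPSYVVP
>proteinsB
MTGSNSHITILTLKVLPHFESLGKQEKIPNKMSAFRNHCPHLDSVGEITKEDLIQKSLGT
SHVSFP
>proteinsC
MGDSRDLCPHLDSIGEVTKEDLLLKSKGTCQSCGVTGPNLWACLQVACPYVGCGESFADD
ITTEETMEEDKSQSDVDFQSCESCSNSDRAENENGSRCFSEDNNETTMLIQDDENN
"""
records = read_fasta(raw)
aligned = is_aligned(records)
for name, seq in records.items():
    print(name, len(seq))
print("same length:", aligned)
```

If the check fails, the sequences still need to go through an aligner such as MAFFT, Clustal, or MUSCLE before a PHYLIP file built from them means anything.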
It is not a copied-and-pasted file; I downloaded it from the EMBOSS seqret result page, as per below...
I also have a MAFFT alignment file but don't know how to use it to generate a tree.
That confirms my suspicions about the spacing of the first and second lines in your pasted example.
You can try to fix it, but it’s not the file you should be using. Can you paste what your MAFFT output looks like?
That’s an aligned fasta (though to my eye it looks to be a fairly poor alignment) - proceed with caution.
Most tree building software will be able to accept fasta as an input. Otherwise you have 2 options:
Additionally, ggtree is not a tree construction program; it is just for rendering/plotting precalculated trees. From their documentation, it apparently supports “phylip tree format”, not a format I’m familiar with, but that still requires a newick representation of the tree in the phylip file along with the aligned sequences.
I would probably start over from your original fasta, align with MAFFT/Clustal/whatever, output directly as a phylip, then use something like IQTREE to actually calculate the tree itself.
Lastly I would just ask: is this a toy data set for our benefit or have you really only got 3 sequences?
Thanks jrj.healey for your help, I have around 150 protein sequences, this is just toy/example data.
ggtree expects a phylip file with the newick string. The file you have converted using Seqret does not have the newick string.
Please see 'Parser functions defined in treeio' table in the ggtree documentation for more info.
Thanks Sej, that's my problem: how do I generate a phylip file (phylip alignment + newick string) for plotting in ggtree?
No need. You don’t need the phylip at all, you just need a newick formatted tree, which is the most common output for any phylogenetics tool.
Use a tool like IQTREE, and just take the treefile it gave you. You do not need to do anything else.
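For a sense of what that tree file contains: IQ-TREE writes its tree as a Newick string in a file ending in .treefile. The Python sketch below (standard library only) pulls the leaf names out of a Newick string so you can confirm that every sequence made it into the tree. The example tree and its branch lengths are invented for illustration, the helper name newick_leaf_names is hypothetical, and the regular expression assumes simple unquoted names.

```python
import re


def newick_leaf_names(newick):
    """Return leaf names from a Newick string (assumes simple, unquoted names)."""
    # A leaf name follows "(" or "," directly; internal node labels follow ")"
    # and are therefore skipped by this pattern.
    return re.findall(r"[(,]\s*([^():,;\s]+)", newick)


# Hypothetical tree with made-up branch lengths, for illustration only.
example_tree = "((proteinsA:0.1,proteinsC:0.2):0.05,proteinsB:0.3);"
leaves = newick_leaf_names(example_tree)

# Compare against the sequence names expected from the alignment.
expected = {"proteinsA", "proteinsB", "proteinsC"}
missing = expected - set(leaves)
print(leaves)
print("missing:", missing)
```

In practice you would read the string from the tree file instead of hard-coding it, and hand the same file to your plotting tool.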
see if this is what you want @ Mike :
input:
output:
code: (works with python > 3.5, biopython latest version 1.71):
This doesn’t solve OP’s problem because the output still contains no dendrogram information.