Right now I'm trying to put 12 whole genomes into a phylogenetic tree. The genomes were mapped to a reference and have been converted a few times (BAM to SAM, SAM to FASTQ, FASTQ to FASTA), but they look great (we loaded one into Geneious to check that everything was in order). I'm working with FastTree and trying to build a tree from these FASTA files, but every time I run it on one of the genomes, the program runs for a while and then produces nothing. The output tree files are blank. The command I'm using is
./FastTree -nt Filename.fasta > myTree
I've seen a similar thread, and I made sure that the program is downloaded, that I'm calling it correctly, and that I'm pointing it at the right files, but I'm still getting a blank file. Any suggestions would be appreciated.
Thanks.
Can you also add 2>myTree.err to the end of the command and see if the .err file has anything to say? If there was absolutely nothing on stdout, you might need to try a different input file and see if that works.
Is your .fasta an actual FASTA alignment file, or just a load of genome FASTAs? FastTree takes prealigned FASTA or PHYLIP files as input.
It's an actual Fasta file, the first couple lines look like this
>9757X1
GCTACTATAAGAGTTGTCTAGTAATTCTTAGTAGAAAAGAGTTATTAGAG
ATATCTTATAGTACGTCTTTAAACTTAGCTACTCTAAGATTAATAGTAGT
ATATCTTATAGTACGTCTTTAAACTTAGCTACTCTAAGATTAATAGTAGT
followed by many more lines of A's, T's, C's, and G's
Just to continue the question of jrj.healey...
Apologies for the simple question: Does your multi-sequence FASTA file contain aligned sequences? i.e. all sequences same length, padded out with gap characters (-) to ensure same length.
So A's T's C's G's and gap characters (-).
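If it helps, the "all sequences the same length" condition can be checked quickly with something like the Python sketch below. It is only a rough check under assumptions: the function names read_fasta_lengths and looks_aligned are made up for illustration, the three-record minimum is my own assumption about what a meaningful tree needs, and equal lengths do not prove that the sequences were actually aligned with one another.

```python
def read_fasta_lengths(path):
    """Return a dict mapping each FASTA record name to its sequence length.

    Gap characters (-) count towards the length, as they would in an alignment.
    Records that share a name overwrite each other in this simple sketch.
    """
    lengths = {}
    name = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                # Header line: use the first word after '>' as the record name.
                parts = line[1:].split()
                name = parts[0] if parts else ""
                lengths[name] = 0
            elif name is not None:
                # Sequence line: add its length, ignoring any embedded spaces.
                lengths[name] += len(line.replace(" ", ""))
    return lengths


def looks_aligned(path, min_records=3):
    """Rough check: enough records, and every record has the same length.

    min_records=3 is an assumption for illustration, not a FastTree setting.
    """
    lengths = read_fasta_lengths(path)
    same_length = len(set(lengths.values())) == 1
    return len(lengths) >= min_records and same_length
```

On a file that holds a single genome under one header, looks_aligned should return False simply because there is only one record, which would already point to the input being the problem.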
Pretty sure we might be getting to the nub of the issue here...
You first need to generate a sequence alignment before you can make a tree. If you're attempting to align 12 whole genomes (though you haven't said what they are/how big), this may take a very long time - someone who is more up to date on the state of aligning genomes might know better. I believe MUMmer is capable of aligning whole bacterial genomes via suffix trees.
You may have to appeal to something like MLST instead in order to extrapolate a phylogeny in a reasonable timescale.
They're pretty big files, between 2 and 6 GB. I'm open to trying different software, but we're looking for phylogenetic info for the BFODMAT revision if at all possible, so I don't know if other software will output it in the same format. They're also fungal genomes, so I hesitate to use bacterial genome phylogeny tools. FastTree could very well not be suited to such big files, but it would be nice to get it to work, because FastTree will give us the right output without any additional conversion.
Thanks by the way for all the input. I think I might revise my question with everything that you guys are saying. Any other comments before I do so?
They should be; I aligned them using samtools. This is my first time doing this, so I definitely could have gone wrong, but I think that they've been aligned correctly.
These are genomes you've sequenced yourself?
Samtools is a suite of programs for manipulating Sequence Alignment/Map (SAM) data, but that is not the same thing as multiple sequence alignment. The two are not very helpfully distinguished, I grant you!
SAM data typically means you have created sequencing read alignments to a reference genome (usually with tools like bwa or bowtie) to check statistics like genome coverage and so on. Multiple sequence alignment just takes a set of sequences and does what it says on the tin: it aligns them with one another.
It sounds to me like you're using the wrong input data. You need to take your FASTAs and use something like clustalo, MUSCLE, or, probably more likely, something like MUMmer as mentioned, which can deal with very large sequences (though only pairwise).
I strongly suspect it may be impossible to MSA fungal genomes though. Someone else might know better.
When I tried I got
FastTree Version 2.1.9 SSE3
Alignment: 9751X1.fasta
Nucleotide distances: Jukes-Cantor Joins: balanced Support: SH-like 1000
Search: Normal +NNI +SPR (2 rounds range 10) +ML-NNI opt-each=1
TopHits: 1.00*sqrtN close=default refresh=0.80
ML Model: Jukes-Cantor, CAT approximation with 20 rate categories
I've tried several different files with the same results. I thought that it might be a problem with FastTree, so I redownloaded the program and nothing changed.
I'd recommend you add that to your question. I'm not the right person to answer your question, but this additional information might help the right person come to a more helpful conclusion.