Hi all
I am trying to build a phylogeny of 1500 bacterial genomes from a concatenated alignment of 1100 genes (amino acid sequences).
Finding the orthologs and building the alignment was not (too) difficult, but building a tree out of this monster has become a bit complicated. I have always used RAxML and have now switched to RAxML-NG almost exclusively. Usually with these concatenated alignments, I first run ModelTest to find the appropriate model for each gene/protein alignment, then concatenate the alignments, and then feed the result to RAxML to run bootstraps and find the best-fitting tree.
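For anyone following the same per-gene workflow, here is a minimal sketch of how the per-gene models could be turned into a partition file of the kind passed to --model. It assumes the RAxML-NG partition syntax "MODEL, name = start-end" with one line per gene, in concatenation order. The gene names, models and lengths are made up for illustration and are not taken from my data, and build_partition_lines is a hypothetical helper, not part of any tool.

```python
from typing import List, Tuple


def build_partition_lines(genes: List[Tuple[str, str, int]]) -> List[str]:
    """Return RAxML-NG style partition lines.

    genes: (name, model, alignment length in columns) for each gene,
    listed in the same order in which the alignments were concatenated.
    """
    lines = []
    start = 1  # partition coordinates are 1-based and inclusive
    for name, model, length in genes:
        if length <= 0:
            raise ValueError(f"gene {name!r} has non-positive length {length}")
        end = start + length - 1
        lines.append(f"{model}, {name} = {start}-{end}")
        start = end + 1  # the next gene begins right after this one
    return lines


if __name__ == "__main__":
    # Hypothetical genes and models, for illustration only.
    example = [
        ("geneA", "LG+G4", 320),
        ("geneB", "WAG+I+G4", 275),
        ("geneC", "JTT+G4", 410),
    ]
    print("\n".join(build_partition_lines(example)))
```

The coordinates only make sense if the genes are listed in exactly the order used for the concatenation, so it is worth checking that the last end coordinate equals the total alignment length.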
I am using a cluster with 64 threads and 500 GB of RAM. However, RAxML-NG has now been stuck for almost a whole day (23 h) at the very first step: "Starting ML tree search with 20 distinct starting trees".
I understand that trying to build a tree with that amount of data may be wishful thinking, but what is the alternative? How do people with even bigger alignments manage? Should I switch to FastTree or IQ-TREE? Does anyone have good experience with those?
For the record, here is the command I used:
raxml-ng --all --data-type AA --threads 64 --msa concatenated.phy --model partitions3 --bs-trees 100
Thank you in advance for any advice.
Thank you - I admit I got sidetracked by someone who told me that it's feasible, so I thought, why not, but you make a great point. There is no need to use all 1100 genes; I'll select some housekeeping genes plus other relevant ones and build the phylogeny from those.
I looked into FastTree a bit. The efficiency is impressive, but as far as I can see you have to apply a single model (JTT or LG, for example) to the entire alignment; there is no option for a concatenated alignment with partitions.