Dear All, thank you for your time. I have a dataset containing 15,000 sequences. I wish to build a tree, so my plan was to cluster the sequences with BlastClust (a program in the BLAST package) and then use one reference sequence from each cluster to build a crude tree. BlastClust has been running for some time now, but I have no idea whether this is going to work or how long it will take.
I was wondering if there are any other ways of going about this with such a large set of sequences?
Ideally, I wanted to do a sequence alignment, use that alignment to build a tree (which I agree will be complex with that number of sequences), and then look at the evolution of those sequences.
I tried MAFFT for the sequence alignment; it ran without reporting any errors but produced no output.
Any suggestions would be appreciated.
Thank you for your extensive suggestions. I have managed to use CD-HIT, but you are correct that this is not ideal. I managed to use Clustal Omega to build an alignment and I will use FastTree as you have suggested. Although I am not using NGS data, I will look at the paper you have suggested as well. Thank you once again.
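For anyone following the same route, here is a minimal Python sketch of how one representative per cluster could be pulled out of a CD-HIT cluster file. It assumes the usual `.clstr` layout, where each cluster starts with a `>Cluster N` line and the representative member line ends with `*`. The function name and the example sequence IDs (`seqA` and so on) are made up for illustration and are not from my dataset.

```python
import re


def cluster_representatives(clstr_text):
    """Map each cluster name to the ID of its representative sequence.

    Assumes CD-HIT's .clstr layout: a '>Cluster N' header followed by
    member lines, with the representative's line ending in '*'.
    """
    reps = {}
    current = None
    for line in clstr_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">Cluster"):
            current = line[1:]  # e.g. "Cluster 0"
            continue
        # Member lines look like: "0<TAB>350nt, >seqA... *"
        match = re.search(r">(\S+?)\.\.\.", line)
        if match and line.endswith("*") and current is not None:
            reps[current] = match.group(1)
    return reps


# Hypothetical cluster file content, for illustration only.
example = "\n".join([
    ">Cluster 0",
    "0\t350nt, >seqA... *",
    "1\t348nt, >seqB... at +/99.10%",
    ">Cluster 1",
    "0\t420nt, >seqC... at +/98.50%",
    "1\t431nt, >seqD... *",
])

print(cluster_representatives(example))
```

The resulting IDs could then be used to pull the matching records out of the original FASTA file before aligning. Note that CD-HIT truncates long sequence names in the `.clstr` file, so the IDs may need to be matched by prefix.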