Hi everyone, I'm planning to run a multiple sequence alignment (MSA) on all human coronaviruses. I have 8,586 complete genome sequences, and the FASTA file is about 256 MB.
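In case it helps anyone check figures like these on their own data, here is a minimal sketch of how the record count and sequence lengths of a FASTA file could be tallied with plain Python. The function name `fasta_stats` and the tiny in-memory example are made up for illustration; this is not the script used to get the numbers above.

```python
import io

def fasta_stats(handle):
    """Return (number of records, list of sequence lengths) for a FASTA stream."""
    lengths = []
    current = None  # length of the record being read; None before the first header
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)  # close the last record
    return len(lengths), lengths

# Tiny in-memory example (hypothetical data, not real genomes)
demo = io.StringIO(">seq1\nACGT\nAC\n>seq2\nGGG\n")
count, lengths = fasta_stats(demo)
print(count, min(lengths), max(lengths))
```

For a real file, the same function can be given an open file handle, e.g. `with open(path) as fh: fasta_stats(fh)`, where `path` is wherever the FASTA file lives.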
Can MAFFT accept this many sequences as input? I have heard of MUSCLE and T-Coffee, and their accuracy is reportedly quite good, but I'm not sure whether they can handle a dataset this large. Are there any well-known MSA tools that can? My requirements are moderate accuracy (high accuracy would be even better), the shortest possible run time and, of course, the ability to handle my large dataset :'D
I found that there are MSA tools designed for large datasets, but I have never heard of any of them. If possible, could you recommend some that are widely used for large datasets and have proven to be useful and reliable?
Thank you in advance for all the suggestions and explanations. I appreciate every response. :)))
Oh I see, thank you for the suggestions and explanations. Unfortunately, the special option provided by MAFFT only applies to SARS-CoV-2 genomes that are very closely related (~95% identity). My sequences include other human coronaviruses as well, so this special option is not suitable for my data.