7.3 years ago
l.souza
I've tried to make an MSA of three sequences from different species of the same genus, with ~23 million bases, to find motifs related to the virulence of these organisms. I used MAFFT with its fastest algorithm, FFT-NS-1, but it crashed my computer. Then I tried running it on the CIPRES server, but the job was killed after the time limit (72 h).
Is there a faster way to align these huge sequences?
What is your end goal?
Since OP is using MAFFT, I am going to guess an MSA.
You are aligning what to what? Please be as informative as possible. You have already asked a couple of questions on Biostars, so by now you should know that we need all the details you can provide.
I edited the post. Now, I think this is enough to understand the problem!
Do you need them to be multiply aligned, or would 3 pairwise alignments suffice? If so, you can use MUMmer for whole genomes, though 23 Mb might still be pushing it.
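With three genomes, "3 pairwise alignments" means every unordered pair (A vs B, A vs C, B vs C). A minimal sketch that builds one MUMmer `nucmer` command per pair, assuming MUMmer is installed and that your version accepts `--prefix` (check `nucmer --help`). The file names and the helper `nucmer_commands` are hypothetical.

```python
from itertools import combinations
from pathlib import Path


def nucmer_commands(genome_fastas):
    """Return one nucmer command (as an argv list) per unordered pair of genomes."""
    commands = []
    for ref, qry in combinations(genome_fastas, 2):
        # Output prefix built from the two file stems, e.g. "speciesA_vs_speciesB";
        # nucmer writes its alignments to <prefix>.delta.
        prefix = f"{Path(ref).stem}_vs_{Path(qry).stem}"
        commands.append(["nucmer", f"--prefix={prefix}", ref, qry])
    return commands


if __name__ == "__main__":
    # Hypothetical file names; replace with your three genome assemblies.
    genomes = ["speciesA.fasta", "speciesB.fasta", "speciesC.fasta"]
    for cmd in nucmer_commands(genomes):
        print(" ".join(cmd))
        # To actually run them (requires MUMmer on PATH):
        # import subprocess; subprocess.run(cmd, check=True)
```

Each pair runs independently, so on a server the three jobs can be submitted in parallel.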
Are you trying to do this on a personal computer or do you have compute access?
I've tried on my personal computer, on a private server and on CIPRES. Does MUMmer only do pairwise alignment?
Yeah, I believe so. Kalign is another option for large sequences that can manage an MSA.
I'm not surprised your own PC couldn't handle it. You'll probably need access to a server of some kind, as the process may take a long time and will almost certainly need more resources than you have.
There is also LASTZ. According to its author:
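To put "more resources" in perspective, here is a back-of-the-envelope sketch of what a naive, exact dynamic-programming alignment of two ~23 Mb sequences would need, assuming a full matrix at 1 byte per cell. MAFFT's FFT-NS-1 is a heuristic and does not fill such a matrix, so this is only an upper-bound intuition for why whole genomes call for anchor-based aligners. The helper name `naive_dp_bytes` is hypothetical.

```python
def naive_dp_bytes(len_a, len_b, bytes_per_cell=1):
    """Memory for a full (len_a + 1) x (len_b + 1) dynamic-programming matrix."""
    return (len_a + 1) * (len_b + 1) * bytes_per_cell


if __name__ == "__main__":
    genome_len = 23_000_000  # ~23 million bases, as in the question
    tb = naive_dp_bytes(genome_len, genome_len) / 1e12
    print(f"~{tb:.0f} TB for one pairwise matrix at 1 byte per cell")
```

This prints roughly 529 TB for a single pair, far beyond any personal computer.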
What do you mean by "three sequences of different species from the same genus"? Three genes per species? Are you aligning each one of the genes / sequences separately? Or are you concatenating and aligning them all at once? How many species do you have? Have you tried to remove identical sequences?
Three whole genome sequences, each one from different species. I've tried to align them all at once.
Try LASTZ as suggested, or Mauve (with GUI), or LAST. There are innumerable programs suitable for this task. MAFFT may be unfit if there are rearrangements between species. Are the genomes circular? Do the contigs start/stop at the same position?
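A global MSA tool like MAFFT lays sequences out end to end in a single order, so large rearrangements break that assumption. One rough way to gauge collinearity, sketched under stated assumptions: take matching anchor pairs (position in genome A, position in genome B), for example extracted from a pairwise aligner's output (that parsing step is not shown, and the values below are made up), and measure the fraction lying on the longest chain that increases in both genomes. This toy ignores strand, so inversions only show up as order breaks; `collinear_fraction` is a hypothetical helper.

```python
from bisect import bisect_left


def collinear_fraction(anchors):
    """Fraction of (pos_a, pos_b) anchors on the longest chain increasing in both genomes."""
    if not anchors:
        return 0.0
    # Sort by position in A; for equal A positions, sort B descending so that
    # two anchors sharing an A position cannot both join a strictly increasing chain.
    ordered = sorted(anchors, key=lambda p: (p[0], -p[1]))
    tails = []  # tails[k] = smallest B position ending an increasing chain of length k + 1
    for _, b in ordered:
        i = bisect_left(tails, b)
        if i == len(tails):
            tails.append(b)
        else:
            tails[i] = b
    return len(tails) / len(anchors)


if __name__ == "__main__":
    # Made-up anchors: the last two jump back in genome B, mimicking a rearrangement.
    anchors = [(1, 10), (2, 20), (3, 30), (4, 5), (5, 6)]
    print(collinear_fraction(anchors))
```

A fraction well below 1 suggests the genomes are not in one shared order, which favors tools such as Mauve, LAST or LASTZ that report local, possibly rearranged, alignments.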
I'm gonna try these. They're linear.