10 months ago
Арсений
Hello everyone, I have a reference sequence (a chloroplast genome) that I need to align against 124 other chloroplast genomes. I then need LASTZ to produce rdotplot files that I can use to construct plots.
Would it be sufficient to provide my reference genome as the target (that is, as the first argument when running lastz from a terminal or script) and a single .fasta file containing all 124 genomes as the query?
Thank you in advance!
You may want to consider using a program like Cactus, which is designed for aligning multiple genomes.
Any time a task requires alignment of more than 2 long sequences (even a chloroplast genome), my advice is always: think twice.
Multiple sequence alignment of long sequences is functionally impossible currently. Pairwise alignments can work OK, though in my experience LASTZ doesn't do a very good job, so maybe consider mummer.
To your point specifically: I don't believe any sequence in a multiple alignment is treated as 'special' in the sense of being a reference. Instead, they're all compared against each other, and the guide tree then dictates their final ordering etc. I doubt LASTZ is any exception, but I stand to be corrected.
In your case that would mean all-vs-all pairwise alignments (each of the 125 genomes against the other 124), which you may be able to do something with downstream, but that depends on your question. There's almost certainly a better way to address it. Alternatively, if you truly do have one specific sequence you consider a reference, then you could run 124 vs 1 pairwise alignments and just re-concatenate these into an MSA format if you absolutely need that.
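If you go the one-reference route, a rough sketch of the bookkeeping might look like the following. It splits the combined FASTA into one record per genome and builds one lastz command per query, so each genome gets its own rdotplot file. The helper names (`read_fasta`, `plan_pairwise_runs`) and the file naming are made up for illustration, it assumes `lastz` is on your PATH, and it assumes the FASTA header names are safe to use as file names. The only LASTZ option used is `--rdotplot=<file>`.

```python
from pathlib import Path


def read_fasta(text):
    """Parse FASTA text into a list of (name, sequence) tuples."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            # Use the first whitespace-delimited token of the header as the name.
            parts = line[1:].split()
            name, chunks = (parts[0] if parts else "unnamed"), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def plan_pairwise_runs(reference_path, multi_fasta_text, out_dir):
    """Return one (query_path, query_fasta_text, command) tuple per query genome.

    Nothing is written or executed here; the caller decides when to do that.
    """
    out_dir = Path(out_dir)
    runs = []
    for name, seq in read_fasta(multi_fasta_text):
        query_path = out_dir / f"{name}.fasta"      # hypothetical naming scheme
        dot_path = out_dir / f"{name}.rdotplot"     # one dot plot file per genome
        command = [
            "lastz",                 # assumed to be on PATH
            str(reference_path),     # target: the reference genome
            str(query_path),         # query: a single genome
            f"--rdotplot={dot_path}",
        ]
        runs.append((query_path, f">{name}\n{seq}\n", command))
    return runs
```

To actually run it, write each `query_fasta_text` to its `query_path` and pass the command to `subprocess.run(command, check=True)`. Keeping the planning separate from execution makes it easy to inspect the commands before launching 124 jobs.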
Generally, any time the question of aligning multiple long sequences crops up, the answer is to consider k-mer sketches or similar instead.
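To give a feel for what a k-mer sketch is, here is a minimal, unoptimized bottom-k MinHash example. It hashes every k-mer of a sequence, keeps only the smallest hash values as the sketch, and estimates the Jaccard similarity between two genomes from their sketches alone. The function names and the defaults (`k=21`, `size=1000`) are illustrative choices, not taken from any particular tool, and this sketch ignores reverse complements, which a real analysis would normally handle with canonical k-mers.

```python
import hashlib


def kmers(seq, k):
    """Return the set of all k-mers in seq (empty if seq is shorter than k)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def _hash(kmer):
    # Stable 64-bit hash, so sketches are comparable across runs and machines.
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def sketch(seq, k=21, size=1000):
    """Bottom-k MinHash sketch: the `size` smallest distinct k-mer hashes."""
    return sorted({_hash(km) for km in kmers(seq, k)})[:size]


def jaccard_estimate(sketch_a, sketch_b, size=1000):
    """Estimate the Jaccard similarity of two k-mer sets from their sketches."""
    # The smallest `size` hashes of the union act as a random sample of it.
    merged = sorted(set(sketch_a) | set(sketch_b))[:size]
    if not merged:
        return 0.0
    shared = set(sketch_a) & set(sketch_b)
    return sum(1 for h in merged if h in shared) / len(merged)
```

With one sketch per chloroplast genome, comparing the reference to each of the others is just 124 calls to `jaccard_estimate`, with no alignment involved. Dedicated sketching tools do the same thing far more efficiently; this is only meant to show the idea.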