Hi,
I am wondering what the best approach is, and whether what I have done so far is wrong.
First, I was considering following the lastz + multiz pipeline, which seems to be the best and most widely accepted approach. However, I am having issues after performing the initial pairwise alignment between the reference genome and my sequences of interest. The reference genome I am using is the same one used for the multi-species alignment to which I want to add my sequences afterwards. I used lastz to do the pairwise alignment.
The question is: how can I add the pairwise alignment to the already constructed alignment of all the species? From what I understand of how Roast and Multiz work, they use the pairwise alignment of each species against the reference to build the full alignment, following a tree. That doesn't feel like the best approach for what I am trying to accomplish. Is there another way of doing this without having to split the multi-species alignment into pairs?
Second, I was considering using MAFFT to align my sequences directly to the full MAF alignment of all the species. However, the alignment is split by chromosome. Is it better to run it per chromosome, or to concatenate all the chromosomes, convert the result to FASTA, and then feed it into MAFFT?
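To make the MAF-to-FASTA step concrete, here is a minimal sketch in Python of what I have in mind. It assumes the standard MAF layout, where each block starts with an "a" line and each "s" line carries seven whitespace-separated fields with the aligned text last. The function names and the example data are made up for illustration, and this is not meant as the right way to do it. Note that it writes one FASTA per block, since each block only covers a local region and may not include every species.

```python
def maf_blocks(lines):
    """Yield each MAF alignment block as a list of (source, aligned_text) pairs."""
    block = []
    for line in lines:
        line = line.strip()
        if line == "a" or line.startswith("a "):
            # An "a" line opens a new block; emit the previous one if it had sequences.
            if block:
                yield block
            block = []
        elif line.startswith("s "):
            # s <src> <start> <size> <strand> <srcSize> <text>
            fields = line.split()
            block.append((fields[1], fields[6]))
        # Comment lines ("#") and other line types ("i", "e", "q") are ignored.
    if block:
        yield block


def block_to_fasta(block):
    """Format one block as FASTA text, keeping the gap characters."""
    return "".join(f">{src}\n{text}\n" for src, text in block)


# Made-up example data, only to show the expected input shape.
example = """##maf version=1
a score=10.0
s hg38.chr1 100 5 + 248956422 ACG-TA
s mm10.chr4 200 6 + 156508116 ACGTTA

a score=5.0
s hg38.chr1 300 3 + 248956422 GGC
"""

blocks = list(maf_blocks(example.splitlines()))
for i, block in enumerate(blocks):
    print(f"# block {i}")
    print(block_to_fasta(block), end="")
```

For a real file the same generator could be fed an open file handle instead of `example.splitlines()`. Concatenating blocks or chromosomes would additionally require padding the species that are missing from a block with gaps, which this sketch does not attempt.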
The end goal is to get the conservation scores of my sequences and the other species.
Thanks in advance!
Hi Carla, were you able to resolve this? I think I am trying to do the same thing as you. I downloaded the 100 vertebrate genome alignment from UCSC and I am trying to obtain that alignment as multi-FASTA files so that I can later add another set of CDS from another animal. I was then planning to search for orthologs between animal 101 and the 100-species multiz dataset. Any instructions would be welcome. Thanks.
Hi Carla, I am facing a similar issue. Did you find a good solution?