whole-genome MSA of 1000 vertebrate sequences

2.7 years ago

Sergey • 0

I'm looking for the most suitable pipeline to perform a whole-genome multiple sequence alignment (MSA) of around 1000 vertebrate species. The goal is to identify conserved elements (and yes, I need this number of species).

As far as I know, the largest publicly available whole-genome MSA of vertebrate species contains 100 sequences (http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=multiz100way).

I'm new to MSA, so I would like to know which pipeline would be the most efficient in my case, and how much time and what computational resources I might need (I have a Slurm cluster, so parallelization is preferred).

Would it be OK to perform MSA in batches of 100 sequences and then concatenate the results somehow?

alignment MSA • 764 views


Check out Cactus: https://www.nature.com/articles/s41586-020-2871-y
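For orientation, here is a minimal sketch of preparing the input that Cactus expects. As far as I understand the Cactus documentation, the aligner reads a "seqFile" consisting of a Newick guide tree, a blank line, and then one `name path` line per genome. The helper name `build_seqfile`, the species names, and the FASTA paths below are all hypothetical and only illustrate the layout; check the current Cactus README before relying on it.

```python
from typing import Mapping


def build_seqfile(newick_tree: str, genome_paths: Mapping[str, str]) -> str:
    """Return the text of a Cactus-style seqFile (assumed layout, see lead-in).

    newick_tree  -- guide tree in Newick format, ending with ';'
    genome_paths -- mapping from leaf name in the tree to a FASTA path
    """
    tree = newick_tree.strip()
    if not tree.endswith(";"):
        raise ValueError("Newick tree must end with ';'")

    lines = [tree, ""]  # tree first, then a blank separator line
    for name, path in genome_paths.items():
        # Names are whitespace-delimited in the file, so they cannot contain spaces.
        if any(ch.isspace() for ch in name):
            raise ValueError(f"genome name contains whitespace: {name!r}")
        # Rough sanity check only: the name should occur somewhere in the tree string.
        if name not in tree:
            raise ValueError(f"genome name not found in tree: {name!r}")
        lines.append(f"{name} {path}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical three-species example; a real run would list ~1000 genomes.
    example = build_seqfile(
        "((human:0.1,mouse:0.2):0.05,chicken:0.3);",
        {
            "human": "/data/human.fa",
            "mouse": "/data/mouse.fa",
            "chicken": "/data/chicken.fa",
        },
    )
    print(example, end="")
```

The resulting file would then be passed to Cactus along the lines of `cactus <jobStore> <seqFile> <output.hal>`. Cactus runs on Toil, which to my understanding can submit jobs to Slurm via `--batchSystem slurm`, but verify the exact options against the version you install.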

2.7 years ago by Pappu ★ 2.1k

Thanks a lot! That's what I needed!

2.7 years ago by Sergey • 0
