Question

Multiple sequence alignment for sequences in more than 4000 files

8.4 years ago

utkarsh.sood ▴ 40

Hi,

I have more than 4000 files, each containing an orthologous gene cluster from a bacterial community. How can I perform a multiple sequence alignment for all of the files quickly and efficiently? Should I use ClustalW, and if so, how?

Thanks!

sequence orthologs clustalW

updated 8.4 years ago by Rob ▴ 150 • written 8.4 years ago by utkarsh.sood ▴ 40

As long as the individual files are not large, you should be able to use any MSA program. So you are basically looking to run 4000 separate MSAs? A cluster would be the way to go for something this large.

This paper discusses accelerating MSAs, but the approach requires special hardware that you may not have.

8.4 years ago by GenoMax 147k
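To illustrate the cluster route, here is a hedged sketch of a SLURM job array (SLURM is an assumption; other schedulers offer equivalent array jobs). It assumes the 1.fasta ... 6405.fasta naming given later in the thread and a ClustalW binary called `clustalw2` (override with `CLUSTALW`). Each task aligns a batch of 100 files, so 65 tasks cover 6405 files while staying under SLURM's default `MaxArraySize` of 1001. `BATCH`, `align_batch` and the `--time` value are hypothetical choices to adjust.

```shell
#!/usr/bin/env bash
# Hypothetical SLURM job array: each task aligns a batch of BATCH files.
# Batching keeps the array (tasks 0-64) under SLURM's default MaxArraySize.
# The time limit is a placeholder; adjust it to your cluster and data.
#SBATCH --job-name=msa
#SBATCH --array=0-64
#SBATCH --cpus-per-task=1
#SBATCH --time=02:00:00

BATCH=100   # files per task: 65 tasks x 100 covers 1.fasta ... 6405.fasta

# Align <first>.fasta ... <first+BATCH-1>.fasta into matching .aln files,
# where first = task * BATCH + 1. Paths are relative to the working directory.
align_batch() {
  task=$1
  first=$(( task * BATCH + 1 ))
  last=$(( first + BATCH - 1 ))
  i=$first
  while [ "$i" -le "$last" ]; do
    src="$i.fasta"
    dst="$i.aln"
    # Ids past the last file are skipped, and so are finished alignments,
    # so a resubmitted array only redoes what is missing.
    if [ -f "$src" ] && [ ! -s "$dst" ]; then
      "${CLUSTALW:-clustalw2}" -INFILE="$src" -ALIGN -OUTFILE="$dst" > /dev/null ||
        echo "clustalw failed on $src" >&2
    fi
    i=$(( i + 1 ))
  done
}

# SLURM sets SLURM_ARRAY_TASK_ID inside array tasks; otherwise do nothing.
if [ -n "${SLURM_ARRAY_TASK_ID:-}" ]; then
  align_batch "$SLURM_ARRAY_TASK_ID"
fi
```

Submit it from the directory holding the FASTA files, e.g. `sbatch align_array.sh` (the script name is hypothetical). Batching trades a little scheduling granularity for far fewer array tasks, which also eases the load on the scheduler.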

I have access to a server with 24 cores and 128 GB of RAM. Can I create ClustalW alignments for thousands of FASTA files in a directory, one alignment per file? The input would be 1.fasta, 2.fasta, ..., 6405.fasta, where a given file typically contains 14 or more proteins.

The output would be 1.aln, 2.aln, ..., 6405.aln.

8.4 years ago by utkarsh.sood ▴ 40
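For the setup described above (one N.aln per N.fasta on a 24-core machine), a minimal sketch is to let `xargs -P` keep up to 24 independent ClustalW processes running, since ClustalW itself runs on one core and the files are small. It assumes the ClustalW 2 command-line binary is called `clustalw2` (some installs name it `clustalw`; override with `CLUSTALW`) and that `find`/`xargs` support `-print0`/`-0`/`-r` as GNU findutils do. `align_all_clustalw` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: run one ClustalW alignment per *.fasta file, up to $jobs at a time.
# Assumption: the binary is "clustalw2"; set CLUSTALW=/path/to/binary to override.
align_all_clustalw() {
  dir=${1:-.}
  jobs=${2:-24}   # one alignment per core on the 24-core server
  bin=${CLUSTALW:-clustalw2}
  # -print0/-0 keep unusual filenames intact; -r skips the run if nothing matched;
  # -n 1 passes one file per call; -P runs up to $jobs calls concurrently.
  find "$dir" -maxdepth 1 -name '*.fasta' -print0 |
    xargs -0 -r -n 1 -P "$jobs" sh -c '
      f=$1
      # $0 is the ClustalW binary; options use -KEY=value, Clustal format is the default.
      "$0" -INFILE="$f" -ALIGN -OUTFILE="${f%.fasta}.aln" > /dev/null
    ' "$bin"
}
```

Source the file and run `align_all_clustalw /path/to/clusters 24`. By default ClustalW also writes a guide tree (N.dnd) alongside each input, so expect those files in the directory too.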

score 1 · Answer 1 · 2016-07-20

8.4 years ago

Medhat 9.8k

If your data is big, you can use Kalign, a very fast MSA tool that concentrates on local regions and is suitable for large alignments.

In the benchmarks reported by its authors, Kalign was about 10 times faster than ClustalW and, depending on the alignment size, up to 50 times faster than popular iterative methods.

8.4 years ago by Medhat 9.8k
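If you run Kalign locally, each cluster file still needs its own run to get one alignment per cluster. A minimal sequential sketch, assuming the binary is on PATH as `kalign` (override with `KALIGN`) and accepts `-i`/`-o` as shown in Kalign's documentation; `kalign_all` and the `.afa` extension are hypothetical choices. Files that already have output are skipped, so an interrupted run can simply be restarted.

```shell
#!/usr/bin/env bash
# Sketch: align each *.fasta file separately with Kalign, one after another.
# Output is written as <name>.afa (aligned FASTA) next to each input.
kalign_all() {
  dir=${1:-.}
  for f in "$dir"/*.fasta; do
    # If the glob matched nothing it stays literal; skip it in that case.
    [ -e "$f" ] || continue
    out="${f%.fasta}.afa"
    # Skip files that already have non-empty output, so a rerun resumes.
    [ -s "$out" ] && continue
    "${KALIGN:-kalign}" -i "$f" -o "$out" || echo "kalign failed on $f" >&2
  done
}
```

Run it as `kalign_all /path/to/clusters`. Because Kalign is fast, a plain loop may be enough for small clusters; for more throughput, the same per-file command can be run under `xargs -P`.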

Thanks for your help! Do I have to upload the 4000 files separately (each file contains multiple sequences for one protein)?

8.4 years ago by utkarsh.sood ▴ 40

If you need to align each file separately, yes. If you want to align all the sequences together in a single run, you can concatenate all the files into one; for example, run this command in the directory that contains them:

cat *.fasta > files_concat.fa

You can also download the tool and use it locally.

8.4 years ago by Medhat 9.8k

Web tool owners may not like you uploading 4000 files (if that is even possible). As @Medhat indicated, you should download the tool and use it locally. Based on your reply to my post above, your hardware should be sufficient.

8.4 years ago by GenoMax 147k

score 1 · Answer 2 · 2016-07-20

8.4 years ago

Rob ▴ 150

If you need really fast alignment, I suggest trying MAFFT (http://mafft.cbrc.jp/alignment/software/); it has some very fast and accurate methods for this. But if your files are small, you can use ClustalW, Clustal Omega, MUSCLE, T-Coffee or whatever you prefer.

8.4 years ago by Rob ▴ 150
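A minimal per-file MAFFT sketch, assuming `mafft` is on PATH (override with `MAFFT`). `--auto`, `--thread` and `--clustalout` are MAFFT options; `mafft_all` and the shared `mafft.log` file are hypothetical choices. MAFFT writes the alignment to stdout, so each run is redirected into its own N.aln, and a failed run's output file is removed rather than left half-written.

```shell
#!/usr/bin/env bash
# Sketch: one MAFFT alignment per *.fasta file, written as Clustal-format .aln.
mafft_all() {
  dir=${1:-.}
  for f in "$dir"/*.fasta; do
    [ -e "$f" ] || continue          # glob matched nothing
    out="${f%.fasta}.aln"
    # --auto picks a strategy from the data size; --thread 1 keeps each run on
    # one core; --clustalout switches the stdout format from FASTA to Clustal.
    # MAFFT's progress messages (stderr) are appended to a single log file.
    if ! "${MAFFT:-mafft}" --auto --thread 1 --clustalout "$f" > "$out" 2>> "$dir/mafft.log"; then
      echo "mafft failed on $f" >&2
      rm -f "$out"                   # don't leave an empty or partial .aln behind
    fi
  done
}
```

Run it as `mafft_all /path/to/clusters`. With many small files, launching several single-threaded runs at once (for example under `xargs -P`) will probably use a multi-core server better than one run at a time.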