Hi,
I have more than 4000 files from a bacterial community, each containing orthologous gene clusters. How can I perform multiple sequence alignment on all of the files in a fast and efficient manner? Should I use ClustalW, and if so, how?
Thanks!
If you need to align each file separately, then yes. If you want to align them all at the same time, you can concatenate them into one file. For example, you can run this command in the directory that contains all the files (the output uses a different extension so that it is not matched by *.fasta if you run the command again):
cat *.fasta > files_concat.fa
You can also download the program and run it locally.
If you need really fast alignment, I suggest trying MAFFT (http://mafft.cbrc.jp/alignment/software/); it has some really fast and accurate methods for that. But if your files are small, you can use ClustalW, Clustal Omega, MUSCLE, T-Coffee, or whatever you prefer.
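To align each file separately with MAFFT, a loop along these lines might do. It is only a sketch: it assumes `mafft` is on your PATH, and the function name `align_each_with_mafft`, the `ALIGNER` variable, and the `.afa` output extension are just illustrative choices, not anything MAFFT requires.

```shell
#!/usr/bin/env bash
# Align every *.fasta file in a directory separately with MAFFT.
# Assumption: `mafft` is installed and on PATH. ALIGNER is a hypothetical
# override so a different binary (or a stub) can be substituted.
align_each_with_mafft() {
    local dir="${1:-.}"
    local aligner="${ALIGNER:-mafft}"
    local f
    for f in "$dir"/*.fasta; do
        # If nothing matches, the pattern stays literal; skip it.
        [ -e "$f" ] || continue
        # --auto lets MAFFT pick a strategy based on the size of the input.
        # MAFFT writes aligned FASTA to stdout, saved here as <name>.afa.
        "$aligner" --auto "$f" > "${f%.fasta}.afa" || echo "failed: $f" >&2
    done
}
```

You would call it as `align_each_with_mafft /path/to/clusters`. The output goes to a different extension so that rerunning the loop does not pick up the alignments as new input.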
As long as the individual files are not large, you should be able to use any MSA program. So you are basically looking to run 4000 separate MSAs? A cluster would be the way to go for a job this large.
This paper discusses acceleration of MSAs, but it requires special hardware that you may not have.
I have access to a server with 24 cores and 128 GB of RAM. Can I create multiple ClustalW alignments for thousands of FASTA files in a directory? The input would be 1.fasta, 2.fasta, ..., 6405.fasta, where a given file commonly contains 14 or more proteins.
The output would be 1.aln, 2.aln, ..., 6405.aln.
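One way this could be done on a multi-core server is to start one ClustalW job per file and let `xargs` keep several running at once. The following is a rough sketch, not a tested pipeline: it assumes the binary is called `clustalw2` (on some systems it is `clustalw`), that GNU or BSD `find` and `xargs` are available, and the function name `align_dir_parallel` and the `CLUSTALW` variable are made up for illustration.

```shell
#!/usr/bin/env bash
# Run one ClustalW job per FASTA file, several jobs at a time.
# Assumptions: the ClustalW binary is `clustalw2` (override with the
# hypothetical CLUSTALW variable) and inputs are named like 1.fasta.
align_dir_parallel() {
    local dir="${1:-.}"
    local jobs="${2:-24}"                 # number of simultaneous jobs
    local aligner="${CLUSTALW:-clustalw2}"
    # -print0 / -0 keep file names with spaces intact;
    # -P runs up to $jobs processes in parallel.
    find "$dir" -maxdepth 1 -name '*.fasta' -print0 |
        xargs -0 -P "$jobs" -I{} sh -c '
            src=$1
            dst=${src%.fasta}.aln         # 1.fasta -> 1.aln
            "$2" -INFILE="$src" -OUTFILE="$dst" -ALIGN \
                > "${src%.fasta}.log" 2>&1 || echo "failed: $src" >&2
        ' _ {} "$aligner"
}
```

For the setup described above, `align_dir_parallel /path/to/fasta_dir 24` would give 1.aln, 2.aln, and so on next to the inputs, with each job's console output in a matching .log file. Since each file holds only a handful of proteins, running many small single-threaded jobs side by side is likely a better use of the 24 cores than trying to parallelize any one alignment.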