You may find this easier using ClustalX2, the graphical version of ClustalW 2 (see http://clustal.org/clustal2/), rather than the command-line ClustalW 2. FWIW, the download for MS Windows is clustalx-2.1-win.msi. After installation you will have a program menu item for ClustalX2, which you can run. You then load the sequences to be aligned ("File" -> "Load Sequences") and perform the alignment ("Alignment" -> "Do Complete Alignment"). By default the guide tree (.dnd) and alignment (.aln) files are generated in the same directory as the file of input sequences, but you can change this when prompted if you want them to go somewhere else.
If you have to use the command-line version of ClustalW 2, then you will need to know the path to your input sequences and, since the ClustalW 2 executables are not automatically added to the PATH, where ClustalW 2 was installed. A typical command-line session would look something like this:
> cd "C:\Users\username\Documents\My Documents\Data"
> "c:\Program Files (x86)\ClustalW2\clustalw2.exe" /INFILE=arf_seq.faa /ALIGN
This assumes that the C:\Users\username\Documents\My Documents\Data directory contains the input sequence data to be aligned, in this case the FASTA-formatted sequences in the file arf_seq.faa. By default the output files are generated in the directory containing the input data and are named after the input file (in this case arf_seq.dnd and arf_seq.aln).
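If the alignment is being driven from a script (e.g. Biopython, as mentioned later in this thread), the same session can be reproduced with Python's subprocess module. The sketch below is only an illustration. The install path is assumed from the example session above, and the helper names are hypothetical. It builds the same /INFILE=... /ALIGN command and works out the default .dnd and .aln names described above.

```python
import subprocess
from pathlib import PureWindowsPath

# Assumed install location, taken from the example session; adjust as needed.
CLUSTALW2_EXE = r"C:\Program Files (x86)\ClustalW2\clustalw2.exe"


def clustalw2_args(infile, exe=CLUSTALW2_EXE):
    """Argument list equivalent to: clustalw2.exe /INFILE=<infile> /ALIGN"""
    return [exe, f"/INFILE={infile}", "/ALIGN"]


def default_outputs(infile):
    """Default guide tree and alignment paths: same directory as the input,
    named after the input file with its extension replaced."""
    p = PureWindowsPath(infile)
    return p.with_suffix(".dnd"), p.with_suffix(".aln")


def run_alignment(infile, workdir):
    """Run the alignment from `workdir`, like the 'cd' step in the session.

    Passing a list instead of one string avoids quoting problems with the
    spaces in 'Program Files (x86)' and 'My Documents'.
    """
    subprocess.run(clustalw2_args(infile), cwd=workdir, check=True)
```

For the session above this would be run_alignment("arf_seq.faa", r"C:\Users\username\Documents\My Documents\Data").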
The same principle applies if you are using the interactive mode of ClustalW 2, except that you will need to know the complete path to the input sequence data file (e.g. C:\Users\username\Documents\My Documents\Data\arf_seq.faa) in order to load the sequences. Again, by default the output files will be generated in the directory containing the input file and will be named after that file. However, this can be changed using the prompts during the alignment process. Note that relative paths should be avoided if you use the program menu item to start ClustalW 2 in interactive mode, since they will be relative to the "Start in" directory specified in the shortcut (typically the installation directory).
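To see why relative paths are a trap here, the sketch below shows where a path typed at the prompt ends up when it is resolved against the shortcut's "Start in" folder. The helper name is hypothetical. It uses pathlib's PureWindowsPath, so it behaves the same on any OS. The install directory is the one assumed in the earlier example.

```python
from pathlib import PureWindowsPath


def resolve_against_start_in(typed_path, start_in):
    """Where a path typed at the interactive prompt points to.

    A relative path is resolved against the process's working directory,
    which for a program-menu shortcut is its "Start in" folder.
    """
    p = PureWindowsPath(typed_path)
    return p if p.is_absolute() else PureWindowsPath(start_in) / p


start_in = r"C:\Program Files (x86)\ClustalW2"  # assumed install directory
print(resolve_against_start_in("arf_seq.faa", start_in))
print(resolve_against_start_in(
    r"C:\Users\username\Documents\My Documents\Data\arf_seq.faa", start_in))
```

The first call points into the installation directory, not the data directory. The absolute path in the second call is unaffected.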
Please note that ClustalW 2 has largely been superseded by Clustal Omega; for most purposes the use of Clustal Omega is recommended.
For further assistance with the Clustal series of multiple sequence alignment programs, I suggest you contact the authors; see http://clustal.org/ for details.
@hpmcwill... Thank you for your comprehensive reply, I truly appreciate it.
I already managed to fix this issue; it was something related to environment variables.
But unfortunately, I have to work with ClustalW2 so I can maintain alignments from a Biopython script :(
So I have one question left: why does ClustalW2 choke when handling large sequence files (up to 300 MB)?
Such files tend to terminate the running ClustalW2 session!
How can I overcome this problem?
For large inputs ClustalW 2 can require very large amounts of memory. Since the distributed ClustalW 2 binary for MS Windows is 32-bit, it can only use up to 2GB of memory before being terminated. So my guess is that your problem is memory usage.
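Here is a rough illustration of why input size matters. The sketch below estimates only one structure: a full N x N pairwise distance matrix, assuming 8 bytes (a double) per entry. This is a simplified, hypothetical model, not ClustalW 2's actual memory accounting. The sequences themselves and the alignment steps need memory on top of it. Even so, it shows how quickly an N^2 term reaches a 2 GB ceiling, or 4 GB for a large-address-aware 32-bit process on 64-bit Windows.

```python
GIB = 2 ** 30  # bytes in a gibibyte


def distance_matrix_bytes(n_seqs, bytes_per_entry=8):
    """Size of a full n x n matrix; 8-byte entries are an assumption."""
    return n_seqs * n_seqs * bytes_per_entry


def fits(n_bytes, limit_bytes=2 * GIB):
    """True if n_bytes stays under the given address-space limit."""
    return n_bytes < limit_bytes


for n in (1_000, 10_000, 20_000):
    size = distance_matrix_bytes(n)
    print(f"{n:>6} seqs: {size / GIB:6.2f} GiB, fits in 2 GiB: {fits(size)}")
```

Under this model, 20,000 sequences already need about 3 GiB for the matrix alone.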
So you have a couple of options:
A. Recompile ClustalW 2 to support more memory.
According to "Memory Limits for Windows and Windows Server Releases", the 2GB limit can be increased to 4GB for a 32-bit process running on 64-bit Windows by linking with the /LARGEADDRESSAWARE flag enabled. Alternatively, you could try building a 64-bit binary. If you do not have an MS Windows compiler installed, you might want to look at Visual Studio Community 2013. If you have problems, try contacting the authors (see http://clustal.org/).
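After rebuilding, one way to confirm what you actually got is to read the executable's PE header. The COFF Machine field says whether it is 32-bit (x86) or 64-bit (x64). Bit 0x0020 of the Characteristics field (IMAGE_FILE_LARGE_ADDRESS_AWARE) is the flag that /LARGEADDRESSAWARE sets. The helper below is a minimal, hypothetical sketch that reads only those two fields. Visual Studio's dumpbin /headers reports the same information.

```python
import struct

IMAGE_FILE_LARGE_ADDRESS_AWARE = 0x0020  # COFF Characteristics flag
MACHINE_NAMES = {0x014C: "x86 (32-bit)", 0x8664: "x64 (64-bit)"}


def pe_info(data):
    """Return (machine, large_address_aware) for the bytes of a Windows .exe."""
    if data[:2] != b"MZ":
        raise ValueError("not an MZ/PE executable")
    # Offset 0x3C of the DOS header holds the file offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    # The COFF file header starts right after the 4-byte signature:
    # Machine is its first field, Characteristics sits 18 bytes in.
    (machine,) = struct.unpack_from("<H", data, pe_offset + 4)
    (characteristics,) = struct.unpack_from("<H", data, pe_offset + 4 + 18)
    name = MACHINE_NAMES.get(machine, hex(machine))
    return name, bool(characteristics & IMAGE_FILE_LARGE_ADDRESS_AWARE)


def pe_info_from_file(path):
    """Read a whole executable (e.g. clustalw2.exe) and inspect its header."""
    with open(path, "rb") as fh:
        return pe_info(fh.read())
```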
B. Reduce the size and complexity of the input.
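Before trimming anything, it helps to know what the input actually contains. The sketch below is a minimal, dependency-free FASTA reader; Biopython's Bio.SeqIO.parse(handle, "fasta") would do the same job. It reports the number of sequences, the total residues and the longest sequence. It also shows one simple way to shrink the input: dropping exact duplicate sequences. The helper names are hypothetical. Removing duplicates is only one possible reduction, and whether it is appropriate depends on your analysis.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def summarize(records):
    """Number of sequences, total residues and longest sequence length."""
    lengths = [len(seq) for _, seq in records]
    return {"n_seqs": len(lengths),
            "total_residues": sum(lengths),
            "longest": max(lengths, default=0)}


def drop_duplicates(records):
    """Keep only the first record for each distinct (case-insensitive) sequence."""
    seen = set()
    for header, seq in records:
        key = seq.upper()
        if key not in seen:
            seen.add(key)
            yield header, seq


def write_fasta(records, handle, width=60):
    """Write records to an open text handle, wrapping sequences at `width`."""
    for header, seq in records:
        handle.write(f">{header}\n")
        for i in range(0, len(seq), width):
            handle.write(seq[i:i + width] + "\n")
```

For example, pass an open handle for arf_seq.faa to read_fasta and check summarize() on the result. Then write the output of drop_duplicates() to a new file with write_fasta() before aligning.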
You might also want to consider migrating to Clustal Omega, since Biopython includes support for the newer method (Bio.Align.Applications.ClustalOmegaCommandline) and Clustal Omega is much more memory efficient.
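For reference, here is a minimal sketch of calling Clustal Omega directly from Python with subprocess. Biopython's ClustalOmegaCommandline wrapper builds and runs a command line for the same clustalo executable. The options used (-i, -o, --outfmt, --force) are standard clustalo options. The file names and helper names are hypothetical, and the executable is assumed to be on PATH unless a full path is passed as exe.

```python
import subprocess


def clustalo_args(infile, outfile, outfmt="clustal", exe="clustalo"):
    """Argument list for a Clustal Omega alignment (hypothetical helper)."""
    # --force lets clustalo overwrite an existing output file.
    return [exe, "-i", infile, "-o", outfile, f"--outfmt={outfmt}", "--force"]


def run_clustalo(infile, outfile, **kwargs):
    """Run clustalo; check=True raises CalledProcessError on a non-zero exit."""
    subprocess.run(clustalo_args(infile, outfile, **kwargs), check=True)
```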
Again, thank you for your great, detailed answer @hpmcwill.
I will try these options and see how it works, but I think I'm more likely to go with Clustal Omega since it's supported by Biopython.
Thanks again, I'm so grateful for your help.
So far, I've tried the latter option and decided to upgrade to Clustal Omega, but I've encountered many problems installing it.
First, its dependency (argtable2) was quite complicated to install, at least for me!
The command line refuses to recognize nmake, so the argtable2 installation cannot proceed. I used this reference: http://sourceforge.net/projects/argtable/files/. From there I explored the option of adding an environment variable pointing to where nmake is located, but this didn't work.
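A note on the nmake problem: cmd finds programs through the PATH variable specifically. If the variable added was not PATH itself, a separate variable pointing at nmake's folder will not help. The sketch below checks whether a tool would be found, optionally with extra folders prepended the way set PATH=<dir>;%PATH% would. The helper names are hypothetical, and the directory you pass is whatever folder actually contains nmake.exe on your machine. Also, Visual Studio usually expects its environment (PATH, INCLUDE, LIB) to be set up by vcvarsall.bat or its command prompt shortcut, so finding nmake alone may not be enough.

```python
import os
import shutil


def find_tool(name, extra_dirs=()):
    """Full path of `name` as a command prompt would find it, or None.

    `extra_dirs` are searched before the current PATH, similar to running
    set PATH=<dir>;%PATH% in a cmd window before calling the tool.
    """
    search_path = os.pathsep.join([*extra_dirs, os.environ.get("PATH", "")])
    return shutil.which(name, path=search_path)


def env_with_path(extra_dirs):
    """Copy of the current environment with extra_dirs prepended to PATH,
    suitable for subprocess.run(..., env=...)."""
    env = dict(os.environ)
    env["PATH"] = os.pathsep.join([*extra_dirs, env.get("PATH", "")])
    return env
```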
I hesitated to install Visual Studio (some online threads suggested this as another solution) for two reasons: I already have Visual Studio 2010 installed, and I'm worried about the risks to projects already built on different programming language platforms if I upgrade the existing Visual Studio installation to obtain the required compiler.
My question is: does upgrading or reinstalling Visual Studio affect the previously created projects and their system configurations in any way?
What other possibilities can I explore to overcome this problem?
Second, how come the dependency uses the nmake utility while Clustal Omega itself uses make? Isn't make for Linux systems? That's really confusing -_- O.o
I'm using Windows 7 64-bit... and I'm really disappointed to see Linux preferred over Windows when it comes to bioinformatics tools :(
BTW... why is that?!
First off... have you tried using the MS Windows binary as suggested by the authors? Note that the "INSTALL.txt" file is the version from the UNIX distribution (I am guessing that this was meant to be replaced). All you should need to do is unpack the distribution in an appropriate directory and use clustalo.exe, either by specifying the full path or by adding the directory to the PATH.
For help with building Clustal for MS Windows you will have to contact the Clustal authors, as detailed in the documentation and on the Clustal website. As far as I can tell, the source distributions do not contain support for MS Windows builds. This suggests the authors used a separate build process to create the pre-compiled binaries provided on the website. Since I have no way of knowing what this was, you will have to contact them for details.
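Whichever way you choose, running the program from a command window or a script, rather than double-clicking it, lets you see its output. Below is a minimal sketch, assuming clustalo accepts --version (current releases do); the helper name is hypothetical. Pass either the full path to clustalo.exe or just "clustalo" if its directory is on the PATH.

```python
import subprocess


def tool_version(exe="clustalo"):
    """Run `<exe> --version` and return what it prints.

    `exe` may be a full path (e.g. to clustalo.exe) or a bare name found via PATH.
    """
    result = subprocess.run([str(exe), "--version"], capture_output=True,
                            text=True, check=True)
    # Some programs print their version on stderr, so fall back to it.
    return (result.stdout or result.stderr).strip()
```

If this raises FileNotFoundError, the executable was not found at that path or on the PATH.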
You might find using a UNIX style environment for MS Windows such as Cygwin, MinGW or Mingw-w64 makes things easier, and current versions of Cygwin and Mingw-w64 support a 64-bit tool chain so they may be an option for building from source to get a 64-bit binary.
As for the use of Linux in bioinformatics, this is a product of a number of factors which have been discussed at length in various posts on Biostars, so it is not really worth rehashing here. Suffice it to say that the use of open source and free tools (e.g. Perl, Python, GNU, etc.), and thus of operating systems such as Linux, is core to modern bioinformatics and has been since the mid-1990s. This makes using MS Windows a more difficult option.
I have contacted the Clustal Omega authors, and they directed me to this site as a useful reference.
I followed the instructions, but it seems that Clustal Omega was not configured correctly on my machine :(
I'm not familiar with Linux, but while reading the on-screen log I noticed that there was a problem recognizing the C compiler!
Also, I noticed that no file named libcc_sjlj-1.dll was created; instead I had the file libgcc_s_sjlj-1.dll!
I tried to run the binary (clustalo.exe) after finishing those steps, but a cmd window flashes and disappears!
I also tried defining an environment variable pointing to where the 64-bit clustalo binary is, and again I had no luck running it. I'm really frustrated.
Am I missing something here?