I have a set of 120 sequence files. I need to align each of them against the wild-type protein sequence, and the results should all end up together in a single file.
Please help.
If you have 120 FASTA files with one sequence each, and another with your wild type (and you're using Linux/OS X), first use cat to concatenate all the sequences into one file, e.g.
cat seq1.fasta seq2.fasta ... seqN.fasta > all_my_sequences.fa
or
cat *.fasta > all_my_sequences.fa
Then go to the EBI Clustal Omega server and upload all_my_sequences.fa, or paste the contents of the file in the box. Change the output format to whatever you want (Clustal format is probably better for humans and Pearson/FASTA for computers), then just click submit.
I am afraid this line
cat *.fa > all_my_sequences.fa
is dangerous. It's better to do something like
cat *.fa > all_my_sequences.txt
or
cat *.fa > all_my_sequences.fasta
And I like to use MAFFT for the multiple alignment.
Can you elaborate on why it's dangerous? I guess you can only run that command once, is that what you mean?
I like MAFFT, but in this case I would probably want to use MAFFT L-INS-i or G-INS-i, rather than the default MAFFT parameters, and I just tried to give the simplest option I could think of (no software installation or changing parameters on the web server).
T-Coffee might also be a good option for this number of sequences.
I've made this mistake several times: cat reads every *.fa file, including the output file, which is why the output file's extension should be different.
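One way to sidestep the problem is sketched below. The function name combine_fasta and the file names in the usage line are made up for illustration, and the sketch assumes a POSIX shell with mktemp and awk available. It writes to a temporary file first and skips any input whose path is spelled exactly like the output, so the result never feeds back into itself.

```shell
# Hypothetical helper: combine_fasta OUTPUT INPUT...
# Concatenates the INPUT files into OUTPUT without ever reading OUTPUT.
combine_fasta() {
    out=$1
    shift
    # Build the result in a temporary file so a partial run never
    # leaves a half-written OUTPUT behind.
    tmp=$(mktemp) || return 1
    for f in "$@"; do
        # Skip the output file if a glob happened to match it
        # (this only catches an identically spelled path).
        [ "$f" = "$out" ] && continue
        # "awk 1" prints every line followed by a newline, so a file
        # without a trailing newline cannot run into the next header.
        awk 1 "$f" >> "$tmp" || { rm -f "$tmp"; return 1; }
    done
    mv "$tmp" "$out"
}
```

For example, `combine_fasta all_my_sequences.fasta wild_type.fa seq*.fa` would put the wild type first, followed by the other sequences.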
The full mafft command is a long string with different parameters. It allows many iterations, which is sometimes useful.
It would look like:
mafft-7.215-with-extensions/bin/mafft --localpair --maxiterate 1000 --ep 0.123 --legacygappenalty initial_file.fasta > align.fa
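For reference, the --localpair --maxiterate 1000 part of that command is what MAFFT calls L-INS-i; swapping --localpair for --globalpair gives G-INS-i. A minimal wrapper might look like the sketch below. The function name align_linsi is hypothetical, and it assumes a mafft executable on your PATH rather than the versioned directory shown above.

```shell
# Hypothetical helper: align_linsi INPUT OUTPUT
# Runs MAFFT with the L-INS-i settings and writes the alignment to OUTPUT.
# Assumes "mafft" is on the PATH; adjust the command name if it is not.
align_linsi() {
    input=$1
    output=$2
    # --localpair --maxiterate 1000 corresponds to L-INS-i;
    # use --globalpair instead of --localpair for G-INS-i.
    mafft --localpair --maxiterate 1000 "$input" > "$output"
}
```

Usage would be something like `align_linsi all_my_sequences.fasta align.fa`. The extra --ep and --legacygappenalty options from the command above are left out here to keep the sketch short.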
I think it works as long as the file that you're writing to doesn't exist already, but you're right, it's sloppy, so I updated my answer.
The long string of parameters is why I didn't recommend MAFFT for this question; I was just trying to keep it simple. It's a great option, though.
If you're working on Windows, you can try the BioEdit tool.
Did you try ClustalW? http://www.clustal.org/clustal2/
For protein alignments they recommend Clustal Omega.
Jalview (www.jalview.org.uk) is a versatile free tool for MSA which can run all the main MSA algorithms. Look at their YouTube Jalview Online Training videos for more information. It also has integrated structure, annotation, PCA and tree windows.
You need to clarify whether all the sequence files in question are protein.
It sounds like your sequence files may be DNA. If so, you will need to do some additional work (translation) before any of the packages mentioned in the other answers can be used to align them against a protein sequence.
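If you are not sure what you have, a rough check like the sketch below can help. It is only a heuristic: the function name looks_like_dna and the 90% cutoff are my own assumptions, and a short peptide made up of A, C, G, T and N residues would be misclassified as DNA.

```shell
# Hypothetical helper: looks_like_dna FILE
# Succeeds (exit status 0) if at least 90% of the residues in a FASTA
# file are A, C, G, T, U or N. The 90% cutoff is an arbitrary choice.
looks_like_dna() {
    awk '
        /^>/ { next }                        # skip FASTA header lines
        {
            seq = toupper($0)
            gsub(/[^A-Z]/, "", seq)          # drop gaps, digits, whitespace, "*"
            total += length(seq)
            nuc += gsub(/[ACGTUN]/, "", seq) # gsub returns the number of matches
        }
        END { exit !(total > 0 && nuc / total >= 0.9) }
    ' "$1"
}
```

For example, `looks_like_dna seq1.fasta && echo "probably nucleotide"` flags files that likely need translating first.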