Dear Friends,
I generated a Clustal alignment file from two FASTA-format sequences using:
clustalw2 -infile=xx.fasta -align -type=DNA -output=CLUSTAL -outfile=xx.aln
There are only two sequences in the "xx.fasta" file: the ESR gene and one exon sequence (the exon data were obtained by targeted gene sequencing). After this first step the exon sequence was aligned to the ESR gene, and now I am trying to add the next exon sequence to the alignment file "xx.aln". Could you please let me know how to do this in clustalw2?
The goal is to align the exon sequences to the ESR gene and find SNPs/mutations. I would really appreciate your input/solutions.
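For the SNP-finding step, once an exon has been aligned pairwise to the gene, candidate variants can be read straight off the two gapped rows. The Python sketch below is only an illustration under stated assumptions: the function name `list_variants` is hypothetical, the two rows are assumed to come from one pairwise alignment (one exon against the ESR gene), and `ref_start` is assumed to be the 1-based gene coordinate where the aligned part of the gene begins. Mismatches are reported as candidate SNPs and every gap column is reported separately; runs of gaps are not merged into a single indel, and ambiguity codes such as N are not treated specially.

```python
def list_variants(ref_aln, query_aln, ref_start=1):
    """List differences between the two rows of a pairwise alignment (hypothetical helper).

    Returns (ref_pos, ref_base, query_base) tuples, where ref_pos is a 1-based
    gene coordinate. A mismatch is a candidate SNP; '-' in either row marks an indel.
    """
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned rows must have equal length")
    variants = []
    ref_pos = ref_start - 1  # coordinate of the last gene base seen so far
    for r, q in zip(ref_aln.upper(), query_aln.upper()):
        if r != "-":
            ref_pos += 1  # only gene bases advance the gene coordinate
        if r != q:
            # for an insertion in the exon (r == '-'), ref_pos is the gene base just before it
            variants.append((ref_pos, r, q))
    return variants
```

For example, `list_variants("ACGTACGT", "ACTTAC-T", 100)` returns `[(102, 'G', 'T'), (106, 'G', '-')]`: a G>T mismatch at gene position 102 and a one-base deletion at position 106.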
Thanks, DK
If you want to produce a multiple alignment, just add all your sequences to the original FASTA file from the beginning. Then again, this does not sound like a typical multiple alignment task. You could just do a pairwise alignment of each exon sequence against the gene with, e.g., EMBOSS water. How many of those exon sequences do you have to align?
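EMBOSS water computes a Smith-Waterman local alignment, which scores only the best-matching stretch of the gene instead of forcing a short exon to span the whole gene. To show what that means, here is a simplified pure-Python Smith-Waterman sketch. It is not water itself: it uses a linear gap penalty rather than water's separate gap-open/gap-extend penalties, the scoring values are illustrative assumptions, and the name `smith_waterman` is hypothetical. Its cost grows with len(gene) times len(exon), which makes it far too slow for a full genomic gene in pure Python, so for real data run water (or a read aligner) once per exon.

```python
def smith_waterman(ref, query, match=2, mismatch=-1, gap=-2):
    """Simplified Smith-Waterman local alignment with a linear gap penalty.

    Returns (score, ref_start, ref_aln, query_aln), where ref_start is the
    1-based position in `ref` at which the best local alignment begins.
    """
    ref, query = ref.upper(), query.upper()

    def sub(i, j):
        # substitution score for query[i-1] against ref[j-1]
        return match if query[i - 1] == ref[j - 1] else mismatch

    rows, cols = len(query) + 1, len(ref) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_i, best_j = 0, 0, 0
    for i in range(1, rows):
        for j in range(1, cols):
            # floor at 0: this is what makes the alignment local
            H[i][j] = max(0, H[i - 1][j - 1] + sub(i, j),
                          H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_i, best_j = H[i][j], i, j

    # trace back from the best-scoring cell until the score drops to 0
    i, j = best_i, best_j
    ref_aln, query_aln = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + sub(i, j):
            ref_aln.append(ref[j - 1])
            query_aln.append(query[i - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            ref_aln.append("-")  # gap in the gene row
            query_aln.append(query[i - 1])
            i -= 1
        else:
            ref_aln.append(ref[j - 1])
            query_aln.append("-")  # gap in the exon row
            j -= 1
    return best, j + 1, "".join(reversed(ref_aln)), "".join(reversed(query_aln))
```

With `smith_waterman("TTTTTACGTACGTTTTT", "ACGAACG")` the result is `(11, 6, 'ACGTACG', 'ACGAACG')`: the exon is placed at gene position 6 with one mismatch, and the flanking T-rich gene sequence is simply left out of the alignment rather than padded with end gaps.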
Thanks for your reply! I have 300 exon sequences to align. I am looking to generate a single alignment of all the exons and the ESR gene by adding each new exon sequence to the previously generated pairwise alignment. Could you please tell me how to add a new exon to a ClustalW alignment file and generate a new alignment file? Or, if you know of another solution, please let me know. Thanks.
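Since the goal is one alignment of the gene plus all 300 exons, an alternative to growing a Clustal file one exon at a time is to align each exon to the gene separately (locally) and then lay every result out on the gene's own coordinates. The sketch below, with hypothetical names `project_to_reference` and `combined_alignment`, assumes each exon's pairwise result is given as (1-based start on the gene, gapped gene row, gapped exon row). Exon insertions relative to the gene are dropped so that every row has exactly one column per gene base; this is similar in spirit to MAFFT's `--keeplength` option but is not MAFFT. Deletions inside an exon and unsequenced positions are both shown as '-'.

```python
def project_to_reference(ref_len, ref_start, ref_aln, query_aln, pad="-"):
    """Lay one exon's pairwise alignment onto gene coordinates (hypothetical helper).

    Returns a string of length ref_len with the exon's bases in the gene
    columns they aligned to and `pad` everywhere else.
    """
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned rows must have equal length")
    row = [pad] * ref_len
    pos = ref_start - 1  # 0-based gene index of the next gene base in ref_aln
    for r, q in zip(ref_aln, query_aln):
        if r == "-":
            continue  # exon insertion: the gene has no column for it, so drop it
        row[pos] = q  # exon base, or '-' where the exon has a deletion
        pos += 1
    return "".join(row)


def combined_alignment(ref, hits):
    """Build (name, row) pairs: the gene first, then one projected row per exon.

    `hits` maps exon name -> (ref_start, ref_aln, query_aln).
    """
    rows = [("reference", ref)]
    for name, (ref_start, ref_aln, query_aln) in hits.items():
        rows.append((name, project_to_reference(len(ref), ref_start, ref_aln, query_aln)))
    return rows
```

With made-up input, `combined_alignment("TTACGTACGTTGCATT", {"exon1": (3, "ACG-T", "ACGGT"), "exon2": (12, "GCA", "GTA")})` gives rows `--ACGT----------` and `-----------GTA--` under the 16-base gene, so each exon stays at its own position instead of piling up in one region.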
Do you want to manually add the new exon directly to the .aln file or do you want to get it aligned & added to the result file?
What would your desired output look like?
I want the new exon to be aligned against the previously generated alignment, so that the output is a single alignment of the ESR gene with each exon placed at its own position.
Please let me know your comments on how this can be achieved.
Thanks, DK
Can't you simply run a multiple alignment with all exons at once, then?
I tried that. The issue is that when I do, for some reason all the exons get aligned to one region of the reference gene, which I think is not right, since exon 1 and exon 2 cannot be at the same position in the gene.
Your comments/solutions are very welcome. Thanks.
Maybe align them with a short/long-read aligner (bwa mem, minimap, bbmap, ...) and visualise the results in a genome browser (IGV). Concerning your exon sequences: if they align to the same position, they must be similar, if not partially identical. According to OMIM, ESR contains only 8 exons, so are your exon sequences perhaps only partial (which would explain the overlap you report above)? Alternatively, is the reference gene sequence highly repetitive by chance? That would complicate the mapping.
Thanks for the response. I will try the bwa aligner as you suggested. Yes, the ESR exons were only partially sequenced, so not every region of the 8 exons is covered. What I noticed is that when I align each exon individually to the reference ESR gene, it aligns very well (>95% similarity), but when I align them all together they pile up in one region, as I described above. The reference ESR gene sequence was taken from NCBI; I don't know whether it is highly repetitive.
Have you ensured you’re running a local alignment and not a global one?
Thanks for the response. Could you please let me know how to run a local alignment using clustalw or any other tool you would suggest? The command line I used for the alignment is given in my question. I need to run the tools standalone/locally.
How To Compute A Local Multiple Protein Sequence Alignment?
I think MAFFT can do the job; see "Add new sequences to an existing alignment using MAFFT".