Question

Add new sequence to clustalw2 alignment

Asked 7.3 years ago by DanielC ▴ 210

Dear Friends,

I generated a CLUSTAL alignment file from two FASTA-format sequences using

clustalw2 -infile=xx.fasta -align -type=DNA -output=CLUSTAL -outfile=xx.aln

There are only two sequences in "xx.fasta": the ESR gene and one exon sequence (exon data obtained from targeted gene sequencing). After this first step the exon sequence was aligned to the ESR gene; now I am trying to add the next exon sequence to the alignment file "xx.aln". Could you please let me know how to do this in clustalw2?

The goal is to align the exon sequences to the ESR gene and find SNPs/mutations. I would really appreciate your input/solutions.

Thanks, DK

Tags: clustalw, new sequence

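For the stated goal of finding SNPs/mutations: once an exon has been aligned to the gene, the candidate variants are simply the columns where the two rows differ. A minimal sketch, assuming the alignment is available as two gap-padded rows of equal length (reference row first) and that you know the reference coordinate of its first reference base. `call_variants` is a hypothetical helper, not part of ClustalW:

```python
from typing import List, Tuple


def call_variants(ref_row: str, exon_row: str, ref_start: int = 1) -> List[Tuple[int, str, str]]:
    """Hypothetical helper: list differences between two rows of a pairwise alignment.

    ref_row and exon_row must have equal length, with '-' marking gaps.
    ref_start is the 1-based reference coordinate of the first reference base
    in ref_row. Returns (ref_pos, ref_base, exon_base) tuples; for an insertion
    in the exon (ref_base == '-') ref_pos is the reference base just before it.
    """
    if len(ref_row) != len(exon_row):
        raise ValueError("alignment rows must have equal length")
    ref_row, exon_row = ref_row.upper(), exon_row.upper()
    # Leading/trailing gaps in the exon row only mean "not sequenced here", so skip them.
    first = len(exon_row) - len(exon_row.lstrip("-"))
    last = len(exon_row.rstrip("-"))
    variants = []
    pos = ref_start - 1  # reference coordinate of the last reference base seen
    for i, (r, e) in enumerate(zip(ref_row, exon_row)):
        if r != "-":
            pos += 1
        if first <= i < last and r != e:
            variants.append((pos, r, e))
    return variants


print(call_variants("ACGTACGTAC", "--GTTCG-AC"))  # [(5, 'A', 'T'), (8, 'T', '-')]
```

Here the output means a substitution A>T at reference position 5 and a one-base deletion at position 8. Whether such a difference is a real SNP or a sequencing/alignment artefact still has to be judged separately.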

If you want to produce a multiple alignment, just add all your sequences to the original FASTA file from the beginning. Then again, this does not sound like a typical multiple alignment task. You could just do a pairwise alignment of each exon sequence against the gene with, e.g., EMBOSS water. How many of those exon sequences do you have to align?

Reply by cschu181 ★ 2.8k, 7.3 years ago
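EMBOSS water performs Smith-Waterman local alignment. To show what that means for one exon against the gene, here is a minimal pure-Python Smith-Waterman sketch. It uses a linear gap penalty and illustrative scores (match=2, mismatch=-1, gap=-2 are assumptions; water itself uses a substitution matrix and affine gap penalties). It reports where on the gene the exon aligns, which is what you need to place each exon independently:

```python
from typing import Tuple


def smith_waterman(ref: str, query: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> Tuple[int, int, int, str, str]:
    """Local alignment of query against ref with a linear gap penalty.

    Returns (score, ref_start, ref_end, ref_row, query_row); ref coordinates
    are 1-based and inclusive. Scores are illustrative assumptions.
    """
    ref, query = ref.upper(), query.upper()
    n, m = len(ref), len(query)
    # H[i][j] = best score of a local alignment ending at ref[i-1] and query[j-1].
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_i, best_j = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if ref[i - 1] == query[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_i, best_j = H[i][j], i, j
    # Trace back from the best cell until a cell with score 0 is reached.
    i, j = best_i, best_j
    ref_row, query_row = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if ref[i - 1] == query[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            ref_row.append(ref[i - 1])
            query_row.append(query[j - 1])
            i -= 1
            j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            ref_row.append(ref[i - 1])  # gap in the query
            query_row.append("-")
            i -= 1
        else:
            ref_row.append("-")  # gap in the reference
            query_row.append(query[j - 1])
            j -= 1
    return best, i + 1, best_i, "".join(reversed(ref_row)), "".join(reversed(query_row))


print(smith_waterman("TTTTACGTACGTTTTT", "ACGTACG"))  # (14, 5, 11, 'ACGTACG', 'ACGTACG')
```

The full score table makes this quadratic in time and memory, so pure Python is far too slow for a whole genomic ESR locus; for real data water or a read aligner is the practical choice, and the sketch only illustrates the algorithm.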

Thanks for your reply! I have 300 exon sequences to align. I am looking to generate one alignment of all the exons and the ESR gene by adding each new exon sequence to the previously generated pairwise alignment. Could you please tell me how to add a new exon to a clustalw alignment file and generate a new alignment file? Or, if you know of another solution, please let me know. Thanks.

Reply by DanielC ▴ 210, 7.3 years ago

Do you want to manually add the new exon directly to the .aln file or do you want to get it aligned & added to the result file?

What would your desired output look like?

Reply by lieven.sterck 15k, 7.3 years ago

I want to get the new exon aligned to the previously generated alignment file. The output should look like this:

exon1:                       --------
exon2:                                           ---------
reference gene: -----------------------------------------------------------------------

Please let me know how this can be achieved.

Thanks, DK

Reply by DanielC ▴ 210, 7.3 years ago
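The desired picture is what you get when each exon is placed on reference coordinates by its own pairwise alignment, rather than by one multiple alignment of all sequences. Assuming you have collected each exon's 1-based start and end on the reference from such alignments, a hypothetical `render_layout` helper could draw it as text, scaled to a fixed width:

```python
from typing import List, Tuple


def render_layout(ref_len: int, hits: List[Tuple[str, int, int]], width: int = 60) -> str:
    """Hypothetical helper: draw exons as bars above a scaled reference bar.

    hits holds (name, start, end) with 1-based, inclusive reference coordinates,
    e.g. collected from one pairwise local alignment per exon.
    """
    label_w = max([len("reference gene")] + [len(name) for name, _, _ in hits]) + 2
    rows = []
    for name, start, end in hits:
        # Integer scaling of reference coordinates to text columns.
        left = (start - 1) * width // ref_len
        right = max(left + 1, end * width // ref_len)  # always draw at least one dash
        rows.append(f"{name + ':':<{label_w}}" + " " * left + "-" * (right - left))
    rows.append(f"{'reference gene:':<{label_w}}" + "-" * width)
    return "\n".join(rows)


print(render_layout(1000, [("exon1", 101, 200), ("exon2", 501, 650)], width=50))
```

With a 1000 bp reference drawn 50 characters wide, exon1 (101-200) becomes 5 dashes after 5 spaces and exon2 (501-650) becomes 7 dashes after 25 spaces, so non-overlapping exons appear side by side as in the sketch above.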

Can't you simply run a multiple alignment with all exons at once, then?

Reply by lieven.sterck 15k, 7.3 years ago

I tried. The issue is that when I do that, for some reason all exons get aligned to one region of the reference gene, like this (which I think is not right, since exon1 and exon2 cannot be at the same position in the gene):

exon1:                                          ---------
exon2:                                           ----------
reference gene: ------------------------------------------------------------------------

Your comments/solutions are very welcome. Thanks.

Reply by DanielC ▴ 210, 7.3 years ago

Maybe align them with some short/long read aligner (bwa mem, minimap, bbmap, ...) and visualise the results in a genome browser (IGV). Concerning your exon sequences, if they align to the same position then they must be similar if not partially identical. According to OMIM, ESR contains only 8 exons, so are your exon sequences maybe only partial (which would explain the overlapping that you report in that figure above)? Or alternatively, is that reference gene sequence highly repetitive by chance? That would complicate things with the mapping.

Reply by cschu181 ★ 2.8k, 7.3 years ago
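A quick, rough way to look at the repetitiveness question is to count how many k-mers occur more than once in the reference. The sketch below assumes k=20 (an arbitrary choice) and looks at the forward strand only; a dedicated tool such as RepeatMasker is the proper check:

```python
from collections import Counter


def repeated_kmer_fraction(seq: str, k: int = 20) -> float:
    """Fraction of k-mer start positions whose k-mer occurs more than once in seq.

    Forward strand only; k=20 is an arbitrary default. A high value hints that
    short exon fragments may match several places equally well.
    """
    seq = seq.upper()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:  # sequence shorter than k
        return 0.0
    counts = Counter(kmers)
    return sum(1 for kmer in kmers if counts[kmer] > 1) / len(kmers)


print(repeated_kmer_fraction("ACGT" * 10, k=4))  # 1.0: a pure tandem repeat
print(repeated_kmer_fraction("ACGTTGCA", k=4))   # 0.0: every 4-mer is unique
```

A value near 0 for the NCBI sequence would make repeats an unlikely explanation for the exons piling up in one region.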

Thanks for the response. I will try the bwa aligner as you suggested. Yes, the ESR exons were only partially sequenced, so not all regions of the 8 exons are covered. What I noticed is that when I align each exon individually to the reference ESR gene, it aligns very well with >95% similarity, but when I align them all together they all accumulate in one region, as I described above. The reference ESR gene sequence was taken from NCBI; I don't know whether it is highly repetitive.

Reply by DanielC ▴ 210, 7.3 years ago
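Identity figures depend on the definition used. One common definition, which as far as I know matches the "Identity" line that EMBOSS water and needle print, divides the number of identical columns by the alignment length including gap columns. A minimal sketch for two rows of a pairwise alignment:

```python
def percent_identity(row_a: str, row_b: str) -> float:
    """Identical columns / alignment length (gap columns included), as a percentage."""
    if len(row_a) != len(row_b) or not row_a:
        raise ValueError("rows must be non-empty and of equal length")
    # A column counts as identical only if both rows carry the same base (not a gap).
    identical = sum(
        1 for a, b in zip(row_a.upper(), row_b.upper()) if a == b and a != "-"
    )
    return 100.0 * identical / len(row_a)


print(percent_identity("ACGTACGTAC", "ACGTTCGTAC"))  # 90.0
```

Applied to a whole row of a global multiple alignment, the long terminal gaps around a short exon would dominate the denominator, so the number is only meaningful over the exon's aligned span.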

Have you ensured you’re running a local alignment and not a global one?

Reply by Joe 22k, 7.3 years ago
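The difference Joe is pointing at shows up in the scoring alone. A global (Needleman-Wunsch) alignment must account for every base of both sequences, so all the gene sequence outside the exon is charged as gaps, while a local (Smith-Waterman) alignment scores only the best-matching region. A minimal sketch with illustrative linear scores (assumptions, not ClustalW's actual parameters or end-gap handling), so it only illustrates the scoring difference:

```python
def global_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Needleman-Wunsch score: every base of a and b is aligned or gapped."""
    prev = [j * gap for j in range(len(b) + 1)]  # first row: leading gaps in a
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[-1]


def local_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Smith-Waterman score: best-scoring pair of substrings, never below zero."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(0, prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


exon = "ACGTACGTAA"
for flank in (10, 50):
    gene = "G" * flank + exon + "C" * flank
    print(flank, local_score(gene, exon), global_score(gene, exon))
# 10 20 -20
# 50 20 -180
```

The local score stays at 20 however long the flanks are, while the global score keeps falling because every extra gene base must be paid for as a gap.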

Thanks for the response. Could you please let me know how to run a local alignment using clustalw or any other tool you could suggest? The command line I used to align is in my question. I need to run the tools standalone/locally.

Reply by DanielC ▴ 210, 7.3 years ago

How To Compute A Local Multiple Protein Sequence Alignment?

Reply by Joe 22k, 7.3 years ago

I think MAFFT can do the job; see "Add new sequences to an existing alignment using MAFFT".

Reply by Sishuo Wang ▴ 230, 7.3 years ago
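MAFFT's --add option, and --addfragments (intended for fragmentary sequences such as partially sequenced exons), align new sequences to an existing alignment, which is what the original question asked for. MAFFT expects FASTA input (check the documentation for your version), so the ClustalW .aln file may first need converting. A minimal converter sketch, assuming standard CLUSTAL output; `clustal_to_fasta` is a hypothetical helper:

```python
from typing import Dict, List


def clustal_to_fasta(aln_text: str, width: int = 60) -> str:
    """Hypothetical helper: convert CLUSTAL-format alignment text to gapped FASTA."""
    seqs: Dict[str, List[str]] = {}  # insertion order = order in the .aln file
    for line in aln_text.splitlines():
        # Skip the header, blank lines and conservation lines (which start with whitespace).
        if not line.strip() or line.startswith("CLUSTAL") or line[0].isspace():
            continue
        fields = line.split()
        if len(fields) < 2:
            continue
        # fields[0] is the sequence name, fields[1] the gapped chunk; an optional
        # trailing residue count (fields[2]) is ignored.
        seqs.setdefault(fields[0], []).append(fields[1])
    out = []
    for name, chunks in seqs.items():
        seq = "".join(chunks)
        out.append(">" + name)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"
```

The result could be written to, e.g., xx_aligned.fasta and passed to something like `mafft --addfragments new_exons.fasta --keeplength xx_aligned.fasta > xx_updated.fasta`. The file names are placeholders, and the options should be checked against the MAFFT documentation for your version. `--keeplength` keeps the existing alignment length fixed by deleting insertions in the new sequences, which keeps reference columns stable but would hide insertions in the exons.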