Manual edit of multiple alignment
8.5 years ago
san.san ▴ 190

I've been asked to align and manually edit some sequences for a phylogenetic analysis. I've aligned my data with ClustalW and MAFFT for comparison and spent several hours trying to figure out how to manually edit these alignments. I'm at a loss.

I thought I'd need to trim the ends of the alignments so they could be read by MEGA, but my supervisor said I should just replace the gaps with Ns and look for misalignments. Since my supervisor is away for two weeks, I can't ask for any clarification.

So, the questions are:

1) Is it good practice to replace gaps with Ns, and is there a certain way to do it? (Do I just replace all gaps with Ns, or only certain gaps?)
2) How do I identify misalignments? And do I delete the misaligned parts in one sequence or in all of them?
3) If there is a good guide/workshop/tutorial on how to do manual editing, please refer me to it.

Many thanks.

alignment sequence • 10k views

Take a look at Jalview. If the gaps were introduced by ClustalW to create the alignment, how can you replace them with Ns? An N would signify an unknown nucleotide at a position where there is currently a gap.

8.5 years ago

As already mentioned, use Jalview to manually edit your alignment.

1- Don't replace gaps with Ns or vice versa unless you know this is the right thing to do. They are not the same thing. An N indicates an undetermined nucleotide (the amino-acid equivalent is X), whereas a gap indicates an absence of sequence. For the purpose of phylogenetic analysis, a gap can be treated as an informative state (how exactly depends on the program and its settings), whereas N is considered uninformative; put another way, the gaps can give information on the phylogeny whereas the Ns do not. The only reason I can think of to replace gaps with Ns is if you know that there's a bit of sequence missing at this position.
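To illustrate the "missing sequence" case only: here is a minimal Python sketch that turns leading and trailing gaps into Ns and leaves internal gaps alone. It assumes that terminal gaps in your data mean "not sequenced" rather than a real deletion, which you would have to check yourself. The function name `terminal_gaps_to_n` and the example string are made up for illustration.

```python
def terminal_gaps_to_n(aligned_seq, gap="-", unknown="N"):
    """Replace only leading and trailing gaps with `unknown`.

    Internal gaps are kept, because they may represent real indels.
    Assumption: terminal gaps stand for unsequenced ends.
    """
    core = aligned_seq.strip(gap)
    if not core:
        # Row consists of gaps only: leave it untouched.
        return aligned_seq
    lead = len(aligned_seq) - len(aligned_seq.lstrip(gap))
    trail = len(aligned_seq) - len(aligned_seq.rstrip(gap))
    return unknown * lead + core + unknown * trail


example = "---ACG--TTA--"
print(terminal_gaps_to_n(example))  # NNNACG--TTANN
```

The row length does not change, so the alignment columns stay in register.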

2- In general, there's no easy way to assess whether your tinkering with the alignment has improved it but you can use Jalview's color scheme to guide you. A good way to check alignments would be to use structural information e.g. check that matching domains are aligned to each other in a sensible way. There may also be cases of sequences that obviously don't fit in the alignment and in general these should be removed and may be realigned separately. Most of the editing you would need to do should be adding or removing gaps here and there.
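As a rough aid for spotting sequences that "obviously don't fit", one could compute each sequence's mean pairwise identity to the rest of the alignment and look at the lowest scorers first. This is only a sketch: the identity definition (columns where neither sequence has a gap) is one choice among several, the toy alignment is invented, and a low score is a hint to inspect the sequence in Jalview, not proof of a misalignment.

```python
def pairwise_identity(a, b, gap="-"):
    """Fraction of identical residues over columns where neither row has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def mean_identity_per_sequence(alignment):
    """alignment: dict mapping name -> aligned sequence (all the same length)."""
    names = list(alignment)
    result = {}
    for name in names:
        others = [
            pairwise_identity(alignment[name], alignment[other])
            for other in names
            if other != name
        ]
        result[name] = sum(others) / len(others) if others else 1.0
    return result


# Invented toy alignment; seq4 is deliberately unlike the others.
alignment = {
    "seq1": "ACGTACGT--AC",
    "seq2": "ACGTACGTTTAC",
    "seq3": "ACGAACGT--AC",
    "seq4": "TTTGCAAC--GG",
}
scores = mean_identity_per_sequence(alignment)
for name, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{name}\t{score:.2f}")
```

Sequences at the top of the printed list are the ones to look at first.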

EDIT: Corrected typos

6.0 years ago
al-ash ▴ 210

Since this is one of the top hits when searching online for manual editing of multiple alignments, I'd like to reopen this topic to hopefully collect suggestions for some more tools than JalView for visual inspection and editing of multiple sequence alignments.

One common task is identifying a misaligned section and, for example, replacing it with gaps or Ns.

MEGA: enables copy and paste of sequence regions but has no direct tool to replace a sequence region with gaps.

UniPro UGENE: the right-click menu has an option "Edit" -> "Fill selection with gaps", which sounds great. Somewhat surprisingly, though, it inserts a stretch of gaps of the same length as the selected region and shifts the rest of the sequence along the alignment instead of overwriting the selection (tested under macOS), so it is not actually useful for masking portions of an alignment.

JalView: the right-click menu has an option "Sequence" -> "Edit" -> "Edit sequence", which enables manual replacement of a region with gaps. However, the gaps have to be typed in by hand, and their number has to match the length of the edited sequence region.
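If none of the GUI tools does the masking the way you want, overwriting a region with the same number of gap (or N) characters is easy to script. Below is a minimal Python sketch working on a plain aligned string; the function name `mask_region` and the 1-based inclusive coordinates are my own choices for illustration, not something any of the tools above provide.

```python
def mask_region(aligned_seq, start, end, fill="-"):
    """Overwrite alignment columns start..end (1-based, inclusive) with `fill`.

    The row keeps its length, so downstream columns are not shifted.
    """
    if not (1 <= start <= end <= len(aligned_seq)):
        raise ValueError("region is outside the sequence")
    return aligned_seq[:start - 1] + fill * (end - start + 1) + aligned_seq[end:]


row = "ACGTTTGCAACG"
masked = mask_region(row, 5, 8)
print(masked)  # ACGT----AACG
```

Passing `fill="N"` masks with Ns instead of gaps; which of the two is appropriate is the question discussed in the answer above.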


Dear al-ash, if I use MEGA to do the multiple alignment and there are gaps common to all the sequences, is it OK to delete those common gaps in order to construct a phylogenetic tree? Also, the ends of the alignment are filled with gaps for half of my sequences. Can I cut the ends (400 sites at the end and 20 sites at the beginning)?
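For reference, the two operations asked about here can be sketched in a few lines of Python. The first drops columns that are gaps in every sequence; the second cuts a fixed number of columns from both ends of every row. This only shows the mechanics on an invented toy alignment (function names are hypothetical); whether trimming that much is appropriate for a given dataset is a separate judgment.

```python
def drop_all_gap_columns(rows, gap="-"):
    """Remove columns in which every sequence has a gap."""
    keep = [i for i, col in enumerate(zip(*rows)) if any(c != gap for c in col)]
    return ["".join(row[i] for i in keep) for row in rows]


def trim_columns(rows, n_start, n_end):
    """Cut n_start columns from the beginning and n_end from the end of every row."""
    stop = len(rows[0]) - n_end
    return [row[n_start:stop] for row in rows]


rows = ["AC--GT-A", "AC--GTTA", "A---GT-A"]
compact = drop_all_gap_columns(rows)
print(compact)                       # ['ACGT-A', 'ACGTTA', 'A-GT-A']
print(trim_columns(compact, 1, 1))   # ['CGT-', 'CGTT', '-GT-']
```

Both functions act on whole columns, so all rows stay the same length and remain aligned.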
