Continuous mismatch at the end of the alignment, how to eliminate it?
6.4 years ago
hjsunbio ▴ 10

Since I am new here, I can only make five posts in six hours, so I can't answer every question immediately. I am posting my answers below.

  1. Q: There may be a method available in Biopython to edit the variable part out. What format are the alignment files in at this time? @genomax

    A: The alignments are in FASTA format.

  2. Q: The trailing unaligned regions are the result of global alignment. Use mafft --localpair (local alignment), in which trailing unaligned regions at both ends are not considered, in order to maximize alignment scores.

    A: mafft was run as mafft --localpair --maxiterate 1000 --adjustdirectionaccurately

I have collected thousands of protein alignments from three subspecies (each with three individuals). The data are based on de novo assembly of RNA-seq data.

I used mafft (https://mafft.cbrc.jp/alignment/software/) for the initial alignment and then Gblocks (http://molevol.cmima.csic.es/castresana/Gblocks.html) to extract conserved regions. My problem is a stretch of continuous mismatches at the end of the alignments; see the pictures below.

I want to remove the red part at the end of the alignments (see the pictures) for thousands of genes in batch. The region to remove starts at the first mismatch at the end of the alignment.

The alignments are in FASTA format. @genomax

Picture links 1 2

Parameters were set as follows: mafft --localpair --maxiterate 1000 --adjustdirectionaccurately

Gblocks xxx.fas -t=p -e=.fas -b2=9 -b3=3 -b4=50 -b5=n

alignment sequence RNA-Seq • 1.8k views

Please use the method shown here to insert images into your post: How to add images to a Biostars post so they are parsed in-line.


Thank you for your suggestion.


Why don't you remove the variable part after you create the consensus?


I need to deal with more than 6000 genes, so manual editing would be far too time-consuming.


There may be a method available in biopython to edit the variable part out. What format are the alignment files in at this time?
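For what it's worth, here is a minimal sketch of one way to locate the cut point. It uses only the Python standard library rather than Biopython. It assumes that a "conserved" column is gap-free with the same residue in every sequence, and that the usable part of the alignment ends at the last run of `min_run` such columns. The names `find_trim_end` and `min_run` and the three toy sequences are made up for this illustration.

```python
def is_conserved(column):
    """True when every residue in the column is identical and none is a gap."""
    return "-" not in column and len(set(column)) == 1


def find_trim_end(seqs, min_run=3):
    """Return the column index at which to cut the alignment.

    Scans from the right-hand end and stops at the first run of
    `min_run` consecutive conserved columns; everything after that run
    is treated as the trailing mismatch block.
    """
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    run = 0
    for col in range(length - 1, -1, -1):
        column = [s[col].upper() for s in seqs]
        if is_conserved(column):
            run += 1
            if run == min_run:
                return col + min_run  # cut just after the conserved run
        else:
            run = 0
    return 0  # no conserved run found: caller decides what to do


# Invented toy alignment: the first 10 columns agree, the tail does not.
aligned = [
    "MKTAYIAKQRGGHWP",
    "MKTAYIAKQRLSDET",
    "MKTAYIAKQR--NVC",
]
cut = find_trim_end(aligned)
trimmed = [s[:cut] for s in aligned]
print(cut, trimmed)
```

Requiring a short run rather than a single matching column guards against a chance match inside the divergent tail. The threshold would probably need tuning on real data.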


The alignments are now in FASTA format. Sorry for the delayed reply; since I am new here, I can only make five posts in six hours, so I can't answer every question immediately. Could you give me more details on how to solve the problem with Biopython, Perl or awk?
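A rough sketch of how the batch step could look in plain Python (standard library only, not Biopython) is given here. It assumes each file holds one alignment with all sequences the same length, and that the tail to drop is everything after the last run of `min_run` identical, gap-free columns. The function names `read_fasta`, `trim_end` and `trim_directory` are hypothetical, and the `*.fas` pattern simply follows the `-e=.fas` extension in the Gblocks command.

```python
from pathlib import Path


def read_fasta(path):
    """Parse a FASTA file into a list of (header, sequence) pairs."""
    records, header, chunks = [], None, []
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif line:
            # Drop internal whitespace in case sequences are written
            # in space-separated blocks.
            chunks.append("".join(line.split()))
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def trim_end(seqs, min_run=3):
    """Index just after the last run of `min_run` identical, gap-free columns."""
    run = 0
    for col in range(len(seqs[0]) - 1, -1, -1):
        residues = {s[col].upper() for s in seqs}
        run = run + 1 if len(residues) == 1 and "-" not in residues else 0
        if run == min_run:
            return col + min_run
    return 0  # no conserved run at all


def trim_directory(in_dir, out_dir, pattern="*.fas", min_run=3):
    """Trim every alignment matching `pattern` in in_dir; write results to out_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    kept = {}
    for path in sorted(Path(in_dir).glob(pattern)):
        records = read_fasta(path)
        if not records:
            continue  # skip empty files
        cut = trim_end([seq for _, seq in records], min_run)
        with open(out_dir / path.name, "w") as handle:
            for header, seq in records:
                handle.write(f">{header}\n{seq[:cut]}\n")
        kept[path.name] = cut  # number of columns kept for this gene
    return kept
```

Something like `trim_directory("gblocks_out", "trimmed")` would then process a whole folder (both directory names are placeholders). The returned dictionary shows how many columns were kept per file, which makes it easy to spot genes where almost nothing survived.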


Your first figure suggests you have two different isoforms (isoform 1: e1, e2, x2; isoform 2: e3, h1, h2, h3, x1, x3), and you are aligning these different isoforms, hence the "mismatch" at the end.


Yes, they are probably different isoforms or assembly errors. What I want is to keep only the consensus part.

6.4 years ago

The trailing unaligned regions are the result of global alignment. Use mafft --localpair (local alignment), in which trailing unaligned regions at both ends are not considered, in order to maximize alignment scores.


Thank you for your answer. I did use localpair mode, but the trailing unaligned regions were still there. mafft was run as shown below.

mafft --localpair --maxiterate 1000 --adjustdirectionaccurately