Since I am new here, I can only make five posts in six hours, so I can't answer every question right away. I am posting my answers below.
Q: There may be a method available in Biopython to edit the variable part out. What format are the alignment files in at this time? @genomax
A: The alignments are in FASTA format.
Q: The trailing unaligned regions are the result of global alignment. Use mafft --localpair (local alignment), in which trailing unaligned regions at both ends are not considered when maximizing the alignment score.
A: mafft was run as
mafft --localpair --maxiterate 1000 --adjustdirectionaccurately
I have collected thousands of protein alignments from three subspecies (each with three individuals); the data are based on de novo assembly of RNA-seq.
I used mafft (https://mafft.cbrc.jp/alignment/software/) for the initial alignment and then Gblocks (http://molevol.cmima.csic.es/castresana/Gblocks.html) to extract conserved regions. My problem is a run of consecutive mismatches at the end of the alignments; see the picture below.
I want to remove the red part (see the pictures) at the end of the alignments, for thousands of genes in batch. The region to remove starts at the first mismatch at the end of each alignment.
The alignments are in FASTA format. @genomax
Parameters were set as follows:
mafft --localpair --maxiterate 1000 --adjustdirectionaccurately
Gblocks xxx.fas -t=p -e=.fas -b2=9 -b3=3 -b4=50 -b5=n
Please use the method shown here to insert images into your post: How to add images to a Biostars post so they are parsed in-line.
Thank you for your suggestion.
Why don't you remove the variable part after you create the consensus?
I need to deal with more than 6000 genes, so manual editing would be far too time-consuming.
There may be a method available in biopython to edit the variable part out. What format are the alignment files in at this time?
The alignments are now in FASTA format. Sorry for the delayed reply; since I am new here, I can only make five posts in six hours, so I can't answer every question right away. Could you give me more details on how to solve the problem with Biopython, Perl, or awk?
Your first figure suggests you have two different isoforms (isoform 1: e1, e2, x2; isoform 2: e3, h1, h2, h3, x1, x3), and you are aligning these different isoforms, hence the "mismatch" at the end.
Yes, they are probably different isoforms or assembly errors. What I want is to get the consensus part.
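For the batch trimming asked about above, here is a rough sketch rather than a tested pipeline. It uses only the Python standard library, so it does not depend on Biopython; the same column logic could be ported to Biopython's alignment objects. Everything in it is an assumption made for illustration: the function names (read_fasta, trim_end, trim_file, trim_directory), the "*.fas" file pattern, and above all the rule for where the bad tail starts. The rule used here is: scan columns from the right until min_run consecutive columns are identical and gap-free in all sequences, then cut everything to the right of that run. The min_run=5 default is arbitrary and should be tuned on a few genes first.

```python
from pathlib import Path


def read_fasta(path):
    """Return a list of (header, sequence) tuples from an aligned FASTA file."""
    records, header, chunks = [], None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    records.append((header, "".join(chunks)))
                header, chunks = line[1:], []
            else:
                # Drop any whitespace inside sequence lines.
                chunks.append("".join(line.split()))
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def is_conserved(column):
    """True if all sequences share the same non-gap residue in this column."""
    residues = {c.upper() for c in column}
    return len(residues) == 1 and "-" not in residues


def trim_end(seqs, min_run=5):
    """Return the column index at which to cut the alignment.

    Scans from the right end and stops at the first run of `min_run`
    consecutive conserved columns; columns to the right of that run are
    treated as the mismatched tail. If no such run exists, the full
    length is returned, i.e. nothing is trimmed.
    """
    if not seqs:
        raise ValueError("no sequences in alignment")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences are not all the same length")
    run = 0
    for i in range(length - 1, -1, -1):
        if is_conserved([s[i] for s in seqs]):
            run += 1
            if run == min_run:
                return i + min_run
        else:
            run = 0
    return length


def trim_file(src, dst, min_run=5):
    """Trim one alignment file and write the result; return the cut position."""
    records = read_fasta(src)
    cut = trim_end([seq for _, seq in records], min_run)
    with open(dst, "w") as out:
        for header, seq in records:
            out.write(f">{header}\n{seq[:cut]}\n")
    return cut


def trim_directory(in_dir, out_dir, pattern="*.fas", min_run=5):
    """Trim every alignment matching `pattern` in in_dir into out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for src in sorted(Path(in_dir).glob(pattern)):
        trim_file(src, out / src.name, min_run)
```

Something like trim_directory("gblocks_out", "trimmed") would then process all genes in one go (both directory names are placeholders). Writing to a separate output directory keeps the original alignments untouched, so the cut positions can be spot-checked against the pictures before the trimmed files are used downstream. Note that this only trims the right-hand end, and that a cut at a shared column does not by itself tell you whether the tail came from different isoforms or from assembly errors.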