Question

Continuous mismatches at the end of alignments: how can I remove them?


Asked 6.3 years ago by hjsunbio

Since I am new here, I can only make five posts every six hours, so I can't answer every question immediately. I have posted my answers below.

Q: There may be a method available in Biopython to edit the variable part out. What format are the alignment files in at this time? (@genomax)

A: The alignments are in FASTA format.

Q: The trailing unaligned regions are the result of global alignment. Use mafft --localpair (local alignment), in which the trailing unaligned regions at both ends are not counted, so as to maximize the alignment score.

A: MAFFT was already run as mafft --localpair --maxiterate 1000 --adjustdirectionaccurately

I have collected thousands of protein alignments from three subspecies (three individuals each); the data are based on de novo assemblies of RNA-seq reads.

I used MAFFT (https://mafft.cbrc.jp/alignment/software/) for the initial alignment and then Gblocks (http://molevol.cmima.csic.es/castresana/Gblocks.html) to extract conserved regions. My problem is that some alignments end with a continuous run of mismatches (see the pictures below).

I want to remove the red part (see the pictures) at the end of the alignments for thousands of genes in batch. The region to remove starts at the first mismatch of the run of mismatches at the end of each alignment.
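The trimming rule described above (cut from the first column of the mismatched run at the end of the alignment) can be sketched in plain Python. This is a minimal illustration, not code from the thread: it assumes the aligned sequences are already loaded as equal-length strings, and it treats a column as a mismatch if any sequence differs or carries a gap ("-"). The function names are hypothetical.

```python
from typing import List

GAP = "-"


def is_conserved(column: str) -> bool:
    """True if every sequence has the same residue in this column and it is not a gap."""
    first = column[0]
    return first != GAP and all(residue == first for residue in column)


def trim_trailing_mismatches(seqs: List[str]) -> List[str]:
    """Remove the run of non-conserved columns at the end of an alignment.

    Walks backwards from the last column; the cut starts at the first column
    of the terminal mismatched run, so everything after the last fully
    conserved column is dropped.
    """
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned (equal length)")
    end = length
    # Step left while the column just before `end` is a mismatch or contains a gap.
    while end > 0 and not is_conserved("".join(s[end - 1] for s in seqs)):
        end -= 1
    return [s[:end] for s in seqs]


if __name__ == "__main__":
    aln = ["MKTAYIAKQR", "MKTAYIAKHW", "MKTAYIAQ--"]
    print(trim_trailing_mismatches(aln))  # ['MKTAYIA', 'MKTAYIA', 'MKTAYIA']
```

Because the scan stops at the first fully identical column, a single coincidentally identical column inside the red region will stop the trim early; on real data a stricter rule (for example, requiring several conserved columns in a row) may be needed.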


Picture links

Parameters were set as follows:

mafft --localpair --maxiterate 1000 --adjustdirectionaccurately

Gblocks xxx.fas -t=p -e=.fas -b2=9 -b3=3 -b4=50 -b5=n

Tags: alignment, sequence, RNA-Seq



Please use the method shown here to insert images into your post: How to add images to a Biostars post so they are parsed in-line.

(reply by GenoMax, 6.3 years ago)


Thank you for your suggestion.

(reply by hjsunbio, 6.3 years ago)


Why don't you remove the variable part after you create the consensus?

(reply by GenoMax, 6.3 years ago)
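The consensus idea can also be written as a rule: build a majority-rule consensus, compute for each column the fraction of sequences that carry the consensus residue, and cut the tail where that fraction falls below a threshold. The sketch below is an illustration under stated assumptions: gaps count against identity, and `min_identity` is a hypothetical parameter where 1.0 means every sequence must agree. The function names are not from any library.

```python
from collections import Counter
from typing import List, Tuple

GAP = "-"


def column_profile(seqs: List[str]) -> Tuple[str, List[float]]:
    """Majority-rule consensus plus, per column, the fraction of sequences matching it.

    Gaps never become the consensus residue (unless the column is all gaps)
    and count against identity. Ties go to the residue seen first.
    """
    if len({len(s) for s in seqs}) > 1:
        raise ValueError("sequences must be aligned (equal length)")
    n = len(seqs)
    consensus: List[str] = []
    identity: List[float] = []
    for column in zip(*seqs):
        residues = [r for r in column if r != GAP]
        if not residues:
            consensus.append(GAP)
            identity.append(0.0)
            continue
        residue, count = Counter(residues).most_common(1)[0]
        consensus.append(residue)
        identity.append(count / n)
    return "".join(consensus), identity


def trim_variable_tail(seqs: List[str], min_identity: float = 1.0) -> List[str]:
    """Drop trailing columns whose identity to the consensus is below min_identity."""
    _, identity = column_profile(seqs)
    end = len(identity)
    while end > 0 and identity[end - 1] < min_identity:
        end -= 1
    return [s[:end] for s in seqs]


if __name__ == "__main__":
    aln = ["MKTAYIAK", "MKTAYIAK", "MKTAYVQW"]
    print(column_profile(aln)[0])        # MKTAYIAK
    print(trim_variable_tail(aln, 1.0))  # ['MKTAY', 'MKTAY', 'MKTAY']
    print(trim_variable_tail(aln, 0.6))  # ['MKTAYIAK', 'MKTAYIAK', 'MKTAYVQW']
```

With `min_identity=1.0` the example is trimmed to MKTAY; with 0.6 the whole alignment is kept, because only one of the three sequences diverges in the tail. Which threshold suits three subspecies with three individuals each is a judgment call.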


I need to process more than 6,000 genes, so manual editing would be far too time-consuming.

(reply by hjsunbio, 6.3 years ago)


There may be a method available in Biopython to edit the variable part out. What format are the alignment files in at this time?

(reply by GenoMax, 6.3 years ago)


The alignments are now in FASTA format. Sorry for the delayed reply: since I am new here, I can only make five posts every six hours, so I can't answer every question immediately. Could you give me more details on how to solve this with Biopython, Perl, or awk?

(reply by hjsunbio, 6.3 years ago)
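For the batch part of the question (more than 6,000 files), a minimal driver could look like the following. It swaps in the Python standard library instead of Biopython (Biopython's Bio.AlignIO could replace the small reader and writer) and applies the strict rule: keep everything up to the last column in which all sequences share the same non-gap residue. The `*.fas` pattern, the separate input/output directories, and all function names are assumptions for illustration; writing to a separate output directory keeps the original files untouched.

```python
import sys
from pathlib import Path
from typing import List, Tuple

GAP = "-"


def read_fasta(path: Path) -> List[Tuple[str, str]]:
    """Minimal FASTA reader; spaces inside sequence lines are removed."""
    records: List[Tuple[str, str]] = []
    name, chunks = None, []
    for line in path.read_text().splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        elif name is not None:
            chunks.append("".join(line.split()))
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def write_fasta(path: Path, records: List[Tuple[str, str]], width: int = 60) -> None:
    """Write records with sequences wrapped at `width` characters."""
    with path.open("w") as handle:
        for name, seq in records:
            handle.write(f">{name}\n")
            for i in range(0, len(seq), width):
                handle.write(seq[i:i + width] + "\n")


def conserved_end(seqs: List[str]) -> int:
    """Index just past the last column where all sequences share one non-gap residue."""
    for end in range(len(seqs[0]), 0, -1):
        column = {s[end - 1] for s in seqs}
        if len(column) == 1 and GAP not in column:
            return end
    return 0


def trim_directory(in_dir: Path, out_dir: Path, pattern: str = "*.fas") -> int:
    """Trim the trailing mismatched block of every matching alignment; return files written."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = 0
    for path in sorted(in_dir.glob(pattern)):
        records = read_fasta(path)
        if not records:
            continue
        seqs = [seq for _, seq in records]
        if len({len(s) for s in seqs}) != 1:
            raise ValueError(f"{path}: sequences are not the same length")
        end = conserved_end(seqs)
        write_fasta(out_dir / path.name, [(name, seq[:end]) for name, seq in records])
        written += 1
    return written


if __name__ == "__main__" and len(sys.argv) == 3:
    count = trim_directory(Path(sys.argv[1]), Path(sys.argv[2]))
    print(f"trimmed {count} alignment(s)")
```

Saved as a script, it would be called with an input and an output directory, for example `python trim_tails.py gblocks_out/ trimmed/` (hypothetical names). Check that the glob pattern matches only the alignments you want, since MAFFT inputs and Gblocks outputs may share an extension.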


Your first figure suggests you have two different isoforms (isoform 1: e1, e2, x2; isoform 2: e3, h1, h2, h3, x1, x3), and you are aligning these different isoforms, hence the "mismatch" at the end.

(reply by h.mon, 6.3 years ago)


Yes, they are probably different isoforms or assembly errors; what I want is to keep only the consensus part.

(reply by hjsunbio, 6.3 years ago)

Answer (2018-08-07)


Answered 6.3 years ago by Dattatray Mongad

The trailing unaligned regions are the result of global alignment. Use mafft --localpair (local alignment), in which the trailing unaligned regions at both ends are not counted, so as to maximize the alignment score.


Thank you for your answer. I did use the localpair mode, but the trailing unaligned regions were still there. MAFFT was run as follows:

mafft --localpair --maxiterate 1000 --adjustdirectionaccurately

(reply by hjsunbio, 6.3 years ago)