Extract the original sequences from a multiple sequence alignment (MSA)
12 months ago
Ann ▴ 40

The supplementary material of an interesting article contains an alignment of proteins that are important to me.

It looks like this:

>sp1_lox4
------------HVPRS--TE------YYT------------------------------
--------------------------------------TRN-------------P-----
-----------AY---SP----HVYSPPVT------------------SEPDRIRF-D-G
-SD--------IA--TSVGA-Y-P-T-ST----V----P--------------------S
>sp2_pb
----------------S---------T-L--------------PKRQRTA-FTNNQLLEL
EKEFHYNKYLCRSRRIEIAKALSLTERQ----------V-----------KIWFQNRRMK
YKKVN---TGF-E-SPD-------------------GM----------------------
-----------MK-----------------------------------------------
---------------------------------------PE-------------------

Apart from amino acid letters, the alignment contains only gap characters ("-").

If I just remove all the "-" gaps, will I get the original protein sequences that the authors aligned? I then want to use these protein sequences to annotate my own data.

MSA alignment • 1.1k views

If all that was done was a simple alignment, then yes, in theory just removing the gaps will work. It's a weird alignment though, so it would be prudent to check the sequences afterwards, via BLAST or something similar.
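For illustration, here is a minimal sketch of the gap removal in plain Python. It assumes the alignment is in FASTA format and that "-" (and possibly ".") are the only gap characters. The function name `degap_fasta` is just a hypothetical helper, not something from the article, and only the first two alignment rows from the question are used as sample input.

```python
def degap_fasta(text, gap_chars="-."):
    """Parse aligned FASTA text and return {header: ungapped sequence}.

    gap_chars is an assumption: "-" is the gap symbol in the question,
    "." is included because some tools use it as well.
    """
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)

    # Translation table that deletes every gap character.
    table = str.maketrans("", "", gap_chars)
    return {name: "".join(chunks).translate(table)
            for name, chunks in records.items()}


# First two alignment rows of sp1_lox4, copied from the question.
aligned = """>sp1_lox4
------------HVPRS--TE------YYT------------------------------
--------------------------------------TRN-------------P-----
"""

for name, seq in degap_fasta(aligned).items():
    print(f">{name}\n{seq}")
```

The ungapped sequences can then be pasted into BLAST to confirm that they match the proteins you expect.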


I followed your advice, and everything worked out. Thank you very much!


Can you provide a link to the article? Sometimes, a "-" means no mutation compared to a reference sequence, and a letter (e.g. H, V, P, etc.) means a mutation, but it isn't clear to me what the authors did here. I don't think removing the gaps will give you the original sequence unless the authors induced a lot of deletions for some reason.


I believe Joe is right: you can remove the gaps to get the sequence. However, I was unable to find sp1_lox4 or sp2_pb in the link that you provided. Using the sequences in your question, I removed the gaps from sp1_lox4 and BLASTed the resulting amino acid sequence, but nothing came up. When I did the same for sp2_pb, pb was the top hit.


That is because I included only part of the sequences from the article, just as an example. Thank you very much for your help with this issue.
