12 months ago
Ann
The supplementary material of an interesting article contains an alignment of proteins that are important to me.
It looks like this:
>sp1_lox4
------------HVPRS--TE------YYT------------------------------
--------------------------------------TRN-------------P-----
-----------AY---SP----HVYSPPVT------------------SEPDRIRF-D-G
-SD--------IA--TSVGA-Y-P-T-ST----V----P--------------------S
>sp2_pb
----------------S---------T-L--------------PKRQRTA-FTNNQLLEL
EKEFHYNKYLCRSRRIEIAKALSLTERQ----------V-----------KIWFQNRRMK
YKKVN---TGF-E-SPD-------------------GM----------------------
-----------MK-----------------------------------------------
---------------------------------------PE-------------------
In addition to amino acids, there are only gaps "-" in the alignment.
If I just remove all the "-" gaps, will I get the original protein sequences that the authors aligned? I then want to use these protein sequences to annotate my own data.
If all that was done was a simple alignment, then yes, in theory just removing the gaps will work. It's a weird alignment though, so it would be prudent to check the sequences afterwards with BLAST or something similar.
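For reference, here is a minimal sketch of the gap removal in plain Python. It assumes the alignment is FASTA-formatted text in which "-" is the only gap character, as in the question. The function name `ungap_fasta` and the variable names are made up for illustration, and only the first two alignment lines of sp1_lox4 are used as input.

```python
def ungap_fasta(text, gap_chars="-"):
    """Parse aligned FASTA text and return {header: sequence without gaps}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A header line starts a new record
            name = line[1:].strip()
            records[name] = []
        elif name is not None:
            # Sequence lines are collected under the most recent header
            records[name].append(line)
    # Translation table that deletes every gap character
    table = str.maketrans("", "", gap_chars)
    return {n: "".join(parts).translate(table) for n, parts in records.items()}


# First two alignment lines of sp1_lox4 from the question (partial, for illustration)
aligned = """>sp1_lox4
------------HVPRS--TE------YYT------------------------------
--------------------------------------TRN-------------P-----
"""

sequences = ungap_fasta(aligned)
for header, seq in sequences.items():
    print(f">{header}\n{seq}")
```

The printed output is ordinary FASTA, so it can be pasted into BLAST to check that the ungapped sequences match known proteins.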
I followed your advice, and everything worked out. Thank you very much!
Can you provide a link to the article? Sometimes, a "-" means no mutation compared to a reference sequence, and a letter (e.g. H, V, P, etc.) means a mutation, but it isn't clear to me what the authors did here. I don't think removing the gaps will give you the original sequence unless the authors induced a lot of deletions for some reason.
Thank you, here is the link
https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-023-09826-z
I believe Joe is right - you can remove the gaps to get the sequence. However, I was unable to find sp1_lox4 or sp2_pb in the link that you provided. Using the sequences in your question, I removed the gaps for sp1_lox4 and BLASTed the resulting amino acid sequence, but nothing came up. When I did the same thing for sp2_pb, pb was the top hit.
That is because I only included part of the sequences from the article, as an example. Thank you very much for your help with this issue.