Hi everyone so I have a question how can you print out for each specific residue of a sequence in msa its msa position I tried something with biopython :
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
from Bio import AlignIO
alignment = AlignIO.read(open("ADORA1.pfam"), "PFAM")
def msa_position(aln, id, res_no):
rec = next((r for r in aln if r.id==id), None)
j = 0
for i, res in enumerate(rec.seq):
if res!='-':
if j==res_no:
return i
j+=1
print (msa_position(alignment, "ENST00000618295|ADORA1/1-210", 3))
However, i can only do it for one position at a time and i want output like
id Sequence_position msa_positon
ENST00000618295|ADORA1 3 119
ENST00000618295|ADORA1 4 120
and so on
Sorry for the code writing its my first time on biostars.
I have tried to format your code for you, but the indentation was not clear, so please correct me if I have it wrong.
I'm not quite sure I understand the question - you want to know, for a given sequence within an MSA, where its sequence starts (i.e. count the gap characters up til the first base/amino)?
So what i want is a table which will map every sequence position of each sequence in given alignment (there are three of them in the msa file) to its msa position of course counting the gaps until the first amino so basically for that sequence in the example the third position in the sequence is actually 119 position cause of the gaps
Can you provide some example input data?
Here you go something shorter than my sequences but principle should be the same so i want each position in each sequence mapped to its msa position example in Beta the 4th position in the sequence is 5th msa position and for the other two they are the same 4th position to 4th position.
alignment = MultipleSeqAlignment([ SeqRecord(Seq("ACTGCTAGCTAG", generic_dna), id="Alpha"), SeqRecord(Seq("ACT-CTAGCTAG", generic_dna), id="Beta"), SeqRecord(Seq("ACTGCTAGDTAG", generic_dna), id="Gamma"),
That gap is internal to the second sequence, so should the desired output be
0
or3
?so in sequence itself gaps you dont count so you go 1,2,3, skip gap 4 and for each of the counted positions i need the msa position so basically it will list in a column postions in sequence from 1 to length of that sequence and to each counted residued it will write a msa position as gaps arent counted the two numbers will not be the same